Transforming parameters in a simple time-series model; debugging the Jacobian

So. This one is pretty simple. But the general idea could be useful to some of you. So here goes.

We were fitting a model with an autocorrelation parameter, rho, which was constrained to be between 0 and 1. The model looks like this:

eta_t ~ normal(rho*eta_{t-1}, sigma_res), for t = 2, 3, ... T

To make this work out, you need the following marginal distribution for the first element in the series:

eta_1 ~ normal(0, sigma_res/sqrt(1 - rho^2)).

In case you’re wondering, eta is a vector of latent parameters deep inside the model. In addition to these autocorrelated error terms, the model also has linear predictors, independent errors, and all sorts of other things. But here we’re focusing on the etas.

It’s easy enough to write the above model in Stan, either by computing the T x T covariance matrix and using the multivariate normal, or by simply building up the distribution one step at a time, as above, which is how we did it.

When fitting the model there was some slow mixing, and we decided it could work better to reparameterize in terms of the marginal variance rather than the residual variance.

Thus, we defined:

sigma = sigma_res/sqrt(1 - rho^2)

And then the above model became:

eta_1 ~ normal(0, sigma)
eta_t ~ normal(rho*eta_{t-1}, sigma*sqrt(1 - rho^2)), for t = 2, 3, ... T

But then there were also issues with rho, having to do with posterior mass piling up near rho = 1. This is a situation that experienced time-series modelers will be familiar with, the so-called unit root problem, that time series often seem to want to be nonstationary. This sort of data issue should be respected, and it typically is a sign that the model needs more structure, for example long-term trends. Here, though, we wanted first to deal with awkward nonlinearity in the posterior dependence of rho and sigma, and we decided to reparameterize as follows:

xi = 1/(1 - rho^2)

When rho ranges from 0 to 1, xi varies from 1 to infinity. So now we don’t have to worry about the density piling up at the boundary.

So, in Stan:

parameters { real xi; real sigma; vector[T] eta;
transformed parameters { real rho; vector[T] M; vector[T] S; rho = sqrt(1 - 1/xi); M[1] = 0; S[1] = sigma; for (t in 2:T){ M[t] = rho*eta[t-1] S[t] = sigma*sqrt(1 - square(rho)); }
model { eta ~ normal(M, S);

There are various ways the code could be cleaner, especially with future language improvements. For example, I’d like M and S to not have to be named variables; instead they could be attributes of eta in some way.

But that discussion is for another time.

For now, let’s continue with this model. For computational reasons, it is convenient for xi, not rho, to be the parameter in the Stan model. But suppose we want to put a prior on the scale of rho. For example, the model above has an implicit flat prior on xi, but that implies a big mass on high values, as xi is unconstrained on the high end.

To put a flat prior on rho, when the above model is parameterized in terms of xi, we need to add the log Jacobian to the log posterior density, or target function in Stan.

The Jacobian is the absolute value of the derivative of the transformation:

J = |d(xi)/d(rho)| = 2*rho/(1-rho^2)^2

But I always get confused which direction to include the Jacobian, also I’m always worried that I get my algebra wrong.

So I write a little test program, “transformation_test.stan”:

parameters { real xi;
transformed parameters { real rho; rho = sqrt(1 - 1/xi);
model { target += log(rho) - 2*log(1 - square(rho));

And then I run this R script:

options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE) fit <- stan("transformation_test.stan")

And here's what we get:

Inference for Stan model: transformation_test.
4 chains, each with iter=2000; warmup=1000; thin=1; post-warmup draws per chain=1000, total post-warmup draws=4000. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
xi 1.076634e+16 2.917436e+14 4.410165e+15 3.446772e+15 6.921461e+15 1.086366e+16 1.463243e+16 1.765822e+16 229 1.03
rho 1.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 749 1.00
lp__ 1.088500e+02 4.000000e-02 6.900000e-01 1.064800e+02 1.085600e+02 1.090100e+02 1.093100e+02 1.095000e+02 338 1.02

Damn! I guess I did the Jacobian in the wrong direction.

Let's try again:

parameters { real xi;
transformed parameters { real rho; rho = sqrt(1 - 1/xi);
model { target += - log(rho) + 2*log(1 - square(rho));

And now:

 mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
xi 6.33 2.48 93.92 1.00 1.07 1.36 2.25 17.34 1430 1
rho 0.50 0.01 0.29 0.02 0.26 0.51 0.75 0.97 1358 1
lp__ -1.60 0.03 0.91 -4.23 -1.83 -1.25 -1.02 -0.96 712 1

Yesssss! It works. rho has a uniform distribution, which was our goal.

The whole procedure, including several debugging steps not shown here, took about 10 minutes. (In contrast, it took half hour to write all this up.)

OK, we've succeeded in implementing the trivial. So what?

I'm not claiming this is some great accomplishment. Rather, this is the sort of thing we need to do sometimes. And so it's good to have a way to check our work.

Now that we've succeeded in the reparameterization, we can add a prior on rho directly. For example, if we want normal(0.7, 0.2), we just add it to the model block:

 rho ~ normal(0.7, 0.2);

Or, equivalently:

 target += normal(rho | 0.7, 0.2);

In either case, the constraint to the range [0, 1] is implicit, having already been achieved in the declaration step.

And now we can move on and continue with the rest of our modeling.


电子邮件地址不会被公开。 必填项已用*标注