Source: SSC 389: AR(1) + noise
Definitions
- \(X_t\) is hidden state at time t
- \(Y_t\) is observed state at time t
- \(D_t\) is information known at time t, i.e. \( \{ Y_1, \cdots, Y_t \} \)
- \(T\) is the length of \(X, Y, D\)
Model
Hidden: \[ X_t = \alpha + \beta X_{t-1} + \epsilon_t \]
- n.b. \(X\) is stationary (so \( |\beta| < 1 \))
Observed: \[ Y_t = X_t + v_t \]
\[ v_t \sim N(0, \sigma^2), \quad \epsilon_t \sim N(0, \omega^2) \]
- \( v_t, \epsilon_t \) independent
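To make the setup concrete, here is a minimal NumPy sketch of simulating from this model (the function name and parameter values are illustrative, not from the source):

```python
import numpy as np

def simulate_ar1_plus_noise(T, alpha, beta, omega2, sigma2, x0=0.0, rng=None):
    """Simulate hidden AR(1) states X_0..X_T and noisy observations Y_1..Y_T."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty(T + 1)
    X[0] = x0
    for t in range(1, T + 1):
        # Hidden state: X_t = alpha + beta * X_{t-1} + eps_t,  eps_t ~ N(0, omega2)
        X[t] = alpha + beta * X[t - 1] + rng.normal(0.0, np.sqrt(omega2))
    # Observations: Y_t = X_t + v_t,  v_t ~ N(0, sigma2)
    Y = X[1:] + rng.normal(0.0, np.sqrt(sigma2), size=T)
    return X, Y

# Illustrative values only; |beta| < 1 keeps the hidden AR(1) stationary
X_true, Y = simulate_ar1_plus_noise(T=200, alpha=0.5, beta=0.9, omega2=1.0, sigma2=0.5)
```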
The Goal
Goal: Describe a Gibbs sampler for drawing from the posterior of all unknowns given only the observed \(Y\): \[ P( X, \alpha, \beta, \omega^2, \sigma^2 \mid D_T) \]
- Draw \( (\alpha, \beta, \omega^2, \sigma^2) \) from \( P( \alpha, \beta, \omega^2, \sigma^2 \mid X, D_T) \)
- Draw \(X\) from \( P(X \mid \alpha, \beta, \omega^2, \sigma^2, D_T) \)
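Schematically, the sampler is a two-block Gibbs loop. A sketch, assuming `draw_params` (Step 1) and `ffbs_draw_states` (Step 2) are defined as in the code further below:

```python
import numpy as np

def gibbs_ar1_plus_noise(Y, n_iter, init, rng=None):
    """Two-block Gibbs sampler: alternate parameter draws and state-path draws.

    `init` holds starting values (alpha, beta, omega2, sigma2); `draw_params`
    and `ffbs_draw_states` are the Step 1 / Step 2 routines sketched below.
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha, beta, omega2, sigma2 = init
    X = np.r_[0.0, Y]                        # crude initial state path (X_0, ..., X_T)
    draws = []
    for _ in range(n_iter):
        # Block 1: (alpha, beta, omega^2, sigma^2) | X, D_T
        alpha, beta, omega2, sigma2 = draw_params(X, Y, omega2, rng)
        # Block 2: X | alpha, beta, omega^2, sigma^2, D_T   (FFBS)
        X = ffbs_draw_states(Y, alpha, beta, omega2, sigma2, rng)
        draws.append((alpha, beta, omega2, sigma2, X.copy()))
    return draws
```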
Step 1: AR(1) without noise
“Should be easy, right?”
Hint: We want to sample from the posterior in the AR(1)-without-noise case, i.e. the posterior in a Bayesian linear regression of Y on X:
\[ Y = X \theta + \epsilon \]
where \( X = X_{0:T-1} \), \( Y = Y_{1:T} \), \( \theta = \{ \alpha, \beta \} \), and \( \epsilon \sim N(0, \sigma^2) \).
After setting conjugate priors on the parameters \( ( \theta, \sigma^2 ) \), you will need to find the full conditional distributions and then Gibbs sample to draw from the posterior:
a. \[ P( \theta \mid \sigma^2, X, Y) \]
b. \[ P( \sigma^2 \mid \theta, X, Y ) \]
Drawing \(\omega^2\), the remaining variance in the full model, then works the same way.
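A sketch of these parameter draws applied to the full model, conditioning on a state path \(X\): the regression of \(X_t\) on \(X_{t-1}\) gives the \((\theta, \omega^2)\) conditionals, and \(\sigma^2\) is drawn the same way from the observation residuals \(Y_t - X_t\). The conjugate prior hyperparameters below are illustrative defaults, not values from the source:

```python
import numpy as np

def draw_params(X, Y, omega2, rng,
                m0=np.zeros(2), V0=np.eye(2) * 100.0,  # vague normal prior on theta = (alpha, beta)
                a_w=2.0, b_w=1.0, a_s=2.0, b_s=1.0):   # inverse-gamma priors on omega^2, sigma^2
    """One scan of the parameter full conditionals given a state path X = (X_0, ..., X_T)."""
    T = len(Y)
    Z = np.column_stack([np.ones(T), X[:-1]])   # design matrix [1, X_{t-1}]
    resp = X[1:]                                # response X_t

    # (a) theta | omega^2, X : conjugate normal update
    V0_inv = np.linalg.inv(V0)
    Vn = np.linalg.inv(V0_inv + Z.T @ Z / omega2)
    mn = Vn @ (V0_inv @ m0 + Z.T @ resp / omega2)
    alpha, beta = rng.multivariate_normal(mn, Vn)

    # (b) omega^2 | theta, X : inverse-gamma update from the state-equation residuals
    ar_resid = resp - (alpha + beta * X[:-1])
    omega2 = 1.0 / rng.gamma(a_w + T / 2.0, 1.0 / (b_w + 0.5 * ar_resid @ ar_resid))

    # sigma^2 | X, Y : the "similar" draw, from the observation residuals Y_t - X_t
    obs_resid = Y - X[1:]
    sigma2 = 1.0 / rng.gamma(a_s + T / 2.0, 1.0 / (b_s + 0.5 * obs_resid @ obs_resid))

    return alpha, beta, omega2, sigma2
```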
Step 2: FFBS (Forward Filtering, Backward Sampling)
Let \( A:= \{ \alpha, \beta, \omega^2, \sigma^2 \} \)
\( P( X \mid A, D_T) \) factors as the product
\[ P( X \mid A, D_T) = P( X_T \mid A, D_T) \prod_{t=1}^{T-1} P( X_t \mid X_{t+1:T}, A, D_T ) \]
a. \[ P( X_T \mid A, D_T) = N(m_T, C_T) \]
b. \[ P( X_t \mid X_{t+1:T}, A, D_T ) = P( X_t \mid X_{t+1}, A, D_t ) = N(h_t, B_t) \]
(given \( X_{t+1} \), the later states \( X_{t+2:T} \) and observations \( Y_{t+1:T} \) carry no additional information about \( X_t \))
So to draw from this distribution, first we draw \( X_T \) from \( P( X_T \mid A, D_T ) \), then for each \( i = 1, \dots, T-1 \) we sample \( X_{T-i} \) from
\( P( X_{T-i} \mid X_{T-i+1}, A, D_{T-i} ) \),
giving us a draw from the joint distribution \( P( X \mid A, D_T ) \).
Step 2a: Forward probabilities
First we want to find the posterior at the last time point T:
\[ X_T \mid D_T \sim N( m_T, C_T ) \]
What is \( (m_T, C_T) \)?
Posterior at t-1:
\[ X_{t-1} \mid D_{t-1} \sim N( m_{t-1}, C_{t-1} ) \]
- \( m_0, C_0 \) are known (they are the prior mean and variance of \( X_0 \)).
Prior at t (given \( D_{t-1} \)):
\[ X_t \mid D_{t-1} \sim N(a_t, R_t) \]
- \( a_t := \alpha + \beta m_{t-1} \)
- \( R_t := \beta^2 C_{t-1} + \omega^2 \)
Predictive at t (given \( D_{t-1} \)):
\[ Y_t \mid D_{t-1} \sim N(f_t, Q_t) \]
- \( f_t := a_t \)
- \( Q_t := R_t + \sigma^2 \)
Posterior at t:
\[ X_t \mid D_t \sim N(m_t, C_t) \]
- \( m_t := a_t + A_t e_t \)
- \( C_t := R_t - R_t A_t \)
- \( A_t := R_t / Q_t \)
- \( e_t := Y_t - f_t \)
Thus to calculate \( (m_T, C_T) \) we compute the forward probabilities, starting from \( (m_0, C_0) \) at t=0 and recursing up to t=T. Then we draw \( X_T \) from \( N(m_T, C_T) \).
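A sketch of this forward pass (the filtering recursions above), assuming \( m_0, C_0 \) are supplied as the prior moments of \( X_0 \) (the defaults here are illustrative):

```python
import numpy as np

def forward_filter(Y, alpha, beta, omega2, sigma2, m0=0.0, C0=1.0):
    """Forward pass: compute (a_t, R_t) and (m_t, C_t) for t = 1, ..., T."""
    T = len(Y)
    m = np.empty(T + 1); C = np.empty(T + 1)
    a = np.empty(T + 1); R = np.empty(T + 1)
    m[0], C[0] = m0, C0                          # prior moments of X_0
    for t in range(1, T + 1):
        a[t] = alpha + beta * m[t - 1]           # prior mean      a_t = alpha + beta m_{t-1}
        R[t] = beta ** 2 * C[t - 1] + omega2     # prior variance  R_t = beta^2 C_{t-1} + omega^2
        f = a[t]                                 # predictive mean f_t
        Q = R[t] + sigma2                        # predictive var  Q_t
        A = R[t] / Q                             # gain            A_t = R_t / Q_t
        e = Y[t - 1] - f                         # forecast error  e_t = Y_t - f_t (Y is 0-indexed)
        m[t] = a[t] + A * e                      # posterior mean  m_t
        C[t] = R[t] - A * R[t]                   # posterior var   C_t
    return m, C, a, R
```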
Step 2b: Backward probabilities
At this point we already know \( X_T \), so we now want to draw from \( P( X_{T-1} \mid X_T, A, D_{T-1} ) = N( h_{T-1}, B_{T-1} ) \), where:
- \( h_{T-i} := m_{T-i} + \frac{\beta C_{T-i}}{R_{T-i+1}} \left( X_{T-i+1} - a_{T-i+1} \right) \)
- \( B_{T-i} := C_{T-i} - \frac{\beta^2 C_{T-i}^2}{R_{T-i+1}} \)
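These formulas follow from the joint normality of \( (X_t, X_{t+1}) \) given \( D_t \): since \( X_t \mid D_t \sim N(m_t, C_t) \) and \( X_{t+1} = \alpha + \beta X_t + \epsilon_{t+1} \), we have \( X_{t+1} \mid D_t \sim N(a_{t+1}, R_{t+1}) \) and \( \mathrm{Cov}(X_t, X_{t+1} \mid D_t) = \beta C_t \), so conditioning on \( X_{t+1} \) gives
\[ h_t = m_t + \frac{\beta C_t}{R_{t+1}} \left( X_{t+1} - a_{t+1} \right), \qquad B_t = C_t - \frac{\beta^2 C_t^2}{R_{t+1}}. \]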
We already calculated all of these quantities in the forward pass, so we simply start at i = 1 (drawing \( X_{T-1} \)) and go until i = T-1 (drawing \( X_1 \)).
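Putting Steps 2a and 2b together, a sketch of the FFBS state draw (this sketch also draws \( X_0 \) from its backward conditional so the full state path can be fed back into the Step 1 regression; that extra draw is a choice of the sketch, not stated in the source):

```python
import numpy as np

def ffbs_draw_states(Y, alpha, beta, omega2, sigma2, rng, m0=0.0, C0=1.0):
    """Forward-filter, backward-sample one draw of the state path (X_0, ..., X_T)."""
    m, C, a, R = forward_filter(Y, alpha, beta, omega2, sigma2, m0, C0)
    T = len(Y)
    X = np.empty(T + 1)
    X[T] = rng.normal(m[T], np.sqrt(C[T]))          # draw X_T ~ N(m_T, C_T)
    for t in range(T - 1, -1, -1):                  # backward: t = T-1, ..., 0
        h = m[t] + beta * C[t] / R[t + 1] * (X[t + 1] - a[t + 1])
        B = C[t] - beta ** 2 * C[t] ** 2 / R[t + 1]
        X[t] = rng.normal(h, np.sqrt(B))            # draw X_t | X_{t+1}, D_t ~ N(h_t, B_t)
    return X

# Illustrative end-to-end run on the simulated data from above
rng = np.random.default_rng(0)
draws = gibbs_ar1_plus_noise(Y, n_iter=2000, init=(0.0, 0.5, 1.0, 1.0), rng=rng)
```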