Published

Friday, August 2, 2024

Markov Chains

A Markov Chain is a sequence of random variables \(X_1, X_2, \ldots, X_t\) that fulfills the Markov property: \[ P(X_{t+1} \mid X_t, X_{t-1}, \ldots, X_1) = P(X_{t+1} \mid X_t) \]
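The Markov property means each next state is sampled using only the current state. A minimal sketch, with an illustrative two-state transition matrix (the values are made up for the example):

```python
import numpy as np

# Hypothetical 2-state chain; matrix values are illustrative.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])  # P[i, j] = P(X_{t+1} = j | X_t = i)

def sample_chain(P, x1, T, rng):
    """Sample X_1, ..., X_T; each step depends only on the current state."""
    xs = [x1]
    for _ in range(T - 1):
        # The whole history is available, but only xs[-1] is used.
        xs.append(int(rng.choice(len(P), p=P[xs[-1]])))
    return xs

rng = np.random.default_rng(0)
path = sample_chain(P, x1=0, T=10, rng=rng)
```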

Discrete Values

  • The time indices are discrete.
  • The variables \(X_t\) are discrete.

Joint Distribution

The joint distribution is given by: \[ P(X_1, X_2, \ldots, X_t) = P(X_1) P(X_2 \mid X_1) P(X_3 \mid X_2) \ldots P(X_t \mid X_{t-1}) \]
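The factorization above turns the joint probability of a sequence into a product of one-step terms. A minimal sketch, assuming a time-homogeneous chain with an illustrative \(\pi\) and \(P\):

```python
import numpy as np

# Illustrative initial distribution and transition matrix.
pi = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def joint_prob(xs, pi, P):
    """P(X_1 = xs[0], ..., X_t = xs[-1]) via the chain-rule factorization."""
    p = pi[xs[0]]
    for a, b in zip(xs, xs[1:]):
        p *= P[a, b]  # multiply in P(X_{t+1} = b | X_t = a)
    return p

# P(X1=0, X2=0, X3=1) = 0.5 * 0.9 * 0.1 = 0.045
p = joint_prob([0, 0, 1], pi, P)
```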

General Distribution

In general, the distribution for each random variable can be different: \[ P(X_1 = i) = \pi_i \] \[ P(X_{t+1} = j \mid X_t = i) = P_{ij}^{t+1} \] where \(\pi \in \mathbb{R}^K\) is the initial distribution and each \(P^{t}\) is a \(K \times K\) transition matrix (one per time step).

Joint Distribution Simplified

The joint distribution can be expressed as: \[ P(X_1 = x_1, X_2 = x_2, \ldots, X_t = x_t) = \pi_{x_1} P_{x_1 x_2}^{2} P_{x_2 x_3}^{3} \cdots P_{x_{t-1} x_t}^{t} \] The number of parameters required is \(K(K-1)(t-1) + (K-1)\), which is too many for large \(t\).

Stationarity

For a stationary process: \[ P(X_{t+1} = j \mid X_t = i) = P_{ij} \] The transition matrix remains the same for all \(t\) (Time Homogeneity).

The number of parameters required is \(K(K-1) + K-1\).

Interpretation

A time-homogeneous discrete Markov Chain can be interpreted as a state machine or a graph.
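In the graph view, each state is a node and every nonzero entry \(P_{ij}\) is a weighted directed edge \(i \to j\). A minimal sketch with an illustrative three-state matrix:

```python
import numpy as np

# Illustrative transition matrix; state 2 is absorbing (self-loop with prob 1).
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0]])

# Edge i -> j exists whenever P[i, j] > 0; the weight is the probability.
edges = {(i, j): P[i, j]
         for i in range(len(P))
         for j in range(len(P))
         if P[i, j] > 0}
```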

Learning

Maximum Likelihood Estimation (MLE)

  • For \(N\) independent sequences of length \(T\): \[ P(\text{all}) = \prod_{n=1}^{N} \left[ P(X_1^n) \prod_{t=1}^{T-1} P(X_{t+1}^n \mid X_t^n) \right] = \prod_{i=1}^{K} \pi_i^{L(i)} \prod_{i=1}^{K} \prod_{j=1}^{K} P_{ij}^{L(i,j)} \]
  • Where:
    • \(L(i) = \sum_{n=1}^{N} 1(X_1^n = i)\)
    • \(L(i,j) = \sum_{n=1}^{N} \sum_{t=1}^{T-1} 1(X_t^n = i, X_{t+1}^n = j)\)

Log-Likelihood

\[ \log(P(\text{all})) = \sum_{i=1}^{K} L(i) \log(\pi_i) + \sum_{i=1}^{K} \sum_{j=1}^{K} L(i,j) \log(P_{ij}) \]

Maximizing subject to the constraints:

  • \(\sum_i \pi_i = 1\)
  • \(\sum_j P_{ij} = 1\) for each \(i\)

Transition Probabilities

\[ A_{ij} = \frac{L(i,j)}{\sum_{j'=1}^{K} L(i,j')} \] \[ \pi_i = \frac{L(i)}{\sum_{i'=1}^{K} L(i')} = \frac{L(i)}{N} \]
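The MLE reduces to counting and normalizing: tally starting states and observed transitions, then divide each row by its total. A minimal sketch on a toy dataset (the sequences are made up for illustration):

```python
import numpy as np

K = 2
# N = 3 observed sequences over states {0, 1}; purely illustrative data.
seqs = [[0, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]

L1 = np.zeros(K)       # L(i): number of sequences starting in state i
L2 = np.zeros((K, K))  # L(i, j): number of observed i -> j transitions
for xs in seqs:
    L1[xs[0]] += 1
    for a, b in zip(xs, xs[1:]):
        L2[a, b] += 1

pi_hat = L1 / L1.sum()                      # pi_i = L(i) / N
A_hat = L2 / L2.sum(axis=1, keepdims=True)  # A_ij = L(i,j) / sum_j' L(i,j')
```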

\(n\)-Step Transition Probabilities

\[ A_{ij}(n) = [A^n]_{ij} \] i.e., the \(n\)-step transition probabilities are the entries of the \(n\)-th matrix power of \(A\).

Chapman-Kolmogorov Equation

\[ P_{ij}^{t+s} = \sum_{k=1}^{K} P_{ik}^t P_{kj}^s \] \[ A(t+s) = A(t)A(s) \]
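The matrix form \(A(t+s) = A(t)A(s)\) is easy to check numerically, since \(A(n) = A^n\). A quick sketch with an illustrative matrix:

```python
import numpy as np

# Illustrative transition matrix.
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])

t, s = 2, 3
# Chapman-Kolmogorov: A^(t+s) equals A^t times A^s.
lhs = np.linalg.matrix_power(A, t + s)
rhs = np.linalg.matrix_power(A, t) @ np.linalg.matrix_power(A, s)
```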

State Probabilities Over Time

\(\pi_i(t)\) is the probability of being in state \(i\) at time \(t\), starting from the initial distribution: \[ \pi_i(t) = P(X_t = i) = \sum_{j=1}^{K} P(X_t = i \mid X_{t-1} = j)\, P(X_{t-1} = j) = \sum_{j=1}^{K} A_{ji}\, \pi_j(t-1) \] Treating \(\pi(t)\) as a row vector, this unrolls to: \[ \pi(t) = \pi(1) A^{t-1} \]
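The recursion and the closed form should agree. A minimal sketch, with an illustrative matrix and a deterministic start in state 0:

```python
import numpy as np

# Illustrative chain; start deterministically in state 0.
pi1 = np.array([1.0, 0.0])
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])

t = 5
# Recursion: pi(t) = pi(t-1) A, applied t-1 times (row vector times A).
pi_rec = pi1.copy()
for _ in range(t - 1):
    pi_rec = pi_rec @ A

# Closed form: pi(t) = pi(1) A^(t-1).
pi_closed = pi1 @ np.linalg.matrix_power(A, t - 1)
```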