RECAP
What are Markov Models?
Remember Stationary Distributions ($\pi, P^\infty$)?
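As a quick numeric refresher, here is a minimal sketch (borrowing the transition matrix $P$ from the train example later in this deck, with row/column order assumed to be Very Late, Late, On Time) showing that $P^\infty$ has identical rows, each equal to the stationary distribution $\pi$:

```python
import numpy as np

# Transition matrix from the train example later in the deck
# (row/column order assumed to be Very Late, Late, On Time).
P = np.array([[0.1, 0.3, 0.6],
              [0.4, 0.2, 0.4],
              [0.1, 0.1, 0.8]])

# High powers of P approximate P^infinity: every row converges
# to the same stationary distribution pi, whatever the start state.
P_inf = np.linalg.matrix_power(P, 50)
print(P_inf)  # each row is (approximately) pi
```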
Hidden Markov Models
We now see observations instead of the true states
We must reason about the true states from the observations
Hidden Markov Models
True states are hidden
Hidden Markov Models
What kind of problems can we solve?
Speech recognition (mapping audio to words)
Genetic sequence alignment
Predicting stock prices
Part-of-speech tagging
Predicting weather
Hidden Markov Models
Imagine you are Sherlock Holmes
Figure out whether my train was:
Very Late
Late
On Time
... by observing whether I'm sad or happy
Hidden Markov Models
Transition matrix $P$ and emission matrix $B$:
\[ P = \begin{bmatrix} 0.1 & 0.3 & 0.6\\ 0.4 & 0.2 & 0.4\\ 0.1 & 0.1 & 0.8 \end{bmatrix} \qquad B = \begin{bmatrix} 0.4 & 0.6\\ 0.5 & 0.5\\ 0.9 & 0.1 \end{bmatrix} \]
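As a rough sketch, the model could be written down like this in Python (the state order VL, L, OT, the emission-column order Happy, Sad, and the uniform initial distribution $\pi$ are all assumptions, since the slides don't pin them down):

```python
import numpy as np

# Hidden states: 0 = Very Late (VL), 1 = Late (L), 2 = On Time (OT)
# Observations: 0 = Happy, 1 = Sad (this column order is an assumption)
P = np.array([[0.1, 0.3, 0.6],   # transition probabilities P(next state | current state)
              [0.4, 0.2, 0.4],
              [0.1, 0.1, 0.8]])
B = np.array([[0.4, 0.6],        # emission probabilities P(observation | state)
              [0.5, 0.5],
              [0.9, 0.1]])
pi = np.full(3, 1/3)             # initial state distribution (assumed uniform)
```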
Hidden Markov Models
3 types of reasoning
Probability of observing a sequence
Most likely hidden state sequence, given the observations
Finding the model parameters that best explain the observations
Hidden Markov Models
Calculate the probability of observing the sequence (Happy, Sad, Happy):
\[ P = \begin{bmatrix} 0.1 & 0.3 & 0.6\\ 0.4 & 0.2 & 0.4\\ 0.1 & 0.1 & 0.8 \end{bmatrix} \qquad B = \begin{bmatrix} 0.4 & 0.6\\ 0.5 & 0.5\\ 0.9 & 0.1 \end{bmatrix} \]
Required Computation
\[ \begin{gathered} P(VL)\ P(Happy|VL)\ P(VL|VL)\ P(Sad|VL)\ P(VL|VL)\ P(Happy|VL) + \\ P(VL)\ P(Happy|VL)\ P(VL|VL)\ P(Sad|VL)\ P(L|VL)\ P(Happy|L) + \\ P(VL)\ P(Happy|VL)\ P(VL|VL)\ P(Sad|VL)\ P(OT|VL)\ P(Happy|OT) + \\ P(VL)\ P(Happy|VL)\ P(L|VL)\ P(Sad|L)\ P(VL|L)\ P(Happy|VL) + \\ P(VL)\ P(Happy|VL)\ P(L|VL)\ P(Sad|L)\ P(L|L)\ P(Happy|L) + \\ P(VL)\ P(Happy|VL)\ P(L|VL)\ P(Sad|L)\ P(OT|L)\ P(Happy|OT) + \\ P(VL)\ P(Happy|VL)\ P(OT|VL)\ P(Sad|OT)\ P(VL|OT)\ P(Happy|VL) + \\ P(VL)\ P(Happy|VL)\ P(OT|VL)\ P(Sad|OT)\ P(L|OT)\ P(Happy|L) + \\ P(VL)\ P(Happy|VL)\ P(OT|VL)\ P(Sad|OT)\ P(OT|OT)\ P(Happy|OT) + \\ P(L)\ P(Happy|L)\ P(VL|L)\ P(Sad|VL)\ P(VL|VL)\ P(Happy|VL) + \\ P(L)\ P(Happy|L)\ P(VL|L)\ P(Sad|VL)\ P(L|VL)\ P(Happy|L) + \\ P(L)\ P(Happy|L)\ P(VL|L)\ P(Sad|VL)\ P(OT|VL)\ P(Happy|OT) + \\ P(L)\ P(Happy|L)\ P(L|L)\ P(Sad|L)\ P(VL|L)\ P(Happy|VL) + \\ P(L)\ P(Happy|L)\ P(L|L)\ P(Sad|L)\ P(L|L)\ P(Happy|L) + \\ P(L)\ P(Happy|L)\ P(L|L)\ P(Sad|L)\ P(OT|L)\ P(Happy|OT) + \\ P(L)\ P(Happy|L)\ P(OT|L)\ P(Sad|OT)\ P(VL|OT)\ P(Happy|VL) + \\ P(L)\ P(Happy|L)\ P(OT|L)\ P(Sad|OT)\ P(L|OT)\ P(Happy|L) + \\ P(L)\ P(Happy|L)\ P(OT|L)\ P(Sad|OT)\ P(OT|OT)\ P(Happy|OT) + \\ P(OT)\ P(Happy|OT)\ P(VL|OT)\ P(Sad|VL)\ P(VL|VL)\ P(Happy|VL) + \\ P(OT)\ P(Happy|OT)\ P(VL|OT)\ P(Sad|VL)\ P(L|VL)\ P(Happy|L) + \\ P(OT)\ P(Happy|OT)\ P(VL|OT)\ P(Sad|VL)\ P(OT|VL)\ P(Happy|OT) + \\ P(OT)\ P(Happy|OT)\ P(L|OT)\ P(Sad|L)\ P(VL|L)\ P(Happy|VL) + \\ P(OT)\ P(Happy|OT)\ P(L|OT)\ P(Sad|L)\ P(L|L)\ P(Happy|L) + \\ P(OT)\ P(Happy|OT)\ P(L|OT)\ P(Sad|L)\ P(OT|L)\ P(Happy|OT) + \\ P(OT)\ P(Happy|OT)\ P(OT|OT)\ P(Sad|OT)\ P(VL|OT)\ P(Happy|VL) + \\ P(OT)\ P(Happy|OT)\ P(OT|OT)\ P(Sad|OT)\ P(L|OT)\ P(Happy|L) + \\ P(OT)\ P(Happy|OT)\ P(OT|OT)\ P(Sad|OT)\ P(OT|OT)\ P(Happy|OT) \end{gathered} \]
With 3 hidden states and 3 observations that is already $3^3 = 27$ terms, and the count grows exponentially with the length of the sequence. Clearly not a good idea!
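For concreteness, a sketch of this brute-force sum, using the model definition above (state order, emission-column order, and uniform $\pi$ all assumed as before):

```python
import numpy as np
from itertools import product

P = np.array([[0.1, 0.3, 0.6],
              [0.4, 0.2, 0.4],
              [0.1, 0.1, 0.8]])
B = np.array([[0.4, 0.6],
              [0.5, 0.5],
              [0.9, 0.1]])
pi = np.full(3, 1/3)

obs = [0, 1, 0]  # Happy, Sad, Happy (with Happy=0, Sad=1 assumed)

# Sum the joint probability over all 3^3 = 27 hidden state paths.
total = 0.0
for path in product(range(3), repeat=len(obs)):
    p = pi[path[0]] * B[path[0], obs[0]]
    for t in range(1, len(obs)):
        p *= P[path[t - 1], path[t]] * B[path[t], obs[t]]
    total += p
print(total)  # P(Happy, Sad, Happy)
```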
Forward Algorithm
(Dynamic Programming)
\[ \alpha_1(S_i) = \pi[S_i] P(O^{(1)} | S_i) \]
\[ \alpha_t(S_i) = \sum_j \alpha_{t-1}(S_j) P(S_i|S_j) P(O^{(t)}|S_i) \]
\[ P(O^{(1)}, O^{(2)}, \dots, O^{(t)}) = \sum_{i=1}^n \alpha_t(S_i) \]
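The same quantity in a handful of lines: a sketch of the forward recursion, under the same assumed encoding as before:

```python
import numpy as np

P = np.array([[0.1, 0.3, 0.6],
              [0.4, 0.2, 0.4],
              [0.1, 0.1, 0.8]])
B = np.array([[0.4, 0.6],
              [0.5, 0.5],
              [0.9, 0.1]])
pi = np.full(3, 1/3)

def forward(obs):
    """Return P(O^(1), ..., O^(T)) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]          # alpha_1(S_i)
    for o in obs[1:]:
        alpha = (alpha @ P) * B[:, o]  # alpha_t(S_i): O(n^2) per step
    return alpha.sum()                 # sum_i alpha_T(S_i)

print(forward([0, 1, 0]))  # P(Happy, Sad, Happy); matches the brute-force sum
```

Each step costs $O(n^2)$ work, instead of enumerating $n^T$ paths.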
Most likely sequence!
Calculating \[ \arg\max_X P(X|O) \]
Viterbi Algorithm
\[ \delta_1(S_i) = \pi[S_i] P(O^{(1)} | S_i) \]
\[ \delta_t(S_i) = \max_j \left[ \delta_{t-1}(S_j) P(S_i|S_j) P(O^{(t)} | S_i) \right] \]
Keep a back-pointer to the maximizing $j$ at each step, then backtrack from $\arg\max_i \delta_T(S_i)$ to recover the most likely sequence
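A sketch of the Viterbi recursion with back-pointers, under the same assumed encoding:

```python
import numpy as np

P = np.array([[0.1, 0.3, 0.6],
              [0.4, 0.2, 0.4],
              [0.1, 0.1, 0.8]])
B = np.array([[0.4, 0.6],
              [0.5, 0.5],
              [0.9, 0.1]])
pi = np.full(3, 1/3)

def viterbi(obs):
    """Return the most likely hidden state sequence given obs."""
    T, n = len(obs), len(pi)
    delta = pi * B[:, obs[0]]                      # delta_1(S_i)
    back = np.zeros((T, n), dtype=int)             # back-pointers
    for t in range(1, T):
        scores = delta[:, None] * P                # scores[j, i] = delta_{t-1}(S_j) P(S_i|S_j)
        back[t] = scores.argmax(axis=0)            # best predecessor j for each S_i
        delta = scores.max(axis=0) * B[:, obs[t]]  # delta_t(S_i)
    path = [int(delta.argmax())]                   # best final state
    for t in range(T - 1, 0, -1):                  # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

states = ["VL", "L", "OT"]
print([states[i] for i in viterbi([0, 1, 0])])  # observations: Happy, Sad, Happy
```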
Estimating the HMM Parameters
Expectation Maximization (the Baum-Welch algorithm)
Hill climbing on the likelihood: each iteration improves it, converging to a local maximum
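For completeness, a minimal numpy sketch of the Baum-Welch flavor of EM (this implementation, the random initialization, and the toy observation sequence are my own illustration, not taken from the slides; no scaling is used, so it only suits short sequences):

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    """Re-estimate (pi, P, B) from one observation sequence via EM."""
    obs = np.asarray(obs)
    T = len(obs)
    rng = np.random.default_rng(seed)
    # Random row-stochastic initial guesses.
    P = rng.random((n_states, n_states)); P /= P.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(axis=1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iter):
        # E-step: forward (alpha) and backward (beta) passes.
        alpha = np.zeros((T, n_states)); beta = np.zeros((T, n_states))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ P) * B[:, obs[t]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = P @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()
        gamma = alpha * beta / likelihood  # gamma[t, i] = P(X_t = S_i | O)
        # xi[t, i, j] = P(X_t = S_i, X_{t+1} = S_j | O)
        xi = (alpha[:-1, :, None] * P[None, :, :] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
        # M-step: each update never decreases the likelihood.
        pi = gamma[0]
        P = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return pi, P, B

toy = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]  # made-up Happy/Sad observations
pi_hat, P_hat, B_hat = baum_welch(toy, n_states=3, n_symbols=2)
print(P_hat)
```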