RECAP



What are Markov Models?


Remember Stationary Distributions ($\pi, P^\infty$)?

Hidden Markov Models



We now see observations instead of the true states


We must reason about the true states from those observations


Hidden Markov Models



True states are hidden

Hidden Markov Models



What kind of problems can we solve?


Speech recognition (mapping audio to words)

Genetic sequence alignment

Predicting stock prices

Part-of-speech tagging

Predicting weather

Hidden Markov Models



Imagine you are Sherlock Holmes


Figure out whether my train was:
  • Very Late
  • Late
  • On Time


... by observing whether I'm irritable or happy

Hidden Markov Models



Transition matrix \( P \) (each row sums to one) and emission matrix \( B \):

\[ P = \begin{bmatrix} 0.1 & 0.3 & 0.6\\ 0.4 & 0.2 & 0.4\\ 0.1 & 0.1 & 0.8 \end{bmatrix} \qquad B = \begin{bmatrix} 0.4 & 0.6\\ 0.5 & 0.5\\ 0.9 & 0.1 \end{bmatrix} \]
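To make the model concrete, here is a minimal sketch of this HMM in Python/NumPy. The slides do not label the rows and columns, so the state order [VL, L, OT], the observation order [Happy, Sad], and the uniform initial distribution \( \pi \) are assumptions made here purely for illustration.

```python
import numpy as np

states = ["VL", "L", "OT"]   # hidden states: Very Late, Late, On Time (assumed order)
symbols = ["Happy", "Sad"]   # observations (assumed column order of B)

# Transition probabilities: P[i, j] = P(next state = states[j] | current = states[i])
P = np.array([[0.1, 0.3, 0.6],
              [0.4, 0.2, 0.4],
              [0.1, 0.1, 0.8]])

# Emission probabilities: B[i, k] = P(observe symbols[k] | state = states[i])
B = np.array([[0.4, 0.6],
              [0.5, 0.5],
              [0.9, 0.1]])

# Initial distribution: NOT given on the slides, assumed uniform here.
pi = np.full(3, 1 / 3)

# Sanity check: every row of P and B must be a probability distribution.
assert np.allclose(P.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```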

Hidden Markov Models



3 types of reasoning



Probability of observing a given sequence (evaluation: the forward algorithm)


Most likely state sequence, given the observations (decoding: the Viterbi algorithm)


Finding the model parameters that best explain the observations (learning: expectation maximization)


Hidden Markov Models



Calculate the probability of observing the sequence (Happy, Sad, Happy), given:

\[ P = \begin{bmatrix} 0.1 & 0.3 & 0.6\\ 0.4 & 0.2 & 0.4\\ 0.1 & 0.1 & 0.8 \end{bmatrix} \qquad B = \begin{bmatrix} 0.4 & 0.6\\ 0.5 & 0.5\\ 0.9 & 0.1 \end{bmatrix} \]

Required Computation


\[ P(VL)\ P(Happy|VL)\ P(VL|VL)\ P(Sad|VL)\ P(VL|VL)\ P(Happy|VL) + \\ P(VL)\ P(Happy|VL)\ P(VL|VL)\ P(Sad|VL)\ P(L|VL)\ P(Happy|L) + \\ P(VL)\ P(Happy|VL)\ P(VL|VL)\ P(Sad|VL)\ P(OT|VL)\ P(Happy|OT) + \\ P(VL)\ P(Happy|VL)\ P(L|VL)\ P(Sad|L)\ P(VL|L)\ P(Happy|VL) + \\ P(VL)\ P(Happy|VL)\ P(L|VL)\ P(Sad|L)\ P(L|L)\ P(Happy|L) + \\ P(VL)\ P(Happy|VL)\ P(L|VL)\ P(Sad|L)\ P(OT|L)\ P(Happy|OT) + \\ P(VL)\ P(Happy|VL)\ P(OT|VL)\ P(Sad|OT)\ P(VL|OT)\ P(Happy|VL) + \\ P(VL)\ P(Happy|VL)\ P(OT|VL)\ P(Sad|OT)\ P(L|OT)\ P(Happy|L) + \\ P(VL)\ P(Happy|VL)\ P(OT|VL)\ P(Sad|OT)\ P(OT|OT)\ P(Happy|OT) + \\ P(L)\ P(Happy|L)\ P(VL|L)\ P(Sad|VL)\ P(VL|VL)\ P(Happy|VL) + \\ P(L)\ P(Happy|L)\ P(VL|L)\ P(Sad|VL)\ P(L|VL)\ P(Happy|L) + \\ P(L)\ P(Happy|L)\ P(VL|L)\ P(Sad|VL)\ P(OT|VL)\ P(Happy|OT) + \\ P(L)\ P(Happy|L)\ P(L|L)\ P(Sad|L)\ P(VL|L)\ P(Happy|VL) + \\ P(L)\ P(Happy|L)\ P(L|L)\ P(Sad|L)\ P(L|L)\ P(Happy|L) + \\ P(L)\ P(Happy|L)\ P(L|L)\ P(Sad|L)\ P(OT|L)\ P(Happy|OT) + \\ P(L)\ P(Happy|L)\ P(OT|L)\ P(Sad|OT)\ P(VL|OT)\ P(Happy|VL) + \\ P(L)\ P(Happy|L)\ P(OT|L)\ P(Sad|OT)\ P(L|OT)\ P(Happy|L) + \\ P(L)\ P(Happy|L)\ P(OT|L)\ P(Sad|OT)\ P(OT|OT)\ P(Happy|OT) + \\ P(OT)\ P(Happy|OT)\ P(VL|OT)\ P(Sad|VL)\ P(VL|VL)\ P(Happy|VL) + \\ P(OT)\ P(Happy|OT)\ P(VL|OT)\ P(Sad|VL)\ P(L|VL)\ P(Happy|L) + \\ P(OT)\ P(Happy|OT)\ P(VL|OT)\ P(Sad|VL)\ P(OT|VL)\ P(Happy|OT) + \\ P(OT)\ P(Happy|OT)\ P(L|OT)\ P(Sad|L)\ P(VL|L)\ P(Happy|VL) + \\ P(OT)\ P(Happy|OT)\ P(L|OT)\ P(Sad|L)\ P(L|L)\ P(Happy|L) + \\ P(OT)\ P(Happy|OT)\ P(L|OT)\ P(Sad|L)\ P(OT|L)\ P(Happy|OT) + \\ P(OT)\ P(Happy|OT)\ P(OT|OT)\ P(Sad|OT)\ P(VL|OT)\ P(Happy|VL) + \\ P(OT)\ P(Happy|OT)\ P(OT|OT)\ P(Sad|OT)\ P(L|OT)\ P(Happy|L) + \\ P(OT)\ P(Happy|OT)\ P(OT|OT)\ P(Sad|OT)\ P(OT|OT)\ P(Happy|OT) \]

Clearly not a good idea!
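The sum above is simply a brute-force enumeration of every hidden state path compatible with the observations (Happy, Sad, Happy). A direct sketch, reusing the assumed setup from the earlier snippet, makes the blow-up explicit: the number of terms is \( n^T \) (here \( 3^3 = 27 \)).

```python
from itertools import product

states = ["VL", "L", "OT"]
P = {"VL": {"VL": 0.1, "L": 0.3, "OT": 0.6},
     "L":  {"VL": 0.4, "L": 0.2, "OT": 0.4},
     "OT": {"VL": 0.1, "L": 0.1, "OT": 0.8}}
B = {"VL": {"Happy": 0.4, "Sad": 0.6},
     "L":  {"Happy": 0.5, "Sad": 0.5},
     "OT": {"Happy": 0.9, "Sad": 0.1}}
pi = {"VL": 1 / 3, "L": 1 / 3, "OT": 1 / 3}   # assumed uniform, as before

obs = ["Happy", "Sad", "Happy"]

# Sum the probability of (path, obs) over all 3**3 = 27 hidden state paths.
total = 0.0
for path in product(states, repeat=len(obs)):
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= P[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    total += p

print(total)  # n**T terms: hopeless for long observation sequences
```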



Forward Algorithm
(Dynamic Programming)




\[ \alpha_1(S_i) = \pi[S_i] P(O^{(1)} | S_i) \]

\[ \alpha_t(S_i) = \sum_j \alpha_{t-1}(S_j) P(S_i|S_j) P(O^{(t)}|S_i) \]

\[ P(O^{(1)}, O^{(2)}, \dots, O^{(T)}) = \sum_{i=1}^n \alpha_T(S_i) \]
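The recursion above turns into a few lines of NumPy: each time step is one matrix-vector product, so the cost drops from \( O(n^T) \) to \( O(n^2 T) \). Same assumed setup as before; observations are passed as column indices into \( B \).

```python
import numpy as np

P = np.array([[0.1, 0.3, 0.6],
              [0.4, 0.2, 0.4],
              [0.1, 0.1, 0.8]])
B = np.array([[0.4, 0.6],
              [0.5, 0.5],
              [0.9, 0.1]])
pi = np.full(3, 1 / 3)          # assumed uniform initial distribution

def forward(pi, P, B, obs):
    """Forward algorithm: row t of the result holds alpha_{t+1}(S_i)."""
    T, n = len(obs), len(pi)
    alpha = np.zeros((T, n))
    alpha[0] = pi * B[:, obs[0]]                      # base case alpha_1
    for t in range(1, T):
        # alpha_t(S_i) = sum_j alpha_{t-1}(S_j) P(S_i|S_j) P(O^(t)|S_i)
        alpha[t] = (alpha[t - 1] @ P) * B[:, obs[t]]
    return alpha

obs = [0, 1, 0]                 # Happy, Sad, Happy (assumed symbol order)
alpha = forward(pi, P, B, obs)
print(alpha[-1].sum())          # P(O): identical to the 27-term sum above
```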

Most likely sequence!



Calculating \[ \arg\max_X P(X|O) \]


Viterbi Algorithm




\[ \delta_1(S_i) = \pi[S_i] P(O^{(1)} | S_i) \]

\[ \delta_t(S_i) = \max_j \left[ \delta_{t-1}(S_j) P(S_i|S_j) P(O^{(t)} | S_i) \right] \]
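A matching sketch of the Viterbi recursion: identical to the forward pass except that the sum over predecessors becomes a max, and backpointers are kept so the best state sequence can be read off at the end.

```python
import numpy as np

P = np.array([[0.1, 0.3, 0.6],
              [0.4, 0.2, 0.4],
              [0.1, 0.1, 0.8]])
B = np.array([[0.4, 0.6],
              [0.5, 0.5],
              [0.9, 0.1]])
pi = np.full(3, 1 / 3)          # assumed uniform initial distribution

def viterbi(pi, P, B, obs):
    """Return the most likely hidden state path (as state indices)."""
    T, n = len(obs), len(pi)
    delta = np.zeros((T, n))               # delta[t, i]: best prob ending in S_i
    back = np.zeros((T, n), dtype=int)     # back[t, i]: best predecessor of S_i
    delta[0] = pi * B[:, obs[0]]           # delta_1(S_i) = pi[S_i] P(O^(1)|S_i)
    for t in range(1, T):
        # delta_t(S_i) = max_j delta_{t-1}(S_j) P(S_i|S_j) P(O^(t)|S_i)
        scores = delta[t - 1][:, None] * P # scores[j, i]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Trace the backpointers from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

states = ["VL", "L", "OT"]
print([states[i] for i in viterbi(pi, P, B, [0, 1, 0])])  # Happy, Sad, Happy
```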

Estimating the HMM Parameters



Expectation Maximization (the Baum-Welch algorithm)


Hill climbing on the likelihood (equivalently, descent on the negative log-likelihood): each iteration improves the fit, converging to a local optimum
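The slides only name the technique, so the following is a hedged sketch of one iteration of the standard EM procedure for HMMs (Baum-Welch), under the same assumed setup. The E-step computes forward and backward probabilities; the M-step re-estimates \( \pi \), \( P \), and \( B \) from the resulting expected counts. Each iteration is guaranteed not to decrease \( P(O) \), which is the hill-climbing behaviour noted above.

```python
import numpy as np

def baum_welch_step(pi, P, B, obs):
    """One EM (Baum-Welch) update of (pi, P, B) from one observation sequence."""
    T, n = len(obs), len(pi)
    # E-step: forward (alpha) and backward (beta) probabilities.
    alpha = np.zeros((T, n))
    beta = np.ones((T, n))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ P) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = P @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()                  # P(O | current model)
    gamma = alpha * beta / likelihood             # P(state at t is S_i | O)
    # xi[t, i, j] = P(state t is S_i, state t+1 is S_j | O)
    xi = (alpha[:-1, :, None] * P[None]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    # M-step: re-estimate the parameters from expected counts.
    pi_new = gamma[0]
    P_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[np.array(obs) == k].sum(axis=0)
                      for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return pi_new, P_new, B_new

# Example: one update starting from the slide's model and an assumed uniform pi.
P = np.array([[0.1, 0.3, 0.6], [0.4, 0.2, 0.4], [0.1, 0.1, 0.8]])
B = np.array([[0.4, 0.6], [0.5, 0.5], [0.9, 0.1]])
pi = np.full(3, 1 / 3)
pi, P, B = baum_welch_step(pi, P, B, [0, 1, 0])   # Happy, Sad, Happy
```

In practice this update is iterated to convergence, and scaled or log-space versions are used to avoid numerical underflow on long observation sequences.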