Time Series Segmentation with Hidden Markov Models
by Daniel Kohlsdorf
dkohlsdorf@gmail.com
Introduction
This is an interactive demo visualizing algorithms for hidden Markov models on
continuous time series. A hidden Markov model is a probabilistic model for time series.
In other words, a hidden Markov model is an abstract representation of a time series:
X = {x1 ... xT} . The model can be regarded as a probabilistic state machine with
a finite number of states: S = {s1 ... sN} . Transitions are defined by a probabilistic transition
function P(s|s') which indicates the probability of switching from state s
to state s' . The model also defines an observation function P(x|s) that indicates
how probable an observation from the time series is given a state.
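As a concrete sketch, a one dimensional three state model of this kind could be written down as a transition matrix together with one Gaussian observation density per state. The particular numbers, the left-to-right structure, and the `observation_prob` helper below are assumptions for illustration, not the demo's actual parameters:

```python
import math

# Hypothetical 3-state left-to-right model:
# A[i][j] = probability of switching from state i to state j
A = [
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]

# One (mean, std) pair per state: the Gaussian observation density of each state
STATES = [(1.0, 0.5), (2.0, 0.5), (3.0, 0.5)]

def observation_prob(x, state):
    """P(x | s): Gaussian density of observation x under the state's (mean, std)."""
    mean, std = STATES[state]
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))
```

Each row of the transition matrix sums to one, and an observation close to a state's mean gets a high density under that state, which is what ties samples of the time series to states.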
There are two main questions that arise given a model and a time series:
Decoding: Given my time series, what is the most probable path through a hidden Markov model?
Parameter Estimation: Given a time series (or multiple time series), what is the maximum
likelihood parametrization?
Decoding can be solved using the Viterbi recursion, and parameter estimation can be solved using the Baum-Welch
algorithm. Pointers to references can be found below, and I encourage the reader to work through at least
the first half of the Rabiner paper.
However, what I want to discuss here is how hidden Markov models can be used to segment a time series, which in
turn gives insight into how hidden Markov models work. The high level idea is that we estimate the
parameters of a hidden Markov model for a time series and then find the most probable path through the estimated model
for the same time series. If we backtrack the path, we get a sequence of states. Each state on the path
indicates an assignment of a sample from the time series X to a state in the state space S.
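The decoding step above can be sketched with a minimal Viterbi decoder for such a one dimensional model. The transition matrix, the Gaussian observation densities, and the forced start in the first state are assumptions for illustration, not the demo's actual code:

```python
import math

# Hypothetical 3-state left-to-right model:
# A[i][j] = probability of switching from state i to state j
A = [[0.9, 0.1, 0.0],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]]
MEANS = [1.0, 2.0, 3.0]  # one Gaussian mean per state, shared standard deviation
STD = 0.5

def obs_log_prob(x, s):
    """Log of the Gaussian density of observation x under state s."""
    return -0.5 * ((x - MEANS[s]) / STD) ** 2 - math.log(STD * math.sqrt(2 * math.pi))

def log_a(sp, s):
    """Log transition probability, with log(0) mapped to -inf."""
    return math.log(A[sp][s]) if A[sp][s] > 0 else -math.inf

def viterbi(xs):
    n = len(A)
    # delta[s] = log probability of the best path ending in state s
    # (assumption: the path is forced to start in the first state)
    delta = [obs_log_prob(xs[0], s) + (0.0 if s == 0 else -math.inf) for s in range(n)]
    back = []
    for x in xs[1:]:
        prev = delta[:]
        psi, delta = [], []
        for s in range(n):
            best = max(range(n), key=lambda sp: prev[sp] + log_a(sp, s))
            psi.append(best)
            delta.append(prev[best] + log_a(best, s) + obs_log_prob(x, s))
        back.append(psi)
    # backtrack: recover the most probable state sequence
    path = [max(range(n), key=lambda s: delta[s])]
    for psi in reversed(back):
        path.append(psi[path[-1]])
    return list(reversed(path))

X = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
print(viterbi(X))  # prints [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The returned path assigns every sample of X to a state, which is exactly the segmentation: runs of the same state mark the segments.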
While speech recognition is a supervised task often solved with hidden Markov models, the algorithms
underneath solve an unsupervised problem: segmentation. The following interactive demo shows
segmentation for continuous time series with a simplified one dimensional hidden Markov model.
Demo
Little tasks
If we do not train the model (number of training iterations is 0), and we
assume a 3 state model in which all self transitions have probability P(s_i|s_i) = 0.9 and
all forward transitions have probability P(s_i|s_i+1) = 0.1, why do we still get a
segmentation for the time series X = {1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3}
when decoding? Using the knowledge from the literature,
explain the segmentation.
For the time series X = {1,1,1,1,3,3,3,3,3,3,1,1,1,1,1} and a three state model, what are the probabilities of
staying in state 1, in state 2, and in state 3 after training? Why do we see
different results in the tool?
Our implementation starts with all state probabilities set to
0. Is that a good initialization? If not, change the code to
do better. Read the HTK book for guidance.
Can you change all the computations into log space?
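As a hint for this last task, here is a sketch (not the demo's code) of the core idea: products of probabilities become sums of log probabilities, and sums of probabilities, as in the forward recursion, need the log-sum-exp trick. The `log_sum_exp` helper below is a hypothetical name introduced for illustration:

```python
import math

def log_sum_exp(log_values):
    """Compute log(sum(exp(v) for v in log_values)) without underflow."""
    m = max(log_values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Multiplying many tiny probabilities underflows to 0.0 in floating point ...
probs = [1e-300] * 3
product = 1.0
for p in probs:
    product *= p
print(product)       # prints 0.0: the true value 1e-900 is below float range

# ... but summing their logs stays well-behaved
log_product = sum(math.log(p) for p in probs)
print(log_product)   # roughly -2072.3
```

Because the maximum is factored out before exponentiating, `log_sum_exp` stays finite even when every individual probability would underflow, which is why log space is the standard way to implement long Viterbi and forward recursions.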