Time Series Segmentation with Hidden Markov Models

by Daniel Kohlsdorf

dkohlsdorf@gmail.com

Introduction

This is an interactive demo visualizing algorithms for hidden Markov models on continuous time series. A hidden Markov model is a probabilistic model for time series. In other words, a hidden Markov model is an abstract representation of a time series: X = {x1 ... xT} . The model can be regarded as a probabilistic state machine with a finite number of states: S = {s1 ... 1N} . Transitions are defined by a probabilistic transition function P(s|s') which indicates the probability of switching from state s to state s' . It also defines a observation function P(x|s) that indicates how probable a state is given an observation from the time series. There are two main questions that arise given a model and a time series:

Decoding: Given my time series what is the most probable path through a hidden Markov model.
Parameter Estimation: Given a time series (or multiple time series) what is the maximum likelihood parametrization.

Decoding can be solved using the Viterbi recursion and parameter estimation can be solved using the Baum Welch algorithm. Pointers to references can be found below and I encourage the reader to follow at least through half of the Rabiner paper. However, what I want to discuss here is how hidden Markov models can be used to segment a time series, which in turn will give insight in how hidden Markov models work. The high level idea is that we can estimate the parameters of a hidden Markov model for a time series and then find the maximum path through the estimated model for the same time series. If we backtrack the path, will get a sequence of states. Each state on the path indicates an assignment from a sample from the time series X into the state space S. While speech recognition is a supervised task often solved with hidden Markov models, the algorithms underneath solve an unsupervised problem: segmentation. The following interactive demo shows segmentation for continuous time series with a simplified one dimensional hidden Markov model.

Demo

Litte tasks

If we do not train the model (number of training iterations is 0) and we assume a 3 state model with all transition probabilities to the same state: P(s_i|s_i) = 0.9 and we assume all forward transitions: P(s_i|s_i+1) = 0.1 why do we still get a segmentation for the time series: X = { 1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3} when decoding? Using the knowledge from the literature, explain the segmentation.
For the time series X = {1,1,1,1,3,3,3,3,3,3,1,1,1,1,1} and a three state model what is the probability to stay in state 1, in state 2 and state 3 after training? Why do we see different results from the tool?
Our implementation starts with all states being probability 0. Is that a good initialization? If not change the code to do better. Read the htk book for that.
Can you change all the computations into log space? [code]

References

The standard paper for hidden Markov models: Rabiner: A tutorial on hidden Markov models and selected applications in speech recognition
The classic textbook for probabilistic machine learning: Bishop: pattern recognition and machine learning
A proactical hidden Markov model book for speech recognition: The HTK3 book