Hidden Markov Model – SNS Courseware

MACHINE LEARNING IN CYBER SECURITY

Hidden Markov Model (HMM)

A Hidden Markov Model (HMM) is a statistical model used to represent systems where the actual states are not directly observable (hidden), but we can observe outputs that depend on those hidden states.

It is widely used for sequence prediction and time-series modeling.

✅ Key Idea

In an HMM:

The system moves through a sequence of hidden states
Each hidden state produces an observable output

Example:
In speech recognition, we cannot directly observe the speaker’s phoneme state (hidden), but we can observe sound signals (output).

✅ Components of HMM

An HMM consists of:

1. Hidden States (S)

States that cannot be observed directly.
Example: {Hot, Cold}

2. Observations (O)

Outputs that can be observed.
Example: the number of ice creams sold on a given day

3. Transition Probability (A)

Probability of moving from one hidden state to another.

$A_{ij} = P(S_{t+1}=j | S_t=i)$

4. Emission Probability (B)

Probability of observing an output given a hidden state.

$B_j(k) = P(O_t=k | S_t=j)$

5. Initial State Probability (π)

Probability of starting in a particular hidden state.

$\pi_i = P(S_1=i)$

✅ Representation

An HMM is defined by:

$\lambda = (A, B, \pi)$

Where:

$A$ = Transition probability matrix
$B$ = Emission probability matrix
$\pi$ = Initial probability vector
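
As a rough illustration, the three parts of the model can be written as plain Python lists. This is a minimal sketch: the probabilities below are made-up values for the Hot/Cold ice cream example, and the helper names `is_distribution` and `is_valid_hmm` are hypothetical. The only requirement checked is that π, every row of A, and every row of B each form a probability distribution.

```python
import math

# Illustrative numbers only (assumed for this sketch, not from the course material).
states = ["Hot", "Cold"]
observations = [1, 2, 3]  # number of ice creams sold in a day

# A[i][j] = P(S_{t+1} = j | S_t = i)
A = [[0.7, 0.3],
     [0.4, 0.6]]

# B[j][k] = P(O_t = k | S_t = j), columns follow the order of `observations`
B = [[0.2, 0.4, 0.4],   # Hot: higher sales are more likely
     [0.5, 0.4, 0.1]]   # Cold: lower sales are more likely

# pi[i] = P(S_1 = i)
pi = [0.8, 0.2]


def is_distribution(probs, tol=1e-9):
    """Return True if all entries are non-negative and they sum to 1."""
    return all(p >= 0 for p in probs) and math.isclose(sum(probs), 1.0, abs_tol=tol)


def is_valid_hmm(A, B, pi):
    """Check the shapes and the row-sum constraints of (A, B, pi)."""
    n = len(pi)
    return (
        len(A) == n and len(B) == n
        and all(len(row) == n for row in A)
        and is_distribution(pi)
        and all(is_distribution(row) for row in A)
        and all(is_distribution(row) for row in B)
    )


hmm_ok = is_valid_hmm(A, B, pi)
```

Here A is a 2×2 matrix (one row per current state) and B is a 2×3 matrix (one row per state, one column per possible observation).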

✅ Example

The weather is hidden (Hot/Cold), but we can observe the number of ice creams sold.

If weather is Hot, ice cream sales are high.
If weather is Cold, ice cream sales are low.

So, by observing sales, we can estimate the hidden weather state.
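
For a single day, this estimate can be sketched with Bayes' rule, which combines the initial state probabilities with the emission probabilities. The numbers and the function name `posterior_first_state` are assumptions chosen for illustration, not values from the course material.

```python
# Illustrative values (assumed for this sketch).
states = ["Hot", "Cold"]
pi = [0.8, 0.2]                 # P(S_1 = Hot), P(S_1 = Cold)
B = [[0.2, 0.4, 0.4],           # P(1, 2, 3 ice creams | Hot)
     [0.5, 0.4, 0.1]]           # P(1, 2, 3 ice creams | Cold)


def posterior_first_state(pi, B, obs_index):
    """Return P(S_1 = i | O_1 = obs_index) for every state i via Bayes' rule."""
    # Joint probability of each state together with the observation.
    joint = [pi[i] * B[i][obs_index] for i in range(len(pi))]
    # Probability of the observation, summed over all hidden states.
    evidence = sum(joint)
    return [p / evidence for p in joint]


# Index 2 corresponds to "3 ice creams sold".
posterior = posterior_first_state(pi, B, obs_index=2)
most_likely = states[posterior.index(max(posterior))]
```

With longer observation sequences, the transition probabilities also matter, which is what the algorithms in the next section account for.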

✅ Three Fundamental Problems of HMM

1. Evaluation Problem

Compute the probability of an observation sequence given the model, $P(O | \lambda)$.
Solved using the Forward Algorithm.
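
A minimal sketch of the forward recursion in plain Python is given below. The parameters are illustrative assumptions for the Hot/Cold example, and observations are encoded as column indices into B (0, 1, 2 for 1, 2, 3 ice creams). No scaling is applied, so this version is only suitable for short sequences.

```python
# Illustrative parameters (assumed for this sketch).
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.2, 0.4, 0.4],
     [0.5, 0.4, 0.1]]
pi = [0.8, 0.2]


def forward(A, B, pi, obs):
    """Return P(obs | A, B, pi) using the forward recursion."""
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * B_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * A_ij) * B_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: sum over the final hidden state.
    return sum(alpha)


# Probability of observing 3, 1, 3 ice creams on consecutive days.
likelihood = forward(A, B, pi, [2, 0, 2])
```

The recursion needs on the order of N²·T multiplications for N states and T observations, instead of enumerating all N^T possible state sequences. For long sequences, the probabilities become very small, so practical implementations usually rescale alpha at each step or work with logarithms.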

2. Decoding Problem

Find the most likely hidden state sequence for a given observation sequence.
Solved using the Viterbi Algorithm.
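
The decoding step can be sketched as follows. It has the same shape as the forward recursion, but takes a maximum over the previous state instead of a sum and keeps back-pointers so the best path can be recovered. The parameters are illustrative assumptions, and states are returned as indices (0 = Hot, 1 = Cold).

```python
# Illustrative parameters (assumed for this sketch).
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.2, 0.4, 0.4],
     [0.5, 0.4, 0.1]]
pi = [0.8, 0.2]


def viterbi(A, B, pi, obs):
    """Return (most likely state path, joint probability of that path and obs)."""
    n = len(pi)
    # delta[i]: probability of the best path so far that ends in state i.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        pointers, delta = [], []
        for j in range(n):
            # Best predecessor state for landing in state j at this step.
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            pointers.append(best_i)
            delta.append(prev[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(pointers)
    # Start from the best final state and follow the back-pointers.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


# Decode the weather for sales of 3, 1, 3 ice creams.
path, path_prob = viterbi(A, B, pi, [2, 0, 2])
```

As with the forward sketch, products of many probabilities underflow on long sequences, so real implementations typically sum log-probabilities instead.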

3. Learning Problem

Estimate the model parameters (A, B, π) from observed data.
Solved using the Baum-Welch Algorithm.
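
Baum-Welch is an expectation-maximisation procedure: it computes the expected state occupancies under the current parameters, then re-estimates A, B and π from them. The sketch below shows one such update for a single observation sequence. It is deliberately simplified (no scaling, no smoothing, one sequence), and the starting parameters, the observation sequence, and the function names are illustrative assumptions.

```python
def forward_backward(A, B, pi, obs):
    """Return the alpha and beta tables and P(obs | model)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta, sum(alpha[T - 1])


def baum_welch_step(A, B, pi, obs):
    """One EM update; also returns the likelihood under the OLD parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, lik = forward_backward(A, B, pi, obs)
    # gamma[t][i] = P(S_t = i | obs)
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)] for t in range(T)]
    # xi[t][i][j] = P(S_t = i, S_{t+1} = j | obs)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    # Expected transitions i -> j divided by expected visits to i (excluding the last step).
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    # Expected visits to j while emitting k divided by expected visits to j.
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_A, new_B, new_pi, lik


# Illustrative starting guess and data (assumed for this sketch).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]
pi = [0.8, 0.2]
obs = [2, 2, 1, 0, 0, 1, 2, 2, 0, 0]

likelihoods = []
for _ in range(5):
    A, B, pi, lik = baum_welch_step(A, B, pi, obs)
    likelihoods.append(lik)
```

Each update cannot decrease the likelihood of the training sequence, but the procedure may stop at a local maximum, so the result depends on the starting guess.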

✅ Applications of HMM

Speech recognition
Handwriting recognition
Part-of-speech tagging in NLP
Bioinformatics (DNA sequence analysis)
Gesture recognition
Weather forecasting

✅ Advantages

Good for sequential/time-series data
Handles uncertainty effectively
Useful when states are not directly visible

❌ Disadvantages

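Assumes the Markov property (the next state depends only on the current state, not on earlier states)
Difficult to apply to very complex real-world systems, where this assumption often does not hold
Requires a large amount of data to estimate the parameters accurately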