MACHINE LEARNING IN CYBER SECURITY

Maximum Likelihood Estimation (MLE) and Bayesian Estimation

Both Maximum Likelihood Estimation (MLE) and Bayesian Estimation are widely used techniques for estimating unknown parameters of a statistical model. They differ in how they treat the parameters.

1. Maximum Likelihood Estimation (MLE)

Definition

MLE chooses the parameter value that makes the observed data most probable.

If the unknown parameter is (\theta) and the observed data are (x_1, x_2, \dots, x_n), the likelihood function is

[
L(\theta) = P(x_1, x_2, \dots, x_n \mid \theta)
]

The MLE estimate is the value of (\theta) that maximizes this likelihood:

[
\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta)
]

Example: Estimating the Mean of a Normal Distribution

Suppose:

(X_1, X_2, \dots, X_n \sim N(\mu, \sigma^2))
Variance (\sigma^2) is known
Mean (\mu) is unknown

The MLE of the mean is:

[
\hat{\mu}_{MLE} = \bar{X}
]

where (\bar{X}) is the sample mean.
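The result above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the sample values, the known standard deviation, and the grid range are all made up for illustration. It maximizes the normal log-likelihood over a grid of candidate means and compares the maximizer with the sample mean.

```python
import numpy as np

def normal_log_likelihood(mu, data, sigma):
    """Log-likelihood of i.i.d. N(mu, sigma^2) data, dropping terms that do not depend on mu."""
    return -np.sum((data - mu) ** 2) / (2.0 * sigma ** 2)

# Hypothetical sample, invented for illustration only.
data = np.array([4.8, 5.1, 5.3, 4.9, 5.4])
sigma = 0.5  # assumed known standard deviation

# Evaluate the log-likelihood on a fine grid of candidate means.
grid = np.linspace(4.0, 6.0, 2001)
log_lik = np.array([normal_log_likelihood(mu, data, sigma) for mu in grid])

mu_grid = grid[np.argmax(log_lik)]  # numerical maximizer of the likelihood
mu_closed_form = data.mean()        # closed-form MLE: the sample mean

print(mu_grid, mu_closed_form)
```

The two values agree up to the grid spacing, which is what the closed-form result predicts.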

Advantages of MLE

Simple and widely applicable
Consistent under standard regularity conditions: the estimate converges to the true parameter value as the sample size grows
Often computationally efficient

Limitations of MLE

Does not incorporate prior information
Can be unstable with very small datasets

2. Bayesian Estimation

Definition

Bayesian estimation treats the parameter (\theta) as a random variable and combines:

Prior distribution (P(\theta))
Likelihood (P(D \mid \theta))

Using Bayes’ theorem:

[
P(\theta \mid D) = \frac{P(D \mid \theta) P(\theta)}{P(D)}
]

where (P(\theta \mid D)) is the posterior distribution.
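To make Bayes' theorem concrete, here is a small sketch that computes a posterior on a discretized grid of parameter values. Everything specific in it is an assumption chosen for illustration: the Bernoulli model, the Beta(2, 2)-shaped prior, and the data (7 successes in 10 trials) do not come from the text above.

```python
import numpy as np

# Discretized candidate values for theta in [0, 1].
theta = np.linspace(0.0, 1.0, 1001)

# Hypothetical prior P(theta): Beta(2, 2) shape, mildly favoring values near 0.5.
prior = theta * (1.0 - theta)
prior /= prior.sum()

# Hypothetical data D: 7 successes in 10 Bernoulli trials.
successes, trials = 7, 10

# Likelihood P(D | theta); the binomial coefficient is omitted because it
# does not depend on theta and cancels in the normalization.
likelihood = theta ** successes * (1.0 - theta) ** (trials - successes)

# Bayes' theorem: posterior = likelihood * prior / evidence.
unnormalized = likelihood * prior
evidence = unnormalized.sum()        # discrete stand-in for P(D)
posterior = unnormalized / evidence  # P(theta | D) on the grid

posterior_mean = np.sum(theta * posterior)
print(posterior_mean)
```

For this particular prior and likelihood the exact posterior is Beta(9, 5), whose mean is 9/14, so the grid result can be compared against that value.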

Bayesian Point Estimates

Posterior Mean
Posterior Median
MAP (Maximum A Posteriori) estimate:

[
\hat{\theta}_{MAP} = \arg\max_{\theta} P(\theta \mid D)
]

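MAP resembles MLE, but it maximizes the likelihood weighted by the prior, (P(D \mid \theta) P(\theta)), rather than the likelihood alone. The evidence (P(D)) does not depend on (\theta), so it does not affect the maximizer.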
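As an illustration of how the prior shifts the estimate, the sketch below computes the MAP estimate of a normal mean with known variance. It assumes a normal prior on the mean, which is a choice made here for convenience and not something the text specifies. The helper name `map_normal_mean`, the sample, and the prior settings are all hypothetical.

```python
import numpy as np

def map_normal_mean(data, sigma, prior_mean, prior_sd):
    """MAP estimate of mu for N(mu, sigma^2) data under an assumed N(prior_mean, prior_sd^2) prior.

    The posterior is normal, so its mode (the MAP estimate) equals its mean:
    a precision-weighted average of the prior mean and the sample mean.
    """
    n = len(data)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    weighted_sum = prior_precision * prior_mean + data_precision * np.mean(data)
    return weighted_sum / (prior_precision + data_precision)

# Hypothetical sample and settings, invented for illustration only.
data = np.array([4.8, 5.1, 5.3, 4.9, 5.4])
sigma = 0.5

mle = data.mean()
map_informative = map_normal_mean(data, sigma, prior_mean=4.0, prior_sd=1.0)
map_vague = map_normal_mean(data, sigma, prior_mean=4.0, prior_sd=1e6)

print(mle, map_informative, map_vague)
```

With the informative prior the estimate is pulled from the sample mean toward the prior mean; with a very wide prior it is practically indistinguishable from the MLE.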

Advantages of Bayesian Estimation

Incorporates prior information
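Provides a full posterior distribution, from which uncertainty (for example, credible intervals) can be quantified
Often performs well with limited data, provided the prior is reasonable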

Limitations of Bayesian Estimation

Requires choosing a prior
Can be computationally intensive

3. Key Differences Between MLE and Bayesian Estimation

Aspect	MLE	Bayesian Estimation
Parameter Treatment	Fixed but unknown	Random variable
Uses Prior Information	No	Yes
Output	Single best estimate	Posterior distribution
Uncertainty Quantification	Limited	Naturally provided
Small Sample Performance	May be unstable	Often more robust
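The small-sample row of the comparison can be illustrated with a short sketch. It assumes a Bernoulli model with a uniform Beta(1, 1) prior, and the counts are hypothetical; neither comes from the text. Exact fractions are used so the results are easy to verify by hand.

```python
from fractions import Fraction

def bernoulli_mle(successes, trials):
    """MLE of a Bernoulli success probability: the observed frequency."""
    return Fraction(successes, trials)

def beta_posterior_mean(successes, trials, a=1, b=1):
    """Posterior mean under an assumed Beta(a, b) prior.

    A Beta(a, b) prior with k successes in n trials gives a
    Beta(a + k, b + n - k) posterior, whose mean is (a + k) / (a + b + n).
    """
    return Fraction(successes + a, trials + a + b)

# Hypothetical tiny sample: 0 successes in 3 trials.
small_mle = bernoulli_mle(0, 3)          # 0: claims the event never happens
small_bayes = beta_posterior_mean(0, 3)  # 1/5: still allows for the event

# Hypothetical larger sample: 300 successes in 1000 trials.
large_mle = bernoulli_mle(300, 1000)          # 3/10
large_bayes = beta_posterior_mean(300, 1000)  # 301/1002

print(small_mle, small_bayes, large_mle, large_bayes)
```

With three observations the two estimates differ sharply, while with a thousand they nearly coincide, since the data then dominate the prior.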

4. Relationship Between MLE and MAP

If the prior distribution is uniform over the parameter's range (all parameter values equally likely), the prior is a constant factor that drops out of the maximization, so:

[
\hat{\theta}_{MAP} = \hat{\theta}_{MLE}
]

Thus, MLE can be viewed as a special case of MAP estimation, the one obtained with a uniform prior.
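This relationship can be seen numerically. The sketch below is an illustration under assumed settings (a Bernoulli model, 7 successes in 10 trials, and a Beta(2, 2)-shaped prior for contrast), none of which come from the text. It compares the maximizer of the likelihood with the maximizers of two posteriors on the same grid.

```python
import numpy as np

# Grid over the open interval (0, 1) so that the logarithms stay finite.
theta = np.linspace(0.001, 0.999, 999)

# Hypothetical data: 7 successes in 10 Bernoulli trials.
successes, trials = 7, 10
log_lik = successes * np.log(theta) + (trials - successes) * np.log(1.0 - theta)

# Uniform prior: the same log-probability at every grid point.
log_prior_uniform = np.full_like(theta, -np.log(len(theta)))

# Non-uniform prior for contrast: Beta(2, 2) shape (unnormalized).
log_prior_beta = np.log(theta) + np.log(1.0 - theta)

# The log-posterior equals log-likelihood plus log-prior, up to a constant.
theta_mle = theta[np.argmax(log_lik)]
theta_map_uniform = theta[np.argmax(log_lik + log_prior_uniform)]
theta_map_beta = theta[np.argmax(log_lik + log_prior_beta)]

print(theta_mle, theta_map_uniform, theta_map_beta)
```

Adding a constant log-prior leaves the location of the maximum unchanged, so the uniform-prior MAP estimate and the MLE fall on the same grid point, whereas the non-uniform prior moves the maximum toward 0.5.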

5. Applications

Machine Learning
Econometrics
Signal processing
Bioinformatics
Medical diagnosis

Summary

MLE estimates the parameter that maximizes the likelihood of the observed data.
Bayesian estimation combines prior knowledge with observed data to produce a posterior distribution.
MAP is the Bayesian counterpart most directly comparable to MLE.
With a uniform prior, MAP and MLE are identical.
