Maximum Likelihood Estimation (MLE) and Bayesian Estimation
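Both Maximum Likelihood Estimation (MLE) and Bayesian Estimation are widely used techniques for estimating unknown parameters of a statistical model. They differ mainly in how they treat those parameters: MLE regards them as fixed but unknown constants, while Bayesian estimation regards them as random variables with probability distributions.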
1. Maximum Likelihood Estimation (MLE)
Definition
MLE chooses the parameter value under which the observed data are most probable (for continuous data, have the highest probability density).
If the unknown parameter is (\theta) and the observed data are (x_1, x_2, \dots, x_n), the likelihood function is
[
L(\theta) = P(x_1, x_2, \dots, x_n \mid \theta)
]
The MLE estimate is the value of (\theta) that maximizes this likelihood:
[
\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta)
]
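In practice the maximization is often carried out numerically, usually on the log-likelihood, which has the same maximizer as (L(\theta)). The sketch below is a hypothetical illustration, not a method from the text: it assumes a coin-flip (Bernoulli) model with (k) heads in (n) flips, searches a grid of candidate values, and can be checked against the known closed-form MLE (k/n). The function names and data are assumptions; real code would typically use a numerical optimizer rather than a grid.

```python
import math

def bernoulli_log_likelihood(p, k, n):
    """Log-likelihood of k heads in n independent flips with heads probability p.

    The binomial coefficient is constant in p, so it is dropped: it does not
    change where the maximum is.
    """
    return k * math.log(p) + (n - k) * math.log(1 - p)

def mle_grid(log_lik, grid):
    """Return the candidate value with the largest log-likelihood."""
    return max(grid, key=log_lik)

k, n = 7, 10                               # hypothetical data: 7 heads in 10 flips
grid = [i / 1000 for i in range(1, 1000)]  # 0.001 .. 0.999; avoids log(0) at 0 and 1
p_hat = mle_grid(lambda p: bernoulli_log_likelihood(p, k, n), grid)
print(p_hat)  # 0.7, matching the closed-form MLE k / n
```

The grid only finds the maximum to within its spacing; here the true maximizer 0.7 happens to lie on the grid.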
Example: Estimating the Mean of a Normal Distribution
Suppose:
- (X_1, X_2, \dots, X_n) are independent and identically distributed (i.i.d.) with (X_i \sim N(\mu, \sigma^2))
- Variance (\sigma^2) is known
- Mean (\mu) is unknown
The MLE of the mean is:
[
\hat{\mu}_{MLE} = \bar{X}
]
where (\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i) is the sample mean.
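This result follows from the standard derivation sketched below, assuming i.i.d. observed values (x_1, \dots, x_n) and known (\sigma^2). Independence makes the likelihood a product of normal densities, so it is easier to maximize its logarithm, which has the same maximizer:

```latex
\begin{aligned}
\ell(\mu) = \log L(\mu)
  &= \sum_{i=1}^{n} \log\!\left[\frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)\right] \\
  &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2, \\
\frac{d\ell}{d\mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \quad\Longrightarrow\quad
  \hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}.
\end{aligned}
```

The second derivative, (-n/\sigma^2), is negative, so this stationary point is the maximum; (\bar{x}) is the observed value of (\bar{X}).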
Advantages of MLE
- Simple and widely applicable
- Consistent: under standard regularity conditions, the estimate converges to the true parameter value as the sample size increases
- Often computationally efficient
Limitations of MLE
- Does not incorporate prior information
- Can give unstable or extreme estimates with very small datasets
2. Bayesian Estimation
Definition
Bayesian estimation treats the parameter (\theta) as a random variable and combines:
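- A prior distribution (P(\theta)), expressing what is believed about (\theta) before seeing the data
- The likelihood (P(D \mid \theta)), where (D) denotes the observed data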
Using Bayes’ theorem:
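[
P(\theta \mid D) = \frac{P(D \mid \theta) P(\theta)}{P(D)}
]
where (P(\theta \mid D)) is the posterior distribution and (P(D)) is the marginal likelihood (evidence), a normalizing constant that does not depend on (\theta).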
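For the normal-mean setting of the MLE example (known (\sigma^2)), a normal prior on (\mu) yields a normal posterior in closed form (a conjugate pair), so Bayes' theorem can be applied without numerical integration. The sketch below assumes a prior (\mu \sim N(\mu_0, \tau_0^2)); the function name, prior values, and data are hypothetical.

```python
import math

def normal_posterior(data, sigma, mu0, tau0):
    """Posterior of mu for N(mu, sigma^2) data with known sigma and a N(mu0, tau0^2) prior.

    Returns (posterior_mean, posterior_sd). By conjugacy the posterior is also
    normal, so these two numbers describe it completely.
    """
    n = len(data)
    # Precisions (inverse variances) add: prior precision + data precision.
    post_precision = 1 / tau0**2 + n / sigma**2
    # Posterior mean is a precision-weighted average of the prior mean and the data.
    post_mean = (mu0 / tau0**2 + sum(data) / sigma**2) / post_precision
    return post_mean, math.sqrt(1 / post_precision)

data = [4.8, 5.1, 5.6, 4.9, 5.3]  # hypothetical observations (sample mean 5.14)
m, s = normal_posterior(data, sigma=1.0, mu0=0.0, tau0=10.0)
print(m, s)
```

With this wide prior ((\tau_0 = 10)) the posterior mean lands just below the sample mean 5.14, pulled slightly toward (\mu_0 = 0). As (\tau_0 \to \infty) it approaches the MLE (\bar{X}), while a narrow prior pulls it further toward (\mu_0).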
Bayesian Point Estimates
- Posterior mean
- Posterior median
- MAP (maximum a posteriori) estimate:
[
\hat{\theta}_{MAP} = \arg\max_{\theta} P(\theta \mid D)
]
MAP is similar to MLE, but it maximizes the likelihood weighted by the prior, (P(D \mid \theta) P(\theta)), rather than the likelihood alone.
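All three point estimates can be read off a posterior approximated on a grid of parameter values. This sketch swaps in a grid approximation (a simple numerical technique, not something the text prescribes) for a hypothetical coin-flip model: 7 heads in 10 flips with an assumed Beta(2, 2) prior, whose density is proportional to (\theta(1-\theta)). In this conjugate case the exact posterior is Beta(9, 5), with mean 9/14 ≈ 0.643 and mode (the MAP) 8/12 ≈ 0.667, which the grid values should approximate. All names are illustrative.

```python
def grid_posterior(k, n, prior, grid):
    """Posterior weights on a grid: likelihood x prior, normalized to sum to 1."""
    unnorm = [p**k * (1 - p)**(n - k) * prior(p) for p in grid]
    total = sum(unnorm)  # discrete stand-in for the evidence P(D)
    return [w / total for w in unnorm]

def point_estimates(grid, post):
    """Return (posterior mean, posterior median, MAP) from grid weights."""
    mean = sum(p * w for p, w in zip(grid, post))
    # Median: first grid point where the cumulative probability reaches 0.5.
    median, cum = grid[-1], 0.0
    for p, w in zip(grid, post):
        cum += w
        if cum >= 0.5:
            median = p
            break
    # MAP: grid point with the largest posterior weight.
    map_est = max(zip(grid, post), key=lambda pw: pw[1])[0]
    return mean, median, map_est

def beta22_prior(p):
    # Beta(2, 2) density up to a constant (hypothetical prior choice).
    return p * (1 - p)

grid = [i / 1000 for i in range(1, 1000)]  # candidate values of theta in (0, 1)
post = grid_posterior(7, 10, beta22_prior, grid)
mean, median, map_est = point_estimates(grid, post)
print(round(mean, 3), round(median, 3), round(map_est, 3))
```

Grids are only practical for one or two parameters; higher-dimensional posteriors usually need sampling or other approximations.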
Advantages of Bayesian Estimation
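- Incorporates prior information
- Provides a full posterior distribution, from which uncertainty summaries such as credible intervals follow directly
- Can perform well with limited data, provided the prior is reasonable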
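A minimal illustration of the small-data point, assuming a hypothetical coin that lands heads in all 3 of 3 flips and a Beta(2, 2) prior (an assumed, mildly informative choice centered at 0.5): the MLE (k/n) jumps to the extreme value 1, while the Beta-Binomial posterior mean ((a + k)/(a + b + n)) stays more moderate. The function names are illustrative.

```python
def mle_bernoulli(k, n):
    """MLE of the heads probability: the observed fraction of heads."""
    return k / n

def posterior_mean_beta(k, n, a=2.0, b=2.0):
    """Posterior mean under a Beta(a, b) prior (hypothetical default a = b = 2).

    A Beta(a, b) prior with k heads in n flips gives a Beta(a + k, b + n - k)
    posterior, whose mean is (a + k) / (a + b + n).
    """
    return (a + k) / (a + b + n)

k, n = 3, 3  # hypothetical data: three heads in three flips
print(mle_bernoulli(k, n))        # 1.0
print(posterior_mean_beta(k, n))  # 0.7142857142857143 (= 5/7)
```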
Limitations of Bayesian Estimation
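- Requires choosing a prior, and results can be sensitive to that choice, especially with little data
- Can be computationally intensive when the posterior has no closed form and must be approximated numerically (e.g., by Markov chain Monte Carlo)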
3. Key Differences Between MLE and Bayesian Estimation
| Aspect | MLE | Bayesian Estimation |
|---|---|---|
| Parameter Treatment | Fixed but unknown | Random variable |
| Uses Prior Information | No | Yes |
| Output | Single best estimate | Posterior distribution |
| Uncertainty Quantification | Indirect (e.g., standard errors, confidence intervals) | Naturally provided by the posterior |
| Small Sample Performance | May be unstable | Often more robust, given a reasonable prior |
4. Relationship Between MLE and MAP
If the prior distribution is uniform (constant over the parameter space, so no value is favored a priori), then:
[
\hat{\theta}_{MAP} = \hat{\theta}_{MLE}
]
Thus, MLE can be viewed as MAP estimation with a flat prior, a special case of Bayesian point estimation.
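The equality follows directly from Bayes' theorem: (P(D)) does not depend on (\theta), and a constant prior only rescales the likelihood. A sketch of the argument, assuming a proper uniform prior (P(\theta) = c) over a parameter range that contains the MLE (on an unbounded range a flat prior is improper, but the argmax argument is the same):

```latex
\begin{aligned}
\hat{\theta}_{MAP}
  &= \arg\max_{\theta} P(\theta \mid D)
   = \arg\max_{\theta} \frac{P(D \mid \theta)\, P(\theta)}{P(D)} \\
  &= \arg\max_{\theta} P(D \mid \theta)\, P(\theta)
     && \text{($P(D)$ does not depend on $\theta$)} \\
  &= \arg\max_{\theta} c\, P(D \mid \theta)
     && \text{(uniform prior: $P(\theta) = c > 0$)} \\
  &= \arg\max_{\theta} L(\theta) = \hat{\theta}_{MLE}.
\end{aligned}
```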
5. Applications
- Machine learning
- Econometrics
- Signal processing
- Bioinformatics
- Medical diagnosis
Summary
- MLE estimates the parameter that maximizes the likelihood of the observed data.
- Bayesian estimation combines prior knowledge with observed data to produce a posterior distribution.
- MAP is the Bayesian counterpart most directly comparable to MLE.
- With a uniform prior, MAP and MLE are identical.