Problem of Overfitting
Overfitting is a major problem in machine learning: a model learns the training data too well, including its noise and irrelevant patterns, and as a result fails to perform well on new, unseen data.
Definition
Overfitting occurs when a model shows:
- Very high accuracy on training data
- Low accuracy on testing/validation data
Why is Overfitting a Problem?
Because the model does not generalize well.
It becomes too specific to the training dataset and gives poor predictions for real-world inputs.
Symptoms of Overfitting
- Training error is very low
- Testing/validation error is high
- Model performs well only on known data
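These symptoms can be turned into a quick numeric check: high training accuracy combined with a large train/test gap. A minimal sketch (the thresholds here are illustrative assumptions, not standard values):

```python
def looks_overfit(train_acc, test_acc, gap=0.15):
    """Flag likely overfitting: strong training accuracy combined
    with a large train/test gap (thresholds are illustrative)."""
    return train_acc > 0.90 and (train_acc - test_acc) > gap

# Very low training error but much higher test error -> likely overfitting
print(looks_overfit(0.99, 0.70))  # True
# Similar train/test performance -> the model generalizes
print(looks_overfit(0.85, 0.82))  # False
```

In practice these numbers come from evaluating the model on a held-out validation set rather than being compared against fixed thresholds.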
Causes of Overfitting
- Overly complex model (e.g., deep neural networks, high-degree polynomial regression)
- Small training dataset
- Too many features
- Training for too long (too many epochs/iterations)
- Noisy data
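Several of these causes can be seen together in one toy experiment: a high-degree polynomial fitted to a small, noisy dataset. A minimal NumPy sketch (the dataset, degrees, and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)                               # small training set
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)  # noisy targets

coef_hi = np.polyfit(x, y, deg=9)  # complex model: can interpolate all 10 points
coef_lo = np.polyfit(x, y, deg=3)  # simpler model: smoother fit

def mse(coef, xs, ys):
    """Mean squared error of a polynomial fit on the given points."""
    return np.mean((np.polyval(coef, xs) - ys) ** 2)

# Unseen points drawn from the true (noise-free) underlying curve
x_new = np.linspace(0.05, 0.95, 50)
y_new = np.sin(2 * np.pi * x_new)

print(mse(coef_hi, x, y) < mse(coef_lo, x, y))  # True: memorizes the noise
# The flexible fit typically does much worse on the unseen points:
print(mse(coef_hi, x_new, y_new), mse(coef_lo, x_new, y_new))
```

The degree-9 fit drives training error to nearly zero by passing through every noisy point, which is exactly the "very low training error, high test error" pattern described above.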
Example
If a student memorizes answers instead of understanding concepts, they will score well on practice tests but fail a new exam.
Similarly, an overfitted model memorizes training data instead of learning general patterns.
How to Reduce Overfitting
- Use more training data
- Apply regularization (L1, L2)
- Use cross-validation
- Use feature selection
- Use dropout (in deep learning)
- Use early stopping
- Use pruning (decision trees)
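As one concrete example from this list, L2 regularization (ridge regression) shrinks model weights toward zero, which limits how wildly a flexible model can bend to fit noise. A minimal NumPy sketch using the closed-form ridge solution (the data and penalty strength are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

X = np.vander(x, 6)  # degree-5 polynomial features

def ridge_fit(X, y, alpha):
    """Closed-form L2-regularized least squares:
    w = (X^T X + alpha * I)^(-1) X^T y
    alpha = 0 recovers ordinary least squares."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ y)

w_ols = ridge_fit(X, y, alpha=0.0)  # unregularized fit
w_l2 = ridge_fit(X, y, alpha=1.0)   # L2 penalty shrinks the weights

print(np.linalg.norm(w_l2) < np.linalg.norm(w_ols))  # True: smaller weights
```

The penalty strength alpha controls the trade-off: too small and overfitting remains, too large and the model underfits, so it is usually tuned with cross-validation.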
