Problem of Overfitting
Overfitting is a major problem in machine learning: a model learns the training data too well, including its noise and irrelevant patterns, and as a result fails to perform well on new, unseen data.
Definition
Overfitting occurs when a model shows:
- Very high accuracy on training data
- Low accuracy on testing/validation data
Why is Overfitting a Problem?
Because the model does not generalize well.
It becomes too specific to the training dataset and gives poor predictions for real-world inputs.
Symptoms of Overfitting
- Training error is very low
- Testing/validation error is high
- Model performs well only on known data
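These symptoms can be turned into a quick numeric check: high training accuracy combined with a large train/test gap. A minimal sketch (the thresholds here are illustrative assumptions, not standard values):

```python
def looks_overfit(train_acc, test_acc, gap=0.15):
    """Flag likely overfitting: strong training accuracy combined
    with a large train/test gap (thresholds are illustrative)."""
    return train_acc > 0.90 and (train_acc - test_acc) > gap

# Very low training error but much higher test error -> likely overfitting
print(looks_overfit(0.99, 0.70))  # True
# Similar train/test performance -> the model generalizes
print(looks_overfit(0.85, 0.82))  # False
```

In practice these numbers come from evaluating the model on a held-out validation set rather than being compared against fixed thresholds.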
Causes of Overfitting
- Overly complex model (e.g., deep neural networks, high-degree polynomial regression)
- Small training dataset
- Too many features
- Training for too long (too many epochs/iterations)
- Noisy data
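Several of these causes can be seen together in one toy experiment: a high-degree polynomial fitted to a small, noisy dataset. A minimal NumPy sketch (the dataset, degrees, and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)                               # small training set
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)  # noisy targets

coef_hi = np.polyfit(x, y, deg=9)  # complex model: can interpolate all 10 points
coef_lo = np.polyfit(x, y, deg=3)  # simpler model: smoother fit

def mse(coef, xs, ys):
    """Mean squared error of a polynomial fit on the given points."""
    return np.mean((np.polyval(coef, xs) - ys) ** 2)

# Unseen points drawn from the true (noise-free) underlying curve
x_new = np.linspace(0.05, 0.95, 50)
y_new = np.sin(2 * np.pi * x_new)

print(mse(coef_hi, x, y) < mse(coef_lo, x, y))  # True: memorizes the noise
# The flexible fit typically does much worse on the unseen points:
print(mse(coef_hi, x_new, y_new), mse(coef_lo, x_new, y_new))
```

The degree-9 fit drives training error to nearly zero by passing through every noisy point, which is exactly the "very low training error, high test error" pattern described above.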
Example
If a student memorizes answers instead of understanding concepts, they will score well on practice tests but fail a new exam.
Similarly, an overfitted model memorizes training data instead of learning general patterns.
How to Reduce Overfitting
- Use more training data
- Apply regularization (L1, L2)
- Use cross-validation
- Use feature selection
- Use dropout (in deep learning)
- Use early stopping
- Use pruning (decision trees)
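As one concrete example from this list, L2 regularization (ridge regression) shrinks model weights toward zero, which limits how wildly a flexible model can bend to fit noise. A minimal NumPy sketch using the closed-form ridge solution (the data and penalty strength are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

X = np.vander(x, 6)  # degree-5 polynomial features

def ridge_fit(X, y, alpha):
    """Closed-form L2-regularized least squares:
    w = (X^T X + alpha * I)^(-1) X^T y
    alpha = 0 recovers ordinary least squares."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ y)

w_ols = ridge_fit(X, y, alpha=0.0)  # unregularized fit
w_l2 = ridge_fit(X, y, alpha=1.0)   # L2 penalty shrinks the weights

print(np.linalg.norm(w_l2) < np.linalg.norm(w_ols))  # True: smaller weights
```

The penalty strength alpha controls the trade-off: too small and overfitting remains, too large and the model underfits, so it is usually tuned with cross-validation.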
