K-Nearest Neighbour (KNN)
K-Nearest Neighbour (KNN) is a supervised machine learning algorithm used for both classification and regression. It is a lazy learning algorithm: it does not build an explicit model during training, but simply stores the training data and defers all computation until a prediction is requested.
Working Principle
KNN works on the idea that:
- Similar data points lie close together in feature space, so the label of a new point can be inferred from its nearest neighbours.
Steps in KNN Algorithm
- Choose the number of neighbors (K).
- Calculate the distance between the new data point and all training points.
- Select the K nearest data points.
- Predict output:
- Classification: Majority voting among neighbors
- Regression: Average value of neighbors
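The steps above can be sketched in pure Python as a minimal classifier using Euclidean distance and majority voting (function and variable names are illustrative, not from any particular library):

```python
from collections import Counter
import math

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train_X, train_y, query, k=3):
    # Steps 2-3: distance from the query to every training point
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 3: keep the K nearest points
    neighbors = sorted(distances)[:k]
    # Step 4: majority vote among their labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(X, y, (2, 2), k=3))  # → A
```

For regression, the final step would instead return the average of the neighbours' target values.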
Distance Measures Used
Common distance metrics include:
1. Euclidean Distance
d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots}
2. Manhattan Distance
d = |x_1 - y_1| + |x_2 - y_2| + \dots
3. Minkowski Distance
d = \left(|x_1 - y_1|^p + |x_2 - y_2|^p + \dots\right)^{1/p}
Generalized form of Euclidean and Manhattan: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance.
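All three metrics can be expressed with a single Minkowski function; a minimal sketch (the helper name is illustrative):

```python
def minkowski(a, b, p):
    # generalized distance between two feature vectors;
    # p=1 reduces to Manhattan, p=2 to Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # Manhattan: 7.0
print(minkowski(a, b, 2))  # Euclidean: 5.0
```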
Advantages
- Simple and easy to understand
- No explicit training phase required
- Works well for small datasets
- Effective for non-linear problems
Disadvantages
- Computationally expensive for large datasets
- Requires feature scaling (normalization)
- Sensitive to noise and irrelevant features
- Performance depends heavily on choosing K
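The feature-scaling requirement follows directly from KNN's use of raw distances: a feature with a large numeric range dominates the distance calculation. A minimal min-max normalization sketch (variable names are illustrative):

```python
def min_max_scale(column):
    # rescale values to [0, 1] so no single feature
    # dominates the distance computation
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

ages = [20, 30, 40, 60]                   # small numeric range
incomes = [20000, 50000, 80000, 140000]   # large numeric range
print(min_max_scale(ages))     # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(incomes))  # [0.0, 0.25, 0.5, 1.0]
```

After scaling, both features contribute comparably to the distance, regardless of their original units.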
Applications
- Pattern recognition
- Image classification
- Recommendation systems
- Medical diagnosis
- Handwriting recognition
