K-Nearest Neighbour (KNN)
K-Nearest Neighbour (KNN) is a supervised machine learning algorithm used for both classification and regression. It is a lazy learning algorithm: it does not build an explicit model during training, but simply stores the training data and defers all computation until a prediction is requested.
Working Principle
KNN works on the idea that:
- Similar data points lie close together in feature space, so the label of a new point can be inferred from its nearest neighbours.
Steps in KNN Algorithm
- Choose the number of neighbors (K).
- Calculate the distance between the new data point and all training points.
- Select the K nearest data points.
- Predict output:
- Classification: Majority voting among neighbors
- Regression: Average value of neighbors
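The steps above can be sketched in pure Python as a minimal classifier using Euclidean distance and majority voting (function and variable names are illustrative, not from any particular library):

```python
from collections import Counter
import math

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train_X, train_y, query, k=3):
    # Steps 2-3: distance from the query to every training point
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 3: keep the K nearest points
    neighbors = sorted(distances)[:k]
    # Step 4: majority vote among their labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(X, y, (2, 2), k=3))  # → A
```

For regression, the final step would instead return the average of the neighbours' target values.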
Distance Measures Used
Common distance metrics include:
1. Euclidean Distance
d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots}
2. Manhattan Distance
d = |x_1 - y_1| + |x_2 - y_2| + \dots
3. Minkowski Distance
d = \left(|x_1 - y_1|^p + |x_2 - y_2|^p + \dots\right)^{1/p}
Generalized form of Euclidean and Manhattan: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance.
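All three metrics can be expressed with a single Minkowski function; a minimal sketch (the helper name is illustrative):

```python
def minkowski(a, b, p):
    # generalized distance between two feature vectors;
    # p=1 reduces to Manhattan, p=2 to Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # Manhattan: 7.0
print(minkowski(a, b, 2))  # Euclidean: 5.0
```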
Advantages
- Simple and easy to understand
- No explicit training phase required
- Works well for small datasets
- Effective for non-linear problems
Disadvantages
- Computationally expensive for large datasets
- Requires feature scaling (normalization)
- Sensitive to noise and irrelevant features
- Performance depends heavily on choosing K
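The feature-scaling requirement follows directly from KNN's use of raw distances: a feature with a large numeric range dominates the distance calculation. A minimal min-max normalization sketch (variable names are illustrative):

```python
def min_max_scale(column):
    # rescale values to [0, 1] so no single feature
    # dominates the distance computation
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

ages = [20, 30, 40, 60]                   # small numeric range
incomes = [20000, 50000, 80000, 140000]   # large numeric range
print(min_max_scale(ages))     # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(incomes))  # [0.0, 0.25, 0.5, 1.0]
```

After scaling, both features contribute comparably to the distance, regardless of their original units.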
Applications
- Pattern recognition
- Image classification
- Recommendation systems
- Medical diagnosis
- Handwriting recognition
