MACHINE LEARNING IN CYBER SECURITY

Hierarchical, Spectral, and Subspace Clustering

These are advanced clustering techniques used when K-means clustering is not sufficient, especially for complex, non-spherical, or high-dimensional datasets.

1. Hierarchical Clustering

Hierarchical clustering builds a nested tree of clusters called a dendrogram.

Types

Agglomerative (bottom-up): start with each point as its own cluster and repeatedly merge the closest clusters.
Divisive (top-down): start with all points together and recursively split them.

Linkage Methods

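Single linkage: distance between the two closest points of two clusters.
Complete linkage: distance between the two farthest points of two clusters.
Average linkage: mean distance over all pairs of points drawn from the two clusters.
Ward's method: merge the pair of clusters that gives the smallest increase in total within-cluster variance.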
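
The agglomerative procedure and the four linkage methods above can be tried out with SciPy's scipy.cluster.hierarchy module. The following is a minimal sketch, not a reference implementation: the two-blob toy data, the random seed, and the choice to cut each tree into two clusters are assumptions made purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (an assumption for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(5.0, 0.3, size=(20, 2)),
])

labels_by_method = {}
for method in ("single", "complete", "average", "ward"):
    # linkage() runs agglomerative clustering and returns the merge table:
    # one row per merge, so (n - 1) rows for n points.
    Z = linkage(X, method=method)
    # Cut the tree so that at most two flat clusters remain.
    labels_by_method[method] = fcluster(Z, t=2, criterion="maxclust")

for method, labels in labels_by_method.items():
    print(method, labels)
```

On data this cleanly separated, all four linkage methods should agree. On elongated or noisy clusters they typically diverge, which is one reason the linkage choice matters. The merge table Z can also be passed to scipy.cluster.hierarchy.dendrogram to draw the tree.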

Advantages

No need to pre-specify the number of clusters; it can be chosen afterwards by cutting the dendrogram at a suitable height
Produces a visual dendrogram
Useful for exploratory analysis

Limitations

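Computationally expensive for large datasets: standard agglomerative algorithms need memory quadratic in the number of points and between quadratic and cubic time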
Sensitive to distance and linkage choices

Applications

Phylogenetic analysis
Gene-expression clustering
Document organization

2. Spectral Clustering

Spectral clustering uses graph theory and eigenvectors of a similarity matrix to reveal cluster structure.

Main Idea

Build a similarity graph between data points.
Compute the graph Laplacian matrix.
Extract the eigenvectors belonging to the k smallest eigenvalues of the Laplacian (k = number of clusters).
Cluster the embedded points, often with K-means.
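
The four steps above can be sketched in a few lines of NumPy and SciPy. This is an illustrative sketch under stated assumptions, not a definitive implementation: the function name spectral_clusters, the Gaussian similarity with width sigma, the unnormalized Laplacian L = D - W, and the two-ring toy data are all choices made here for the example. Practical implementations often use a normalized Laplacian and a sparse nearest-neighbour graph instead.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def spectral_clusters(X, k, sigma):
    """Hypothetical helper: basic unnormalized spectral clustering."""
    # Step 1: fully connected similarity graph with Gaussian weights.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops

    # Step 2: unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Step 3: eigh returns eigenvalues in ascending order, so the first
    # k columns are the eigenvectors of the k smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :k]

    # Step 4: K-means on the rows of the spectral embedding.
    _, labels = kmeans2(embedding, k, minit="++", seed=0)
    return labels


# Toy data (an assumption): two concentric rings, which are non-convex.
theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([1.0 * ring, 3.0 * ring])  # radius 1 and radius 3

labels = spectral_clusters(X, k=2, sigma=0.3)
print(labels[:100])   # inner ring
print(labels[100:])   # outer ring
```

The value of sigma matters: it has to be small relative to the gap between the rings so that the similarity graph is close to two disconnected pieces. With a much larger sigma the rings blend into one component and the embedding no longer separates them.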

Strengths

Detects non-convex clusters
Effective when cluster boundaries are complex

Limitations

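Requires construction of an n x n similarity matrix, and results depend on how similarity is defined and scaled
Computational cost can be high, since a full eigendecomposition scales roughly cubically with the number of points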

Applications

Image segmentation
Social-network analysis
Manifold learning

3. Subspace Clustering

Subspace clustering identifies clusters that exist only within subsets of features, making it particularly useful in high-dimensional data.

Motivation

In datasets such as gene expression or text data, different clusters may be defined by different subsets of variables.

Approaches

CLIQUE
PROCLUS
Sparse Subspace Clustering
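
To give a feel for how a grid-based method such as CLIQUE begins, the sketch below implements only a simplified first pass: each feature is divided into equal-width bins, and bins holding more than a chosen fraction of the points are marked as dense. It is not the full algorithm, which goes on to combine dense units into higher-dimensional subspaces. The function name dense_units_1d, the parameters n_bins and density_threshold, the assumption that features are scaled to [0, 1], and the toy data are all hypothetical choices for illustration.

```python
import numpy as np


def dense_units_1d(X, n_bins=10, density_threshold=0.3):
    """Hypothetical helper: map feature index -> list of dense bin indices.

    Assumes every feature has been scaled to the range [0, 1].
    """
    n_samples, n_features = X.shape
    dense = {}
    for j in range(n_features):
        # Count how many points fall into each equal-width bin of feature j.
        counts, _ = np.histogram(X[:, j], bins=n_bins, range=(0.0, 1.0))
        # A bin is dense if it holds more than the threshold fraction of points.
        bins = np.flatnonzero(counts > density_threshold * n_samples).tolist()
        if bins:
            dense[j] = bins
    return dense


# Toy data (an assumption): feature 0 carries two tight groups,
# features 1 and 2 are uniform noise with no cluster structure.
rng = np.random.default_rng(0)
n = 200
informative = np.concatenate([
    rng.normal(0.25, 0.01, n // 2),
    rng.normal(0.75, 0.01, n // 2),
])
noise = rng.uniform(0.0, 1.0, size=(n, 2))
X = np.column_stack([np.clip(informative, 0.0, 1.0), noise])

dense = dense_units_1d(X)
print(dense)
```

On this data only feature 0 should report dense bins, which mirrors the point of subspace clustering: the cluster structure lives in a subset of the features, and the noise features contribute nothing.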

Advantages

Handles irrelevant features
Effective for very high-dimensional datasets

Limitations

More complex to design and tune
Interpretation may be challenging

Applications

Bioinformatics
Text mining
Recommender systems

Comparison Table

Method	Best For	Key Output
Hierarchical	Exploratory analysis and nested structure	Dendrogram
Spectral	Nonlinear and graph-structured clusters	Low-dimensional embedding + clusters
Subspace	High-dimensional data with feature-specific clusters	Clusters with relevant feature subsets

Summary

Hierarchical clustering builds a tree of nested clusters.
Spectral clustering uses graph eigenvectors to separate complex cluster shapes.
Subspace clustering finds clusters that are meaningful only in subsets of dimensions.

Together, these methods extend clustering beyond the assumptions of K-means and are valuable for modern high-dimensional and structured datasets.