Hierarchical, Spectral, and Subspace Clustering
These are advanced clustering techniques for cases where K-means is not sufficient, especially for datasets with non-spherical clusters, nested structure, or very many features.
1. Hierarchical Clustering
Hierarchical clustering builds a nested tree of clusters called a dendrogram.
Types
- Agglomerative (bottom-up): start with each point as its own cluster and repeatedly merge the closest clusters.
- Divisive (top-down): start with all points together and recursively split them.
Linkage Methods
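- Single linkage: distance between the closest pair of points, one from each cluster
- Complete linkage: distance between the farthest such pair
- Average linkage: mean distance over all cross-cluster pairs
- Ward's method: merge the pair whose union gives the smallest increase in total within-cluster sum of squares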
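The agglomerative procedure and the first three linkage rules can be sketched in plain Python. This is a naive O(n³) illustration under simplifying assumptions, not a production implementation: the function names `agglomerative`, `cluster_distance`, and `cut` are hypothetical, Euclidean distance is assumed, and Ward's method is left out because it needs a centroid-based update. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally be used instead.

```python
# Illustrative sketch only: naive agglomerative clustering with hypothetical names.
import math
from itertools import combinations


def cluster_distance(a, b, points, linkage):
    """Distance between clusters a and b (lists of point indices) under a linkage rule."""
    dists = [math.dist(points[i], points[j]) for i in a for j in b]
    if linkage == "single":
        return min(dists)  # closest cross-cluster pair
    if linkage == "complete":
        return max(dists)  # farthest cross-cluster pair
    if linkage == "average":
        return sum(dists) / len(dists)  # mean over all cross-cluster pairs
    raise ValueError(f"unsupported linkage: {linkage}")


def agglomerative(points, linkage="single"):
    """Merge singleton clusters bottom-up until one cluster remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram draws.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Naive scan over all current pairs to find the two closest clusters.
        (i, j), d = min(
            (((i, j), cluster_distance(clusters[i], clusters[j], points, linkage))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so deleting j first keeps index i valid
        del clusters[i]
        clusters.append(merged)
    return history


def cut(n_points, history, k):
    """Replay the first n_points - k merges: a horizontal cut leaving k clusters."""
    clusters = {frozenset([i]) for i in range(n_points)}
    for a, b, _ in history[: n_points - k]:
        clusters -= {frozenset(a), frozenset(b)}
        clusters.add(frozenset(a) | frozenset(b))
    return sorted(sorted(c) for c in clusters)
```

Because `agglomerative` records the full merge history once, `cut` can be called afterward with different values of `k`, which mirrors cutting a dendrogram at different heights without re-running the clustering.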
Advantages
- No need to pre-specify the number of clusters; the dendrogram can be cut at any level afterward
- Produces a visual dendrogram
- Useful for exploratory analysis
Limitations
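- Computationally expensive for large datasets: the naive algorithm stores an n × n distance matrix (O(n²) memory) and takes O(n³) time
- Sensitive to distance and linkage choices; for example, single linkage tends to chain clusters together, while complete linkage favors compact ones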
Applications
- Phylogenetic analysis
- Gene-expression clustering
- Document organization
2. Spectral Clustering
Spectral clustering uses graph theory and eigenvectors of a similarity matrix to reveal cluster structure.
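One common concrete construction is shown below. It is an assumption, because the text does not fix a particular similarity function or Laplacian: a Gaussian kernel with a bandwidth σ chosen by the user, the degree matrix D, the unnormalized Laplacian L, and its symmetric normalized variant.

```latex
W_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right) \quad (i \neq j), \qquad W_{ii} = 0
D_{ii} = \sum_{j} W_{ij}, \qquad L = D - W, \qquad L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2}
f^{\top} L f = \frac{1}{2} \sum_{i,j} W_{ij} \, (f_i - f_j)^2
```

The quadratic form is small when a vector f changes little across strongly connected pairs, which is why eigenvectors of L with small eigenvalues behave like soft cluster indicators. If the graph has k connected components, the eigenvalue 0 of L has multiplicity exactly k.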
Main Idea
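- Build a similarity graph between data points (for example, a Gaussian-kernel or k-nearest-neighbor graph).
- Compute the graph Laplacian matrix.
- Compute the eigenvectors belonging to the k smallest eigenvalues of the Laplacian, where k is the desired number of clusters (for the normalized Laplacian, these are the leading eigenvectors of the normalized similarity matrix).
- Treat each row of the resulting n × k eigenvector matrix as an embedded point and cluster these rows, often with K-means.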
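The steps above can be sketched for the two-cluster case in dependency-free Python. This is a minimal illustration under stated assumptions: a Gaussian similarity with a hand-picked bandwidth `sigma`, the unnormalized Laplacian L = D - W, and, for k = 2, the common shortcut of splitting points by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue) in place of the K-means step. The eigenvector is approximated by power iteration; all function names are hypothetical, and real code would typically call `numpy.linalg.eigh` or scikit-learn's `SpectralClustering`.

```python
# Illustrative sketch only: two-way spectral clustering with hypothetical names.
import math
import random


def gaussian_similarity(points, sigma):
    """Dense similarity matrix W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    n = len(points)
    return [
        [0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma**2))
         for j in range(n)]
        for i in range(n)
    ]


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, where D holds the row sums of W."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def fiedler_vector(L, iterations=500, seed=0):
    """Approximate the eigenvector of L's second-smallest eigenvalue.

    The dominant eigenvectors of M = c*I - L (with c >= the largest eigenvalue
    of L) are L's eigenvectors with the smallest eigenvalues. The smallest is
    the constant vector (eigenvalue 0), so it is projected out at every step.
    """
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))  # eigenvalues of L are at most 2 * max degree
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bipartition(points, sigma=1.0):
    """Label each point 0 or 1 by the sign of the Fiedler vector."""
    L = laplacian(gaussian_similarity(points, sigma))
    return [0 if x < 0 else 1 for x in fiedler_vector(L)]
```

For example, `spectral_bipartition([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)])` puts the first three points in one group and the last three in the other; which group is called 0 depends only on the arbitrary sign of the eigenvector.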
Strengths
- Detects non-convex clusters
- Effective when cluster boundaries are complex
Limitations
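- Requires constructing an n × n similarity matrix and choosing its parameters, such as the kernel width or the number of neighbors
- Computational cost can be high: a dense eigendecomposition takes O(n³) time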
Applications
- Image segmentation
- Social-network analysis
- Manifold learning
3. Subspace Clustering
Subspace clustering identifies clusters that exist only within subsets of features, making it particularly useful in high-dimensional data.
Motivation
In datasets such as gene expression or text data, different clusters may be defined by different subsets of variables.
Approaches
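- CLIQUE: grid-based and bottom-up; finds dense grid cells in low-dimensional subspaces and combines them into higher-dimensional candidates
- PROCLUS: top-down and medoid-based; assigns each cluster its own subset of relevant dimensions
- Sparse Subspace Clustering: writes each point as a sparse linear combination of the other points and applies spectral clustering to the resulting affinity matrix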
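The bottom-up idea behind CLIQUE can be illustrated with a heavily simplified sketch that only finds dense 1-D and 2-D grid units. It omits CLIQUE's later steps (joining adjacent dense units into clusters, going beyond two dimensions, and producing minimal cluster descriptions). The grid resolution `xi`, the density threshold `tau`, and the function names are hypothetical choices for illustration.

```python
# Illustrative sketch only: the first two levels of a CLIQUE-style dense-unit search.
from collections import Counter
from itertools import combinations


def grid_interval(value, lo, hi, xi):
    """Index of the equal-width interval (out of xi) containing value on [lo, hi]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * xi), xi - 1)


def dense_units(points, xi=4, tau=0.3):
    """Find dense 1-D and 2-D grid units bottom-up, in the spirit of CLIQUE.

    A unit is a tuple of (dimension, interval) pairs; it is dense when more
    than a fraction tau of all points fall inside it.
    """
    n, d = len(points), len(points[0])
    bounds = [(min(p[k] for p in points), max(p[k] for p in points)) for k in range(d)]
    cells = [tuple(grid_interval(p[k], *bounds[k], xi) for k in range(d)) for p in points]

    # Level 1: count points per (dimension, interval) and keep the dense ones.
    counts1 = Counter((k, cell[k]) for cell in cells for k in range(d))
    dense1 = {unit for unit, count in counts1.items() if count / n > tau}

    # Level 2: a dense 2-D unit must project onto dense 1-D units (monotonicity),
    # so candidates only pair dense 1-D units from different dimensions.
    candidates = {
        tuple(sorted(pair)) for pair in combinations(dense1, 2) if pair[0][0] != pair[1][0]
    }
    counts2 = Counter(
        unit
        for cell in cells
        for unit in candidates
        if all(cell[k] == interval for k, interval in unit)
    )
    dense2 = {unit for unit, count in counts2.items() if count / n > tau}
    return dense1, dense2
```

On data where a group of points is tight in two features but spread across the whole range of a third, the dense units appear only in the two informative dimensions, which is the behavior the motivation paragraph describes.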
Advantages
- Handles irrelevant features
- Effective for very high-dimensional datasets
Limitations
- More complex to design and tune
- Interpretation may be challenging
Applications
- Bioinformatics
- Text mining
- Recommender systems
Comparison Table
| Method | Best For | Key Output |
|---|---|---|
| Hierarchical | Exploratory analysis and nested structure | Dendrogram |
| Spectral | Nonlinear and graph-structured clusters | Low-dimensional embedding + clusters |
| Subspace | High-dimensional data with feature-specific clusters | Clusters with relevant feature subsets |
Summary
- Hierarchical clustering builds a tree of nested clusters.
- Spectral clustering uses graph eigenvectors to separate complex cluster shapes.
- Subspace clustering finds clusters that are meaningful only in subsets of dimensions.
Together, these methods extend clustering beyond the assumptions of K-means and are valuable for modern high-dimensional and structured datasets.