Clustering
We can exploit the graph structure to extract clusters without needing any node features. These clusters can later be interpreted using prior knowledge.
Main Idea: There are many edges within clusters and few edges between clusters, leading to high intra-cluster density and low inter-cluster density.
Categories of Clustering Algorithms
- Partitioning Methods: Divide the graph into non-overlapping clusters.
- Non-Partitioning Methods: Allow clusters to overlap.
Partitioning Methods
Basic Idea: Formulate clustering as a constrained optimization problem.
k-Means with Different Distance Metrics
Given a graph \(G = (V, E)\) and an objective function \(F\):
- Define a set of clusters \(C = \{C_1, C_2, ..., C_k\}\).
- Optimize \(F(C)\) subject to \(C_1 \cup C_2 \cup ... \cup C_k = V\) and \(C_1, C_2, ..., C_k\) being pairwise disjoint.
This problem is NP-hard in general due to its discrete nature; for \(k = 2\) with the plain minimum-cut objective, polynomial-time algorithms exist.
Global Minimum Cut
Minimize the number of edges between clusters:
- Objective: partition the graph into two clusters such that the cut \(\text{cut}(C_1, C_2) = \sum_{i \in C_1, j \in C_2} W_{ij}\), the total weight of edges between \(C_1\) and \(C_2\), is minimized.
- A minimum source-sink (\(s\)-\(t\)) cut can be computed in polynomial time via max-flow, e.g., with the Ford-Fulkerson algorithm.
- In graph clustering, however, no specific source and sink are given, and more than two clusters may be needed.
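A minimal sketch of both cut computations, assuming networkx is available; the toy graph (two weighted triangles joined by one weak bridge edge) is purely illustrative:

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 3.0), (1, 2, 3.0), (0, 2, 2.0),   # dense triangle {0, 1, 2}
    (3, 4, 3.0), (4, 5, 3.0), (3, 5, 2.0),   # dense triangle {3, 4, 5}
    (2, 3, 1.0),                             # single weak bridge edge
])

# Global minimum cut (no source/sink needed): Stoer-Wagner algorithm.
cut_value, (part1, part2) = nx.stoer_wagner(G, weight="weight")
print(cut_value, part1, part2)               # 1.0 -- the bridge is cut

# Source-sink minimum cut via max-flow (max-flow min-cut duality,
# as computed by Ford-Fulkerson-style algorithms); the flow routine
# is given a directed version of the graph.
D = nx.DiGraph(G)
st_value, (S, T) = nx.minimum_cut(D, 0, 5, capacity="weight")
print(st_value, S, T)                        # also cuts the bridge here
```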
Normalized Cut
Minimum cuts tend to favor cutting off small clusters (often a single weakly connected node), which motivates size-normalized objectives.
Ratio Cut:
- Minimize \(\text{cut}(C_1, C_2) / |C_1| + \text{cut}(C_1, C_2) / |C_2|\).
- Does not consider intra-cluster density.
Normalized Cut:
- Minimize \(\text{cut}(C_1, C_2) / \text{vol}(C_1) + \text{cut}(C_1, C_2) / \text{vol}(C_2)\), where \(\text{vol}(C)\) is the sum of the degrees of all nodes in \(C\).
- For more than two clusters, minimize \(\sum_{i} \text{cut}(C_i, V \setminus C_i) / \text{vol}(C_i)\).
- This is NP-hard; approximate solutions are obtained by relaxation (e.g., the spectral relaxation below).
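To make the objectives concrete, here is a small numpy sketch evaluating both on a two-way partition of the toy graph above; the helper names are my own, not standard API:

```python
import numpy as np

def cut_value(W, mask):
    """Total weight of edges crossing between the cluster (mask) and the rest."""
    return W[np.ix_(mask, ~mask)].sum()

def ratio_cut(W, mask):
    c = cut_value(W, mask)
    return c / mask.sum() + c / (~mask).sum()     # normalize by cluster sizes

def normalized_cut(W, mask):
    c = cut_value(W, mask)
    vol = W.sum(axis=1)                           # weighted node degrees
    return c / vol[mask].sum() + c / vol[~mask].sum()

# Two triangles joined by one weak edge; mask marks cluster C1 = {0, 1, 2}.
W = np.zeros((6, 6))
for i, j, w in [(0,1,3), (1,2,3), (0,2,2), (3,4,3), (4,5,3), (3,5,2), (2,3,1)]:
    W[i, j] = W[j, i] = w
mask = np.array([True, True, True, False, False, False])
print(ratio_cut(W, mask), normalized_cut(W, mask))
```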
Graph Laplacian - Background
The Graph Laplacian \(L = D - W\):
- \(D\) is the (diagonal) degree matrix.
- \(W\) is the weighted adjacency matrix.
- \(L\) is symmetric and positive semi-definite for undirected graphs.
Key Properties:
- \(f^T L f = \frac{1}{2} \sum_{i,j} W_{ij} (f_i - f_j)^2\), i.e., a sum over edges with each edge counted once.
- The smallest eigenvalue is 0, with the constant vector as the corresponding eigenvector.
- The number of connected components equals the multiplicity of the eigenvalue 0.
- If the graph is connected, the second-smallest eigenvalue \(\lambda_2\) is greater than 0.
- \(\lambda_2\) (the Fiedler value) measures the graph's algebraic connectivity.
- For vertex-disjoint graphs, the Laplacian of the union is block diagonal: \(L(G_1 \cup G_2) = \begin{pmatrix} L(G_1) & 0 \\ 0 & L(G_2) \end{pmatrix}\).
Normalized/Symmetric Laplacian: \(L_{\text{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}\).
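A short numpy check of these properties on a toy graph; two disconnected triangles are used so the 0 eigenvalue should appear with multiplicity 2 (all names illustrative):

```python
import numpy as np

W = np.zeros((6, 6))
for i, j, w in [(0,1,3), (1,2,3), (0,2,2), (3,4,3), (4,5,3), (3,5,2)]:
    W[i, j] = W[j, i] = w                 # two disconnected triangles

D = np.diag(W.sum(axis=1))
L = D - W
eigvals, eigvecs = np.linalg.eigh(L)      # symmetric matrix -> real spectrum

print(np.all(eigvals >= -1e-9))           # positive semi-definite
print(np.sum(np.abs(eigvals) < 1e-9))     # multiplicity of 0 = 2 components

# Quadratic form identity: f^T L f = (1/2) * sum_ij W_ij (f_i - f_j)^2.
f = np.random.randn(6)
lhs = f @ L @ f
rhs = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)
print(np.isclose(lhs, rhs))

# Symmetric normalized Laplacian.
d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_sym = d_inv_sqrt @ L @ d_inv_sqrt
```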
Computational Methods:
- Spectral clustering solves a relaxed eigenvalue problem.
- For \(k = 2\) (see the sketch after this list):
  - Define \(f\) such that \(f_i = +1\) if \(i \in C_1\) and \(f_i = -1\) if \(i \in C_2\).
  - Then \(f^T L f = 4 \cdot \text{cut}(C_1, C_2)\).
  - Minimize \(f^T L f\) subject to \(f^T f = n\) (i.e., \(\|f\| = \sqrt{n}\)) and \(f^T \mathbf{1} = 0\).
  - The discrete problem is still hard; after relaxing \(f\) to real values, the solution is the eigenvector of \(L\) for the second-smallest eigenvalue (the Fiedler vector), and the two clusters are read off from the signs of its entries.
- For \(k > 2\):
  - Scaled indicator matrix \(H = [h_1, h_2, ..., h_k]\) with \(H_{ij} = 1/\sqrt{|C_j|}\) if \(i \in C_j\) and 0 otherwise.
- \(H^T H = I\).
- Minimize \(\text{trace}(H^T L H)\).
  - After relaxing \(H\) to real values, the solution is given by the eigenvectors of \(L\) for the \(k\) smallest eigenvalues.
  - The spectral embedding of node \(i\) is the \(i\)-th row of \(H\), i.e., the \(i\)-th entry of each of the \(k\) eigenvectors.
- Perform k-means on the spectral embedding to obtain cluster assignments.
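Putting the recipe together, a minimal sketch of the full pipeline, assuming numpy and scikit-learn are available (the function name is my own); for \(k = 2\) one can alternatively just take the sign of the Fiedler vector as described above:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """Unnormalized spectral clustering on a weighted adjacency matrix W."""
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    H = eigvecs[:, :k]                        # n x k spectral embedding
    # Row i of H is the embedding of node i; run k-means on the rows.
    return KMeans(n_clusters=k, n_init=10).fit_predict(H)

# Toy graph from above: the weak bridge is cut.
W = np.zeros((6, 6))
for i, j, w in [(0,1,3), (1,2,3), (0,2,2), (3,4,3), (4,5,3), (3,5,2), (2,3,1)]:
    W[i, j] = W[j, i] = w
print(spectral_clustering(W, k=2))            # e.g., [0 0 0 1 1 1]
```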
If the weights between nodes in different clusters increase, their spectral embeddings converge toward each other (the clusters become harder to separate): the embedding minimizes \(\sum_{i,j} W_{ij} (f_i - f_j)^2\), so heavily connected nodes receive similar coordinates.
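A quick numeric illustration of this effect on the toy graph: as the bridge weight between the two triangles grows, the Fiedler coordinates of its endpoints are pulled together:

```python
import numpy as np

for bridge in [0.1, 1.0, 10.0]:
    W = np.zeros((6, 6))
    for i, j, w in [(0,1,3), (1,2,3), (0,2,2), (3,4,3), (4,5,3), (3,5,2)]:
        W[i, j] = W[j, i] = w
    W[2, 3] = W[3, 2] = bridge                # inter-cluster weight
    L = np.diag(W.sum(axis=1)) - W
    fiedler = np.linalg.eigh(L)[1][:, 1]      # second-smallest eigenvector
    # Distance between the embeddings of the bridged nodes 2 and 3
    # shrinks as the bridge weight grows.
    print(bridge, round(abs(fiedler[2] - fiedler[3]), 3))
```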
Drawbacks: there is no guarantee that the relaxed solution is close to the optimal discrete one in all cases, but it works well in practice.
Spectral Clustering for Numerical Data:
- Requires a similarity function, e.g., the Gaussian kernel \(\text{sim}(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))\).
- Create an edge between \(x\) and \(y\) if \(\text{sim}(x, y) > t\) or if \(x\) and \(y\) are mutual \(k\)-nearest neighbors.
- Apply spectral clustering to the resulting graph.
- Advantages: works well for non-convex clusters and complex shapes (more effective than applying k-means directly to the data).
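A sketch of this pipeline on non-convex data, reusing spectral_clustering from above; the two-moons dataset and the parameters \(\sigma\) and \(t\) are illustrative choices, not prescribed values:

```python
import numpy as np
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Gaussian-kernel similarity graph; sigma and the threshold t would
# need tuning on real data.
sigma, t = 0.1, 0.01
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / (2 * sigma ** 2))
W[W < t] = 0.0                        # keep edges only between similar points
np.fill_diagonal(W, 0.0)              # no self-loops

labels = spectral_clustering(W, k=2)  # separates the two moons, which
                                      # k-means applied directly to X cannot
```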
Probabilistic Community Detection
Optimization Methods:
- Find a clustering that minimizes an objective function, such as the min cut above.
Generative Models:
- Consider a graph as a realization from a generative model controlled by parameters.
- Communities are explicitly modeled within the generative model (e.g., PPM, SBM).
- Finding a good clustering is equivalent to parameter inference.
Example: PPM (Planted Partition Model):
- Two communities encoded by a latent label \(z_i \in \{+1, -1\}\) for each node.
- Edge probability \(P(A_{ij} = 1 \mid z_i, z_j) = p\) if \(z_i = z_j\) and \(q\) otherwise.
- Maximize the posterior: \(z^* = \arg\max_z P(z \mid A)\).
- Assume balanced communities \(|C_1| = |C_2| = n/2\), i.e., \(\sum_i z_i = 0\).
We can show that \(P(z \mid A) \propto P(A \mid z) \propto \left( \frac{q(1-p)}{p(1-q)} \right)^{E_{\text{cut}}(z)}\), where \(E_{\text{cut}}(z) = \sum_{i<j} A_{ij} \mathbf{1}(z_i \ne z_j)\); with balanced communities, all factors of \(P(A \mid z)\) not involving \(E_{\text{cut}}(z)\) are constants. If \(p > q\), the base is less than 1, so maximizing the posterior is equivalent to minimizing the cut. This can be solved with spectral clustering in this specific case.
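A small experiment matching this argument, assuming networkx's planted-partition generator: sample a graph with \(p > q\) and recover the labels from the sign of the Fiedler vector (the parameter values are illustrative):

```python
import numpy as np
import networkx as nx

n, p, q = 100, 0.3, 0.05                  # two balanced blocks, p > q
G = nx.planted_partition_graph(2, n, p, q, seed=0)

L = nx.laplacian_matrix(G).toarray().astype(float)
fiedler = np.linalg.eigh(L)[1][:, 1]      # second-smallest eigenvector
z_hat = np.sign(fiedler)                  # estimated labels in {-1, +1}

truth = np.array([-1.0] * n + [1.0] * n)  # generator fills block 0 first
acc = max((z_hat == truth).mean(), (z_hat == -truth).mean())
print(acc)                                # near 1 for this (p, q) gap
```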
Example: SBM (Stochastic Block Model):
- Generalizes the PPM to multiple blocks with arbitrary block-to-block edge probabilities (detailed explanation skipped).
- Use variational inference to handle the intractable posterior and infer the model parameters.