Published

Friday, August 2, 2024

Graph Classification - Semi-Supervised Learning

Tasks

  • Classifying a protein as an enzyme or a non-enzyme
  • Detecting fraud in a financial transaction
  • Predicting users’ preferences in a social network

Collective Classification

  • A graph represents a social network
  • Labels are known for some nodes (labeled nodes)
  • Goal: Classify the remaining nodes
  • Standard assumption: Nodes with similar labels are likely to be connected (smoothness assumption or homophily)

Label Propagation

  • \(V = S \cup U\)
    • \(S\): Seed nodes
    • \(U\): Unlabeled nodes
    • \(W\): Symmetric weight adjacency matrix
    • \(y_i \in \{0, 1\}\): Node labels
  • Idea: Energy minimization
    • \[E = \sum_{i,j} W_{ij}(y_i - y_j)^2\]
    • Smoothness: Adjacent nodes should have the same class
    • Subject to \(y_i = \hat{y}_i\) for \(i \in S\)
  • Constrained integer optimization problem
    • Rewritten as \(\min y^T L y\)
    • Relaxation: \(y_i \in [0,1]\) (continuous relaxation of the integer constraint)
    • \[L = \begin{bmatrix} L_{ss} & L_{su} \\ L_{us} & L_{uu} \end{bmatrix}\]
    • \[y = \begin{bmatrix} y_s \\ y_u \end{bmatrix} = \begin{bmatrix} \hat{y}_s \\ y_u \end{bmatrix}\]
    • \(y_u = -L_{uu}^{-1} L_{us} y_s\) (closed-form solution)
    • Unconstrained optimization problem
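
A minimal numpy sketch of this closed form (function and variable names are illustrative): setting the gradient of \(E\) with respect to \(y_u\) to zero gives \(L_{uu} y_u = -L_{us} y_s\), which is solved directly below.

```python
import numpy as np

def label_propagation_closed_form(W, seed_idx, seed_labels):
    """Harmonic solution y_u = -L_uu^{-1} L_us y_s for binary labels in {0, 1}."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                      # unnormalized graph Laplacian
    unlabeled_idx = np.setdiff1d(np.arange(n), seed_idx)

    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_us = L[np.ix_(unlabeled_idx, seed_idx)]

    # Solve L_uu y_u = -L_us y_s instead of forming the inverse explicitly.
    y_u = np.linalg.solve(L_uu, -L_us @ seed_labels)

    y = np.zeros(n)
    y[seed_idx] = seed_labels
    y[unlabeled_idx] = y_u
    return y                                            # threshold at 0.5 for hard labels

# Toy example: path graph 0-1-2-3 with both endpoints labeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(label_propagation_closed_form(W, np.array([0, 3]), np.array([0.0, 1.0])))  # [0, 1/3, 2/3, 1]
```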

Generalization to \(k\) Labels

  • \(y_{ik} = 1\) if node \(i\) is of class \(k\)
  • \[E(y) = \sum_{i,j} W_{ij}\,\|y_i - y_j\|^2\]
  • Compatibility matrix \(H\)
    • Models the compatibility between labels
    • \[E(y) = \sum_{i,j} W_{ij}\,(y_i - y_j)^T H\,(y_i - y_j)\]
    • Example: \[H = \begin{bmatrix} 0.2 & 0.8 \\ 0.8 & 0.2 \end{bmatrix}\]
      • Using heterophily: Nodes with different labels are likely to be connected (leaders and followers)
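
For \(H = I\) the energy decouples per class, so the same closed form as above can be applied column-wise to a one-hot label matrix; below is a sketch under that simplification (a general \(H\) couples the classes and is not handled here), with illustrative names:

```python
import numpy as np

def label_propagation_multiclass(W, seed_idx, seed_classes, k):
    """Apply the harmonic closed form column-wise to one-hot labels, then take the argmax."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    unlabeled_idx = np.setdiff1d(np.arange(n), seed_idx)

    Y_s = np.eye(k)[seed_classes]                       # one-hot rows for the seed nodes
    Y_u = np.linalg.solve(L[np.ix_(unlabeled_idx, unlabeled_idx)],
                          -L[np.ix_(unlabeled_idx, seed_idx)] @ Y_s)

    pred = np.empty(n, dtype=int)
    pred[seed_idx] = seed_classes
    pred[unlabeled_idx] = Y_u.argmax(axis=1)            # class with the largest score
    return pred
```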

Alternative Variant

  • Soft constraint on \(y_i = \hat{y}_i\)
  • \[\min_{F \in \mathbb{R}^{n \times k}} \sum_{i,j} W_{ij}\,\|f_i - f_j\|^2 + \mu \sum_i \|f_i - y_i\|^2\]
    • \(f_i\): \(i\)-th row of \(F\)
    • \(y_i\): \(i\)-th row of \(Y\) (one-hot encoded)
    • Normalize \(f_i\) and \(y_i\) by the degree of the node
    • \(\mu\): Regularization parameter (softens the constraint)
    • Solution: \[F = (1 - \alpha)(I - \alpha W D^{-1})^{-1} Y\]
      • \(\alpha = \frac{2}{2 + \mu}\)
      • Personalized PageRank where \(Y\) encodes the seed labels
      • Compute a PageRank-style score for each class and assign each node the class with the highest score
      • \(W D^{-1}\) is the normalized random walk matrix \(M\)
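
A sketch of this soft-constraint variant via the closed form \(F = (1 - \alpha)(I - \alpha W D^{-1})^{-1} Y\); the function name and setup are illustrative, and isolated nodes are assumed not to occur:

```python
import numpy as np

def soft_label_propagation(W, Y, mu):
    """Closed form F = (1 - alpha) (I - alpha W D^{-1})^{-1} Y with alpha = 2 / (2 + mu).

    Y is an n x k matrix with one-hot rows for seed nodes and zero rows otherwise.
    """
    alpha = 2.0 / (2.0 + mu)
    M = W / W.sum(axis=0, keepdims=True)        # random walk matrix W D^{-1} (column-normalized)
    n = W.shape[0]
    F = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * M, Y)
    return F.argmax(axis=1)                     # class with the highest PageRank-style score
```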

Label Propagation vs. Stochastic Block Models (SBM)

  • Labels \(y_i\) look like communities \(z_i\)
  • Compatibility matrix \(H\) resembles the SBM matrix
  • SBM:
    • Can also incorporate observed labels
    • Generative model: Models \(p(W|z)\) and \(p(z)\) (in contrast, LP is closer to a discriminative model of \(p(y|W)\))
    • Used to generate new graphs (unlike LP)
    • Estimates parameters that generated a given graph (unsupervised)
    • Used for link prediction, etc.
  • LP:
    • Predicts labels of \(U\) given observed labels and \(W\) (supervised)
    • Different problem-solving approach

Transductive Learning

  • Label propagation is a special case of transductive learning
  • Labeled set \(T = \{(x_i, y_i)\}\) and an unlabeled set \(U = \{x_i\}\)
    • May include additional structural information like \(A\)
    • Goal: Predict labels of only the unlabeled set
    • Easier task: We do not solve the general problem of predicting labels for arbitrary new graphs or nodes
  • Vapnik Principle: Solve only the specific problem, not a more difficult intermediate step

Transductive Learning vs. Semi-Supervised Learning

  • LP is also graph-based semi-supervised learning (more precisely: semi-supervised transductive learning)
  • Semi-supervised learning: Labeled set \(T = \{(x_i, y_i)\}\) and an unlabeled set \(U = \{x_i\}\)
    • Goal: Use both \(T\) and \(U\) to learn a mapping \(f\)
    • \(f\) can be transductive or inductive
  • Transductive learning is almost always semi-supervised because both \(T\) and \(U\) are used

Why Does Semi-Supervised Learning Work?

  • Unlabeled data helps to better model the underlying data distribution
  • Assumptions about the data and labels (e.g., smoothness) provide an additional learning signal
  • Unlabeled data is typically available in much larger quantities than labeled data

Self-Learning

  • Learn embeddings first, then the classifier

Attributed Graphs

  • Each node \(i\) has a feature vector \(x_i \in \mathbb{R}^n\)
  • Traditional methods: Markov random fields (similar to HMMs)
    • Latent variables are not sequential but graph-structured
    • Parts of the latent variables are observed
  • Modern methods: Graph Neural Networks (GNNs)

Graph Construction

  • Handle vector data when no structure is available
    • Construct a graph connecting similar data points (similarity function)
    • Apply label propagation
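
A minimal sketch of the construction step, using a symmetric k-nearest-neighbor graph with RBF weights (the choices of \(k\) and the bandwidth \(\gamma\) are illustrative); any of the label propagation methods above can then be run on the resulting \(W\):

```python
import numpy as np

def knn_rbf_graph(X, k=10, gamma=1.0):
    """Symmetric kNN graph with RBF weights w_ij = exp(-gamma * ||x_i - x_j||^2)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-gamma * d2)
    np.fill_diagonal(W, 0.0)                               # no self-loops
    # Keep only the k largest similarities per row, then symmetrize with a logical OR.
    keep = np.argsort(W, axis=1)[:, -k:]
    mask = np.zeros_like(W, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return np.where(mask | mask.T, W, 0.0)
```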

Graph Classification vs. Node Classification

  • Node classification: Classify individual nodes
  • Graph classification: Classify the entire graph as a single instance (standard iid problem)

Graph Kernel Methods

  • Shortest path kernel
  • Random walk kernel
  • Use an SVM or GNNs
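
A hedged sketch of a shortest-path kernel in its simplest form: summarize each graph by a histogram of shortest-path lengths (Floyd-Warshall) and take the inner product of the histograms as the kernel value. This simplified variant ignores node labels and is meant only to illustrate the idea; the resulting kernel matrix could then be passed to an SVM.

```python
import numpy as np

def shortest_path_histogram(A, max_len=10):
    """Histogram of finite shortest-path lengths in an unweighted graph (Floyd-Warshall)."""
    n = A.shape[0]
    D = np.where(A > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):                                   # Floyd-Warshall relaxation
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    lengths = D[np.isfinite(D) & (D > 0)].astype(int)
    return np.bincount(lengths, minlength=max_len + 1)[: max_len + 1]

def sp_kernel(A1, A2):
    """Kernel value = inner product of the two shortest-path-length histograms."""
    return float(shortest_path_histogram(A1) @ shortest_path_histogram(A2))
```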

Node Embeddings

  • Specialized ML methods for graphs map each node to a vector
  • Traditional ML methods (e.g., k-means, SVM) can then be applied to the embeddings as if they were iid data

Encoding Graph Structure in a Vector

  • Naive approaches:
    • Adjacency matrix as an image: Depends on the node ordering (not permutation invariant)
    • Concatenating features of neighbors: Variable length, \(O(n^2)\) scaling
  • Learning node representations (like word embeddings)
    • \(\phi: V \rightarrow \mathbb{R}^d\)
    • Similar to graph clustering with \(d\) smallest eigenvectors of the Laplacian
    • Close in embedding space \(\Rightarrow\) Close in graph space

Applications of Node Embeddings

  • Clustering: k-means
  • Semi-supervised classification: Classify remaining nodes based on learned embeddings
  • Link prediction: Measure likelihood of a link between nodes in embedding space
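
For link prediction, one common (illustrative) choice is to score a candidate edge by the dot product of its endpoint embeddings:

```python
import numpy as np

def link_score(Z, u, v):
    """Score a candidate edge (u, v) by the sigmoid of the embedding dot product."""
    return 1.0 / (1.0 + np.exp(-Z[u] @ Z[v]))
```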

Types of Embeddings

  • Role-based embeddings: Nodes with similar roles are close, even if far in the graph
  • Spectral embeddings: Nodes have similar embeddings if in the same cluster
  • Deep embeddings: Similarity depends on the loss function and architecture

Spectral Embeddings

  • \(L \in \mathbb{R}^{|V| \times |V|}\) projected to \(\mathbb{R}^{|V| \times k}\) using the \(k\) smallest eigenvectors of \(L\)
  • Linear transformation, similar to PCA
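
A minimal sketch: take the eigenvectors belonging to the \(k\) smallest eigenvalues of the (unnormalized) Laplacian as the embedding matrix.

```python
import numpy as np

def spectral_embedding(W, k):
    """Rows of the returned matrix are k-dimensional spectral node embeddings."""
    L = np.diag(W.sum(axis=1)) - W          # unnormalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)    # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k]                   # |V| x k matrix of the k smallest eigenvectors
```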

Deep Embeddings

  • Inspired by NLP methods (e.g., word2vec)
  • DeepWalk: Transform the graph into random walks and apply word2vec
    • For every node \(v_i\), sample multiple random walks
    • Random walks correspond to sentences (sequences of discrete tokens)
    • Result: Nodes close in the graph are close in the embedding space
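
A sketch of the random-walk sampling behind DeepWalk (assuming a graph with no isolated nodes); the resulting walks, treated as sentences of node-id tokens, would then be fed to a word2vec implementation such as gensim's, which is not shown here.

```python
import numpy as np

def sample_walks(W, walks_per_node=10, walk_length=40, rng=None):
    """Sample fixed-length random walks from every node, proportional to edge weights."""
    rng = np.random.default_rng() if rng is None else rng
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    walks = []
    for _ in range(walks_per_node):
        for start in range(n):
            walk = [start]
            for _ in range(walk_length - 1):
                walk.append(rng.choice(n, p=P[walk[-1]]))
            walks.append([str(v) for v in walk])   # word2vec expects sequences of tokens
    return walks
```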

Graph2Gauss

  • Map each node \(v_i\) to a Gaussian distribution in embedding space
    • Neighbors are closer than second neighbors, preserving ranking
    • Learn mapping: \(f: v_i \rightarrow N(\mu(x_i), \sigma(x_i))\)
      • \(x_i\): Feature vector of \(v_i\)
    • Define loss for each node: Ranking loss (triplet \(u, v, w\))
      • \(u\) is closer to \(v\) than to \(w\)
      • \[L(u) = \sum_{v, w} \big( E_{uv}^2 + \exp(-E_{uw}) \big)\]
      • \[E_{uv} = \mathrm{KL}\big(f(u) \,\|\, f(v)\big)\]
    • Result: Distances in graph correlate with distances in embedding space, variances encode uncertainty
    • Inductive learning: Embeddings can be computed for new nodes from their features (unlike DeepWalk or spectral embeddings)
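
A sketch of the two ingredients of the loss, assuming diagonal Gaussian embeddings (the encoder producing \(\mu\) and \(\sigma\) from the node features is omitted):

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, diag(var_p)) || N(mu_q, diag(var_q)) ) for diagonal Gaussians."""
    return 0.5 * np.sum(var_p / var_q + (mu_q - mu_p) ** 2 / var_q
                        - 1.0 + np.log(var_q / var_p))

def triplet_loss(mu, var, u, v, w):
    """Square-exponential ranking loss: u should be closer to v (1-hop) than to w (2-hop)."""
    E_uv = kl_diag_gauss(mu[u], var[u], mu[v], var[v])
    E_uw = kl_diag_gauss(mu[u], var[u], mu[w], var[w])
    return E_uv ** 2 + np.exp(-E_uw)
```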

Node Embeddings vs. Graph Embeddings

  • Leverage node embeddings to get graph embeddings
  • Graphs have different numbers of nodes: Use an aggregation function (mean, sum, max) over the node embeddings
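
A minimal sketch of turning node embeddings into a fixed-size graph embedding by pooling (the choice of aggregation is a hyperparameter):

```python
import numpy as np

def graph_embedding(Z, agg="mean"):
    """Aggregate an (n_nodes x d) node-embedding matrix into a single d-dimensional vector."""
    if agg == "mean":
        return Z.mean(axis=0)
    if agg == "sum":
        return Z.sum(axis=0)
    if agg == "max":
        return Z.max(axis=0)
    raise ValueError(f"unknown aggregation: {agg}")
```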