Published

Tuesday, July 30, 2024

Graph Theory and Network Structure

Basic Definitions

Plain Simple Graph \(G = (V, E)\): - Set of nodes \(V\) - Set of edges $ E V V$ - For undirected graphs, \((u, v) = (v, u)\) - Equivalent representation: adjacency matrix \(A \in \{0, 1\}^{|V| \times |V|}\) - Extensions: weighted, attributed (vectors assigned to nodes or edges), temporal

Structure of Real Networks

Are Real Networks Random? - No, they are not random.

Erdős–Rényi Model: - Random graph model - Start with \(n\) nodes - For each pair of nodes, add an edge with probability \(p\) - Every edge is independent - No obvious pattern - Not a good model for real networks

Interpretation Challenges: - We cannot interpret the adjacency matrix directly nor plot it. - When plotting graphs, we use a layout algorithm that changes how the graph looks - graph isomorphism.

Patterns in Real Graphs

Diameter: - Very small, following the “six degrees of separation” concept.

In-degree and Out-degree Distribution: - Follows a power law. - Few nodes have many connections, and many nodes have few connections. - Log-log plot looks like a straight line. - The decay of the probability density function (pdf) is polynomial, not exponential (heavy-tailed): \[ p(x) = a \cdot x^{-b} \quad \text{where } b > 1 \] - Graphs with power law degree distribution are called .

Scale-free Property: - The number of nodes with degree \(x\) is denoted by \(y(x)\). - For a scale-free network, the degree distribution follows a power law: \[ y(x) \propto x^{-b} \] - Here, \(y(x)\) is the number of nodes with degree \(x\). - \(a\) and \(b\) are constants, with \(b > 1\).

Scale-free Property Equation: - The scale-free property suggests that the degree distribution is invariant under rescaling: \[ y(ax) = a^{-b} y(x) \] - This means that if you scale the degree by a factor \(a\), the number of nodes with that degree scales by a factor \(a^{-b}\).

Algorithm Design Considerations

  • We need to account for power law distributions when designing algorithms.
  • Gaussian assumptions often do not hold (Gaussian trap).

Clustering Coefficient: - Calculated as \(\frac{3 \times \text{number of triangles}}{\text{number of connected triples}}\) - Real graphs have a large clustering coefficient, indicating strong community structure.

Approximating the Clustering Coefficient

  • Plot the sorted eigenvalues of the adjacency matrix on a log-log plot.
  • Power law distribution: few large eigenvalues and many small ones.
  • Number of triangles: \[ \text{Number of triangles} = \frac{\text{trace}(A^3)}{6} = \frac{\sum (\text{eigenvalues}^3)}{6} \]
  • We only need the largest eigenvalues, which can be computed efficiently using the power iteration method:
    • Multiply a random vector with the matrix and normalize it.
    • Converges to the largest eigenvector.
    • Deflated matrix \(\hat{X} = X - \lambda_1 v_1 v_1^T\) (works only for symmetric matrices).

Sparsity of Real Graphs: - While there are \(N^2\) possible edges, the real number of edges is almost linear.

Path Lengths and Diameters

Characteristic Path Length: - Median of the average shortest path length between all pairs of nodes.

Average Diameter: - Average of the average shortest path length between all pairs of nodes.

Hop Plots: - \(N_h(u)\): Number of nodes reachable from \(u\) via \(h\) hops. - Total neighbors \(N_h\): Sum of all \(N_h(u)\) over all \(u\). - Plot \(N_h\) vs. \(h\).

Effective Diameter/Eccentricity: - Minimum number of hops in which a certain fraction (typically 90%) of all connected pairs of nodes can be reached: \[ \min \{ h \mid N_h \geq 0.9 \times \text{total number of reachable pairs} \} \]

Detecting Abnormal Behavior

  • Understanding what is normal helps in detecting abnormal behavior.
  • Use realistic graph generators to simulate networks.