Graph Theory and Network Structure

Basic Definitions

Plain Simple Graph $G = (V, E)$: - Set of nodes $V$ - Set of edges $ E V V$ - For undirected graphs, $(u, v) = (v, u)$ - Equivalent representation: adjacency matrix $A \in \{0, 1\}^{|V| \times |V|}$ - Extensions: weighted, attributed (vectors assigned to nodes or edges), temporal

Structure of Real Networks

Are Real Networks Random? - No, they are not random.

Erdős–Rényi Model: - Random graph model - Start with $n$ nodes - For each pair of nodes, add an edge with probability $p$ - Every edge is independent - No obvious pattern - Not a good model for real networks

Interpretation Challenges: - We cannot interpret the adjacency matrix directly nor plot it. - When plotting graphs, we use a layout algorithm that changes how the graph looks - graph isomorphism.

Patterns in Real Graphs

Diameter: - Very small, following the “six degrees of separation” concept.

In-degree and Out-degree Distribution: - Follows a power law. - Few nodes have many connections, and many nodes have few connections. - Log-log plot looks like a straight line. - The decay of the probability density function (pdf) is polynomial, not exponential (heavy-tailed): \[ p(x) = a \cdot x^{-b} \quad \text{where } b > 1 \] - Graphs with power law degree distribution are called .

Scale-free Property: - The number of nodes with degree $x$ is denoted by $y(x)$. - For a scale-free network, the degree distribution follows a power law: \[ y(x) \propto x^{-b} \] - Here, $y(x)$ is the number of nodes with degree $x$. - $a$ and $b$ are constants, with $b > 1$.

Scale-free Property Equation: - The scale-free property suggests that the degree distribution is invariant under rescaling: \[ y(ax) = a^{-b} y(x) \] - This means that if you scale the degree by a factor $a$, the number of nodes with that degree scales by a factor $a^{-b}$.

Algorithm Design Considerations

We need to account for power law distributions when designing algorithms.
Gaussian assumptions often do not hold (Gaussian trap).

Clustering Coefficient: - Calculated as $\frac{3 \times \text{number of triangles}}{\text{number of connected triples}}$ - Real graphs have a large clustering coefficient, indicating strong community structure.

Approximating the Clustering Coefficient

Plot the sorted eigenvalues of the adjacency matrix on a log-log plot.
Power law distribution: few large eigenvalues and many small ones.
Number of triangles: \[ \text{Number of triangles} = \frac{\text{trace}(A^3)}{6} = \frac{\sum (\text{eigenvalues}^3)}{6} \]
We only need the largest eigenvalues, which can be computed efficiently using the power iteration method:
- Multiply a random vector with the matrix and normalize it.
- Converges to the largest eigenvector.
- Deflated matrix $\hat{X} = X - \lambda_1 v_1 v_1^T$ (works only for symmetric matrices).

Sparsity of Real Graphs: - While there are $N^2$ possible edges, the real number of edges is almost linear.

Path Lengths and Diameters

Characteristic Path Length: - Median of the average shortest path length between all pairs of nodes.

Average Diameter: - Average of the average shortest path length between all pairs of nodes.

Hop Plots: - $N_h(u)$: Number of nodes reachable from $u$ via $h$ hops. - Total neighbors $N_h$: Sum of all $N_h(u)$ over all $u$. - Plot $N_h$ vs. $h$.

Effective Diameter/Eccentricity: - Minimum number of hops in which a certain fraction (typically 90%) of all connected pairs of nodes can be reached: \[ \min \{ h \mid N_h \geq 0.9 \times \text{total number of reachable pairs} \} \]

Detecting Abnormal Behavior

Understanding what is normal helps in detecting abnormal behavior.
Use realistic graph generators to simulate networks.