Week 5

Published

Tuesday, March 25, 2025

THE MATHEMATICAL THEORY OF COMMUNICATION (SHANNON & WEAVER)

  • Mathematical Modeling of Communication

    • A communication system is represented by a sequence of blocks: an information source produces messages; a transmitter (encoder) converts these into signals; the channel (which may add noise) carries the signal; and a receiver (decoder) recovers the message for the destination.
    • The analysis applies both to discrete systems (symbols, letters, digital signals) and to continuous systems (speech, television signals).
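
To make this block structure concrete, here is a minimal Python sketch that pipes a message through a transmitter, a noisy channel, and a receiver. The 3-fold repetition code and the bit-flip probability are illustrative assumptions, not something prescribed by the text.

```python
import random

def encode(bits):
    """Transmitter: map each source bit to three channel bits (3-fold repetition)."""
    return [b for bit in bits for b in (bit, bit, bit)]

def channel(signal, p_flip=0.05):
    """Channel: flip each transmitted bit independently with probability p_flip (the noise)."""
    return [bit ^ (random.random() < p_flip) for bit in signal]

def decode(signal):
    """Receiver: majority vote over each group of three received bits."""
    return [int(sum(signal[i:i + 3]) >= 2) for i in range(0, len(signal), 3)]

message = [random.randint(0, 1) for _ in range(20)]   # information source
received = decode(channel(encode(message)))           # destination's reconstruction
print(message == received)                            # usually True for small p_flip
```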
  • Definition of Information and Entropy

    • Information is defined not as meaning but as the measure of uncertainty or freedom of choice in selecting a message.
    • In its simplest form, if a message is chosen from \(n\) equally likely alternatives, its information is measured as \[ H = \log_2{n} \quad \text{(bits)}. \]
    • For a random variable with probabilities \(p_i\), Shannon introduced the entropy \[ H = - \sum_{i} p_i \log_2 p_i, \] which quantitatively characterizes the average uncertainty (or information per symbol).
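
Both formulas are easy to compute directly; a minimal sketch, with the example distributions chosen purely for illustration:

```python
from math import log2

def entropy(probs):
    """Average uncertainty H = -sum p_i * log2(p_i), in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

# n equally likely alternatives: H = log2(n)
print(entropy([1/8] * 8))                    # 3.0 bits, same as log2(8)

# a skewed distribution carries less information per symbol
print(entropy([0.5, 0.25, 0.125, 0.125]))    # 1.75 bits
```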
  • Relationship to Thermodynamic Entropy

    • The appearance of an entropy-like measure in communication theory draws an analogy with thermodynamic entropy. Just as in statistical mechanics where entropy quantifies disorder, in information theory \(H\) quantifies the statistical “spread” in message possibilities.
  • Coding and Data Compression

    • Efficient Coding:

      • The transmitter is allowed to encode the source output to improve transmission efficiency by matching the statistical structure of the message to the channel’s characteristics.
      • The redundancy of a language (e.g., roughly 50% for English) quantifies the difference between the actual entropy and the maximum possible entropy for a given symbol set. This redundancy can be exploited for compression.
    • Fundamental Code Theorem for a Noiseless Channel:

      • If a source produces symbols with entropy \(H\) (bits per symbol) and the channel has capacity \(C\) (bits per second), then there exists a coding scheme that transmits the source output over the channel at an average rate arbitrarily close to \[ \frac{C}{H} \quad \text{symbols per second}. \]
      • No encoding method can achieve an average rate exceeding \(\frac{C}{H}\).
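
The redundancy figure and the \(C/H\) rate above can be made concrete with a small numeric sketch; the four-symbol source, its probabilities, and the channel capacity are assumed values chosen for illustration.

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Illustrative source: 4 symbols with unequal probabilities.
probs = [0.5, 0.25, 0.125, 0.125]
H = entropy(probs)                  # actual entropy: 1.75 bits/symbol
H_max = log2(len(probs))            # maximum entropy for 4 symbols: 2 bits/symbol
redundancy = 1 - H / H_max          # fraction of the symbol set's capacity left unused
print(f"redundancy = {redundancy:.2%}")        # 12.50%

# Noiseless coding theorem: a channel of capacity C bits/s can carry this
# source at close to C / H symbols per second, and no faster.
C = 1000                            # assumed channel capacity, bits per second
print(f"max rate = {C / H:.1f} symbols per second")   # about 571.4
```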
  • Channel Capacity and Noise

    • Noisy Channel Analysis:

      • In the presence of noise the received signal is perturbed, which introduces spurious uncertainty. Shannon introduced the notion of “equivocation” (the uncertainty about the transmitted message given the received signal) and defined the channel capacity \(C\) as the maximum achievable rate of useful information transmission, i.e., the total uncertainty of the source minus the noise-induced uncertainty (equivocation), maximized over possible sources.
      • A key relationship is: \[ H(X) - H_{Y}(X) = H(Y) - H_{X}(Y), \] where:
        • \(H(X)\) is the source entropy,
        • \(H(Y)\) is the entropy of the received signals,
        • \(H_{Y}(X)\) (equivocation) quantifies the residual uncertainty about the message after reception, and
        • \(H_{X}(Y)\) quantifies the uncertainty the noise adds to the received signal given the transmitted message.
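
The identity above can be checked numerically for a binary symmetric channel; a minimal sketch, in which the input distribution and the crossover probability are assumed values (the closing line also prints the standard capacity of such a channel).

```python
from math import log2, isclose

def H(probs):
    """Entropy of a probability distribution, in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

q, p = 0.7, 0.1                          # P(X = 0) and crossover probability (assumed)
# Joint distribution P(X = x, Y = y) for a binary symmetric channel.
joint = {(0, 0): q * (1 - p), (0, 1): q * p,
         (1, 0): (1 - q) * p, (1, 1): (1 - q) * (1 - p)}

px = [q, 1 - q]                                                      # source distribution
py = [joint[(0, 0)] + joint[(1, 0)], joint[(0, 1)] + joint[(1, 1)]]  # received distribution
H_joint = H(joint.values())

equivocation = H_joint - H(py)           # H_Y(X): uncertainty about X after seeing Y
noise_entropy = H_joint - H(px)          # H_X(Y): uncertainty the noise adds to Y

# Both sides of the identity equal the rate of useful information.
print(H(px) - equivocation, H(py) - noise_entropy)
print(isclose(H(px) - equivocation, H(py) - noise_entropy))          # True

# Capacity of this binary symmetric channel, in bits per channel use.
print(1 - H([p, 1 - p]))
```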
    • Fundamental Theorem for a Noisy Channel:

      • If the channel capacity \(C\) exceeds the source entropy \(H(X)\), then there exists a coding scheme enabling error probabilities to be made arbitrarily small.
      • However, if \(H(X) > C\), no coding scheme can reduce the error frequency to an arbitrarily low level.
    • Continuous Channels:

      • For continuous, band-limited channels affected by white thermal noise, Shannon derived a capacity expression (in one typical form) \[ C = W \log_2\left(1 + \frac{P}{N}\right) \quad \text{(bits per second)}, \] where:
        • \(W\) is the bandwidth,
        • \(P\) is the average signal power, and
        • \(N\) is the noise power.
      • This result shows that even continuous, band-limited signals can be characterized by a finite number of degrees of freedom per unit time (e.g., by sampling at \(2W\) points per second).
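
A short sketch of the capacity formula; the bandwidth and signal-to-noise ratio below are assumed, roughly telephone-line-like values chosen for illustration.

```python
from math import log2

def capacity(W, P, N):
    """Shannon capacity of a band-limited channel with white noise, in bits per second."""
    return W * log2(1 + P / N)

W = 3000          # bandwidth in hertz (assumed)
snr = 1000        # signal-to-noise power ratio P/N, about 30 dB (assumed)
print(capacity(W, snr, 1))           # about 29,900 bits per second

# The band-limited signal itself is pinned down by 2W samples per second.
print(2 * W, "samples per second")
```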
  • Role of Statistical Structure and Stochastic Processes

    • The messages are assumed to be generated by stochastic (or Markov) processes. This models the fact that in natural languages and many signals, successive symbols are statistically dependent.
    • For an ergodic process (one whose long sequences are statistically representative of the source as a whole), the entropy per symbol can be approximated increasingly well by n‑gram (Markov chain) models of increasing order.
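
One way to see the effect of statistical dependence is to compute the entropy per symbol of a small Markov source directly from its transition probabilities; the two-state chain below is an illustrative assumption.

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Illustrative two-symbol Markov source: rows are P(next symbol | current symbol).
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Stationary distribution pi satisfies pi = pi P (solved here for the 2-state case).
pi0 = P[1][0] / (P[0][1] + P[1][0])
pi = [pi0, 1 - pi0]

# Entropy rate: average, over states, of the entropy of the next-symbol distribution.
H_rate = sum(pi[i] * entropy(P[i]) for i in range(2))
print(H_rate)          # about 0.57 bits per symbol, well below log2(2) = 1
print(entropy(pi))     # i.i.d. approximation ignoring dependence (about 0.72, higher)
```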
  • Encoding/Decoding as Transducers

    • The encoding (transmitter) and decoding (receiver) operations are modeled as finite‑state transducers. If these operations are non‑singular (invertible), they preserve the source’s entropy.
    • A well-designed encoder “matches” the source to the channel by assigning shorter codes to more probable symbols, thus approaching the theoretical limit of efficiency.
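
A minimal Huffman-coding sketch (one standard way of assigning shorter codes to more probable symbols, though not a construction given explicitly in the text) showing the average code length approaching the source entropy; the symbol probabilities are assumed.

```python
import heapq
from math import log2

def huffman(freqs):
    """Build a binary Huffman code; returns {symbol: codeword}."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)                      # tie-breaker so tuples stay comparable
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

freqs = {"e": 0.4, "t": 0.3, "a": 0.2, "q": 0.1}      # assumed probabilities
code = huffman(freqs)
avg_len = sum(p * len(code[s]) for s, p in freqs.items())
H = -sum(p * log2(p) for p in freqs.values())
print(code)                 # rarer symbols get longer codewords
print(avg_len, H)           # average length sits just above the entropy
```

With these assumed probabilities the rarest symbol gets the longest codeword, and the average length (1.9 bits) lies just above the entropy (about 1.85 bits).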
  • Implications and Engineering Trade-offs

    • Achieving codes that come arbitrarily close to the channel capacity often requires longer delay (due to the complexity of grouping larger blocks of symbols) and increased computational resources.
    • There is an inherent balance between the transmission rate, the error rate, and the delay introduced by the encoding and decoding processes.
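
A crude numeric illustration of this trade-off, using a simple repetition code over a binary symmetric channel rather than the near-optimal block codes the theorems actually rely on; the crossover probability is an assumed value.

```python
from math import comb

def repetition_error(n, p):
    """Probability that majority decoding of an n-fold repetition fails (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range((n + 1) // 2, n + 1))

p = 0.1   # assumed bit-flip probability of the channel
for n in (1, 3, 5, 11, 21):
    # Longer blocks cut the error rate, but each source bit now needs n channel
    # uses, i.e., more delay (and, for this crude code, a rate of only 1/n).
    print(n, repetition_error(n, p))
```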

In summary, Shannon’s mathematical theory of communication establishes that:

  • Information can be quantified as entropy, a fundamental measure of uncertainty.
  • The capacity of a channel (both noiseless and noisy) is determined by its ability to carry this information and is quantified in bits per second.
  • By proper encoding—taking into account the statistical structure of the source and the constraints (including noise) of the channel—it is possible to achieve data transmission rates arbitrarily close to the theoretical maximum, provided the source entropy does not exceed the channel capacity.

This rigorous framework underpins much of modern digital communication and data compression techniques.