Part 2 - Week 4

Mrinmaya Sachan
Published: Tuesday, April 22, 2025

Retrieval-Augmented Language Models (RALMs)

Motivation

  • Limitations of LMs as factual databases:
    • Parametric LMs store knowledge implicitly in weights — difficult to inspect, update, or guarantee correctness.
    • They can hallucinate facts, especially for rare or long-tail knowledge.
    • Updating requires retraining or fine-tuning, which is costly and may cause catastrophic forgetting.
  • External Knowledge Bases (KBs):
    • Structured (e.g., Wikidata, WordNet) or unstructured (e.g., Wikipedia, news archives).
    • Queried at inference time to ground LM outputs in verifiable evidence.
    • Benefits:
      • Improves factual accuracy and trustworthiness.
      • Enables citing sources.
      • Easier to update and control content.
      • Reduces risk of leaking private training data.

LAMA Probe — Language Model Analysis

  • Purpose: Evaluate factual and commonsense knowledge encoded in LMs.
  • Method:
    • Construct cloze-style prompts from KB triples (head entity, relation, tail entity); a minimal probing sketch follows this list.
      • Example: (France, capital, Paris) → “The capital of France is [MASK].”
    • Use datasets from sources like Wikidata, Google-RE, T-REx, SQuAD.
  • Findings:
    • LMs can recall some facts but performance varies by relation type and frequency in training data.
  • Implication: Motivates augmenting LMs with retrieval to improve factuality and updatability.
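
As a concrete illustration, such a probe takes only a few lines with the Hugging Face `transformers` library; `bert-base-uncased` is our illustrative choice of checkpoint, not one fixed by the notes:

```python
# Minimal LAMA-style cloze probe. Assumes the Hugging Face `transformers`
# package; `bert-base-uncased` is our illustrative model choice.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Cloze prompt built from the KB triple (France, capital, Paris).
for pred in fill_mask("The capital of France is [MASK]."):
    print(f"{pred['token_str']:>10}  p={pred['score']:.3f}")
```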

Knowledge-Enhanced Language Models

Parametric Knowledge Integration

  • Definition: Knowledge is stored in model parameters after training/fine-tuning.
  • Approaches:
    • Entity-aware embeddings:
      • KnowBERT: Integrates entity embeddings from KBs (WordNet, Wikipedia) into BERT.
        • Uses entity linking to map tokens to KB entities.
        • Knowledge attention + recontextualization layer fuses entity and token embeddings (sketched after this list).
        • Improves perplexity, recall, and downstream task performance.
      • ERNIE: Similar integration of entity and fact embeddings.
    • Intermediate memory layers:
      • KGLM: Augments LM with a latent variable for entities, conditioning generation on KB facts.
      • kNN-LM: At inference, retrieves nearest-neighbor hidden states from a datastore built over the training data (strictly a non-parametric mechanism; detailed in the Fusion section below).
    • Entity-marked pretraining:
      • WKLM: Marks entity mentions in text and replaces some with other entities of the same type, training the model to detect the corrupted mentions.
  • Limitations:
    • Updating knowledge requires retraining.
    • KB coverage and entity linking errors can limit performance.
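
To make the fusion step concrete, here is a minimal sketch of the knowledge-attention idea with random vectors and toy shapes; it illustrates the attend-and-mix pattern, not KnowBERT's exact recontextualization module:

```python
# Schematic of knowledge attention: token states attend over candidate
# entity embeddings (as produced by entity linking) and the result is
# mixed back into the token states. Toy shapes, random vectors.
import torch
import torch.nn.functional as F

d = 64                                  # hidden size (toy value)
tokens = torch.randn(5, d)              # 5 contextual token states
entities = torch.randn(3, d)            # 3 candidate KB entity embeddings

# Knowledge attention: each token scores every candidate entity.
attn = F.softmax(tokens @ entities.T / d ** 0.5, dim=-1)   # (5, 3)

# Recontextualization (schematic): add entity knowledge back to tokens.
fused = tokens + attn @ entities                            # (5, 64)
print(fused.shape)
```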

Non-Parametric Knowledge Integration

  • Definition: Knowledge is retrieved from an external source at inference time.
  • Advantages:
    • Smaller LMs + retrieval can outperform larger LMs on knowledge-intensive tasks.
    • Easy to update KB without retraining LM.
    • Supports explicit citations and content control.

Retrieval Components

Retriever

  • Sparse retrieval:
    • TF–IDF:
      • Represents documents as sparse vectors of term weights.
      • Weight: \(\text{tf-idf}(t, d) = \text{tf}(t, d) \cdot \log\frac{N}{\text{df}(t)}\).
      • Score: \(\text{score}(q, d) = \sum_{t \in q} \frac{\text{tf-idf}(t, d)}{|d|}\) (a toy scoring sketch follows this list).
      • Efficient via inverted index; works well when query–doc term overlap is high.
      • Limitations: Ignores semantics beyond exact matches; sensitive to stopwords and morphology.
  • Dense retrieval:
    • Encodes queries and documents into dense vectors in a shared embedding space.
    • Similarity via dot product or cosine similarity.
    • Dense Passage Retrieval (DPR):
      • Dual-encoder: separate encoders for questions and passages.
      • Trained with a contrastive loss that pulls matching question–passage pairs together and pushes non-matching pairs apart.
      • Enables semantic matching beyond lexical overlap.
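
A toy comparison of the two scoring rules: the TF–IDF part implements the formulas above directly, while the dense part uses random vectors in place of trained DPR encoders, purely to show that dense scoring reduces to a dot product (corpus, query, and dimensions are all illustrative):

```python
# TF-IDF scoring exactly as defined above, plus a stand-in for dense scoring.
import math
from collections import Counter

import numpy as np

docs = ["paris is the capital of france",
        "berlin is the capital of germany",
        "the eiffel tower is in paris"]
query = "capital of france"

tokenized = [doc.split() for doc in docs]
N = len(docs)
df = Counter(t for toks in tokenized for t in set(toks))    # document frequency

def tfidf_score(q, doc_tokens):
    # score(q, d) = sum over query terms t of tf-idf(t, d) / |d|
    tf = Counter(doc_tokens)
    return sum(tf[t] * math.log(N / df[t])
               for t in q.split() if t in tf) / len(doc_tokens)

for text, toks in zip(docs, tokenized):
    print(f"{tfidf_score(query, toks):.3f}  {text}")

# DPR-style dense scoring reduces to a dot product between encoder outputs;
# random vectors stand in here for the trained question/passage encoders.
rng = np.random.default_rng(0)
q_vec, p_vecs = rng.normal(size=16), rng.normal(size=(3, 16))
print(p_vecs @ q_vec)                                       # one score per passage
```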

Reader / Generator

  • Consumes query + retrieved docs to produce answer.
  • Can be:
    • Extractive: Selects answer span from retrieved text.
    • Abstractive: Generates answer conditioned on retrieved evidence.

Fusion of Retrieved Knowledge

Interpolation — kNN-LM

  • Build a datastore of \((\text{key}, \text{value})\) pairs: keys are hidden states computed over the training set, values are the next tokens observed after them.
  • At inference:
    • Retrieve top-\(k\) nearest keys to current hidden state.
    • Form probability distribution over next tokens from retrieved values.
    • Interpolate with the LM’s own distribution (sketched below): \[ p_{\text{final}} = \lambda p_{\text{kNN}} + (1 - \lambda) p_{\text{LM}} \]
  • Pros: Improves rare word prediction.
  • Cons: High memory and compute cost for nearest neighbor search.
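
A minimal numeric sketch of the interpolation, assuming PyTorch; the datastore, hidden states, and LM distribution are random stand-ins, and the vocabulary size, \(k\), and \(\lambda\) are toy values:

```python
# kNN-LM interpolation with random tensors standing in for a real LM and
# datastore. Keys are training hidden states; values are next tokens.
import torch
import torch.nn.functional as F

V, d, n, k, lam = 100, 32, 1000, 8, 0.25    # vocab, hidden, datastore, top-k, lambda

keys = torch.randn(n, d)                    # datastore keys
values = torch.randint(0, V, (n,))          # datastore values (next tokens)
h = torch.randn(d)                          # current hidden state at test time
p_lm = F.softmax(torch.randn(V), dim=-1)    # LM's own next-token distribution

# Retrieve top-k nearest keys by L2 distance; weight them by softmax of
# negative distance, then aggregate the weights onto their next tokens.
dist = ((keys - h) ** 2).sum(-1)
nn_dist, nn_idx = dist.topk(k, largest=False)
w = F.softmax(-nn_dist, dim=-1)
p_knn = torch.zeros(V).scatter_add_(0, values[nn_idx], w)

p_final = lam * p_knn + (1 - lam) * p_lm    # λ·p_kNN + (1-λ)·p_LM
print(p_final.sum())                        # ≈ 1.0
```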

Concatenation — REALM

  • Treat retrieval as latent variable \(z\): \[ p(y|x) = \sum_{z} p(y|x, z) \, p(z|x) \]
  • Components:
    • Neural retriever \(p(z|x)\).
    • Knowledge-augmented encoder \(p(y|x, z)\).
  • Pretraining:
    • Masked language modeling with retrieval.
    • Retriever and encoder trained jointly.
    • Index updated periodically; the sum over \(z\) is approximated with the top-\(k\) retrieved docs (see the sketch below).
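
The truncated marginalization can be written out directly; in this sketch, random logits stand in for the two trained components, and probabilities are renormalized over the top-\(k\) docs:

```python
# REALM-style marginalization over the top-k retrieved docs. Random logits
# stand in for the trained retriever p(z|x) and encoder p(y|x,z).
import torch
import torch.nn.functional as F

k, V = 4, 100                                # top-k docs, vocab size

p_z_given_x = F.softmax(torch.randn(k), dim=-1)       # p(z|x) over top-k docs
p_y_given_xz = F.softmax(torch.randn(k, V), dim=-1)   # p(y|x,z) per doc

# p(y|x) ≈ sum_z p(y|x,z) p(z|x), with the sum truncated to the top-k docs.
p_y_given_x = (p_z_given_x.unsqueeze(-1) * p_y_given_xz).sum(0)
print(p_y_given_x.sum())                     # ≈ 1.0
```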

Cross-Attention — RETRO

  • Retrieve \(k\) nearest chunks for each input segment.
  • At intermediate Transformer layers, use cross-attention to attend to the retrieved chunk embeddings (sketched after this list).
  • Benefits:
    • Scales to large corpora without increasing parametric memory.
    • Achieves strong performance with fewer parameters than comparable LMs.
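
A schematic of the chunked cross-attention step, using PyTorch's stock `nn.MultiheadAttention`; the dimensions and the single residual fusion are simplifications rather than RETRO's exact layer:

```python
# RETRO-style chunked cross-attention with PyTorch's stock MultiheadAttention:
# decoder states for one chunk attend to its k retrieved neighbor chunks.
import torch
from torch import nn

d, heads, m, k, r = 64, 4, 8, 2, 16          # hidden, heads, chunk len, neighbors, neighbor len

xattn = nn.MultiheadAttention(d, heads, batch_first=True)

hidden = torch.randn(1, m, d)                # decoder states for the current chunk
neighbors = torch.randn(1, k * r, d)         # k retrieved chunks, flattened along time

out, _ = xattn(query=hidden, key=neighbors, value=neighbors)
hidden = hidden + out                        # residual fusion at an intermediate layer
print(hidden.shape)                          # torch.Size([1, 8, 64])
```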

Open Challenges

  • No consensus on optimal retriever–reader integration (early vs late fusion, cross-attention vs concatenation).
  • Multi-step retrieval needed for complex reasoning and multi-hop QA.
  • Dense retrievers require large, high-quality training data; domain adaptation remains challenging.
  • Retrieval adds inference overhead; efficient ANN search and caching are active research areas.
  • Need better benchmarks for factuality, attribution, and reasoning in RALMs.