Bayesian Optimization

Overview

Bayesian optimization is black-box optimization with a probabilistic surrogate (typically a Gaussian process). It balances exploration against exploitation via acquisition functions, and connects to bandit intuition and RL planning.

Figure 1: Contour plots of UCB (with \(\beta_t=0.75\)), PI, and EI as functions of \(\Delta_t\) and \(\sigma_t\); brighter colors mark larger acquisition values.

As shown in Figure 1, PI, EI, and UCB trade off exploitation (large \(\Delta_t\)) against exploration (large \(\sigma_t\)) in characteristically different ways.

Problem setup

  • Unknown objective \(f:\mathcal{X}\to\mathbb{R}\); observations \(y_t = f(x_t) + \varepsilon_t\), \(\varepsilon_t\sim\mathcal{N}(0,\sigma_n^2)\).
  • Surrogate: GP posterior \((\mu_t(x),\sigma_t^2(x))\) after \(t\) queries; a posterior-update sketch follows this list.
  • Acquisition \(a_t(x)\) selects next query \(x_{t+1}=\arg\max_x a_t(x)\).
  • Trade-off: exploration (high \(\sigma_t\)) vs exploitation (high \(\mu_t\)).
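
A minimal sketch of the surrogate update, assuming a zero-mean GP with an RBF kernel; the helper names (`rbf`, `gp_posterior`) are chosen here for illustration and reused in later snippets:

```python
import numpy as np

def rbf(A, B, ls=1.0, var=1.0):
    # squared-exponential kernel k(a, b) = var * exp(-|a - b|^2 / (2 ls^2))
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return var * np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xq, noise=1e-2):
    """Zero-mean GP posterior mean/std at query points Xq, given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))       # noisy train covariance
    Kq = rbf(X, Xq)                              # train/query cross covariance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Kq.T @ alpha                            # posterior mean mu_t(x)
    V = np.linalg.solve(L, Kq)
    var = np.diag(rbf(Xq, Xq)) - np.sum(V**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))   # posterior std sigma_t(x)
```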

Upper Confidence Bound (UCB)

  • Optimism in the face of uncertainty (OFU): \[x_{t+1}=\arg\max_x \mu_t(x)+\beta_{t+1}^{1/2}\,\sigma_t(x).\]
  • \(\beta_t\) controls the confidence width; choose a schedule that guarantees \(f(x)\in[\mu_t(x)\pm\beta_t^{1/2}\sigma_t(x)]\) with high probability.
  • Bayesian setting: \(\beta_t = 2\log\frac{|\mathcal{X}|\pi^2 t^2}{6\delta}\) (finite domain). Frequentist: \(\beta_t= B + 4\sigma_n\sqrt{\gamma_{t-1}+1+\ln(1/\delta)}\) when \(f\) bounded in RKHS (Theorem 9.3).
  • Regret bound \(\mathcal{O}(\sqrt{T\beta_T\gamma_T})\), where \(\gamma_T\) is the maximum information gain of the kernel (RBF: \(\gamma_T=\tilde{\mathcal{O}}((\log T)^{d+1})\)).

Figure 2: Upper-confidence bound acquisition scores for \(\beta=0.25\) (left) and \(\beta=1\) (right).

As Figure 2 illustrates, confidence tubes widen on sparse data, making the optimism term explicit.
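
As a sketch, the UCB score is a one-liner on top of the `gp_posterior` helper above; the constant `beta` stands in for the schedules \(\beta_t\) discussed earlier:

```python
def ucb(mu, sigma, beta=2.0):
    # optimism in the face of uncertainty: inflate the mean by sqrt(beta) * std
    return mu + np.sqrt(beta) * sigma

# usage: score a candidate set with the GP posterior, pick the maximizer
# x_next = Xcand[np.argmax(ucb(*gp_posterior(X, y, Xcand)))]
```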

Improvement-based acquisition

  • Maintain incumbent best \(\hat{f}_t=\max_{\tau\le t} y_\tau\).
  • Probability of improvement (PI): \[a_t^{\mathrm{PI}}(x)=\Phi\Big(\frac{\mu_t(x)-\hat{f}_t-\xi}{\sigma_t(x)}\Big)\] with exploration parameter \(\xi\ge0\) (larger \(\xi\) encourages exploration).
  • Expected improvement (EI): \[a_t^{\mathrm{EI}}(x) = (\mu_t(x)-\hat{f}_t-\xi)\Phi(z) + \sigma_t(x)\phi(z),\quad z=\frac{\mu_t(x)-\hat{f}_t-\xi}{\sigma_t(x)}.\] Closed form for Gaussian surrogate; often optimize \(\log\text{EI}\) to avoid flatness.
  • Both rely on Gaussian predictive distribution; for non-Gaussian surrogates use quadrature or MC.

Figure 3: Probability of improvement (PI) and expected improvement (EI) acquisition functions.

As Figure 3 shows, PI saturates once \(\mu_t\) exceeds the incumbent, whereas EI still values high-variance points: the ridge above \(\hat{f}_t\) persists.
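
A sketch of both improvement criteria under the Gaussian predictive distribution; `best` is the incumbent \(\hat{f}_t\) and `xi` the exploration offset \(\xi\):

```python
from scipy.stats import norm

def prob_improvement(mu, sigma, best, xi=0.01):
    # PI: Phi((mu - best - xi) / sigma)
    z = (mu - best - xi) / sigma
    return norm.cdf(z)

def expected_improvement(mu, sigma, best, xi=0.01):
    # EI: (mu - best - xi) * Phi(z) + sigma * phi(z)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```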

Thompson sampling / probability matching

  • Sample function \(\tilde{f}\sim\mathcal{GP}(\mu_t,k_t)\) and pick \(x_{t+1}=\arg\max_x \tilde{f}(x)\).
  • Equivalent to sampling from the probability-of-optimality distribution \(\pi(x)=\Pr\big[x\in\arg\max_{x'\in\mathcal{X}}\tilde{f}(x')\big]\).
  • Implement via random Fourier features or a basis projection to draw an approximate GP sample; see the sketch below.
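
A minimal sketch of this approximation, assuming an RBF kernel and a unit Gaussian prior on the feature weights (all helper names here are illustrative):

```python
def make_rff(d, n_feat=200, ls=1.0, var=1.0, seed=0):
    # fixed random Fourier feature map approximating an RBF kernel
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / ls, size=(n_feat, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
    return lambda X: np.sqrt(2.0 * var / n_feat) * np.cos(X @ W.T + b)

def thompson_next(X, y, Xcand, noise=1e-2, seed=0):
    """Sample the weights of a Bayesian linear model on RFF features,
    then return the candidate maximizing that sampled function."""
    rng = np.random.default_rng(seed)
    phi = make_rff(X.shape[1], seed=seed)
    P = phi(X)                                   # (n, m) train features
    A = P.T @ P / noise + np.eye(P.shape[1])     # posterior precision (unit prior)
    mean = np.linalg.solve(A, P.T @ y / noise)   # posterior mean of weights
    w = rng.multivariate_normal(mean, np.linalg.inv(A))  # one weight sample
    f_tilde = phi(Xcand) @ w                     # sampled function on candidates
    return Xcand[np.argmax(f_tilde)]
```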

Exploratory design connections

  • MI view: the marginal information gain of querying \(x\) is \(\Delta_I(x)=\tfrac{1}{2}\log\big(1+\sigma_t^{2}(x)/\sigma_n^{2}\big)\) for a GP ⇒ UCB/EI can be seen as MI proxies (snippet after this list).
  • Knowledge gradient / information-directed sampling combine expected reward improvement with information gain.
  • Acquisition trade-offs often tuned by schedule \(\beta_t\), \(\xi_t\), or entropy regularization.
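
For concreteness, the marginal-gain formula above as a one-line sketch, with `noise_var` standing in for \(\sigma_n^2\):

```python
def marginal_info_gain(sigma, noise_var=1e-2):
    # 0.5 * log(1 + sigma_t^2 / sigma_n^2): nats gained by one noisy query
    return 0.5 * np.log1p(sigma**2 / noise_var)
```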

Optimization of acquisition

  • Non-convex; in low \(d\), use dense grids plus local refinement; in higher \(d\), use multi-start gradient ascent, evolutionary strategies, or sample-and-evaluate heuristics (sketch after this list).
  • Warm-start new acquisition maximization from previous maximizers.
  • Constraints: handle by penalizing infeasible points or modeling constraint GP.
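
A multi-start sketch using SciPy's L-BFGS-B; `acq` is any of the scores above evaluated at a single point, and `bounds` delimits the box-constrained domain:

```python
from scipy.optimize import minimize

def maximize_acq(acq, bounds, n_starts=16, seed=0):
    """Multi-start gradient ascent on the acquisition (minimize its negation).
    bounds is a list of (lo, hi) pairs, one per dimension."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    starts = rng.uniform(lo, hi, size=(n_starts, len(bounds)))
    best_x, best_val = None, -np.inf
    for x0 in starts:
        res = minimize(lambda x: -acq(x), x0, bounds=bounds, method="L-BFGS-B")
        if -res.fun > best_val:
            best_x, best_val = res.x, -res.fun
    return best_x
```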

Practicalities

  • Hyperparameters: re-optimize GP kernel parameters via the marginal likelihood every few observations; rely on priors to avoid overfitting on small data.
  • Noise estimation: inaccurate noise variance breaks UCB/EI calibration.
  • Batch BO: choose multiple points per round via fantasizing or \(q\)-EI/TS.
  • Termination: stop when the acquisition value is near zero or the budget is exhausted; a full-loop sketch follows.
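
Putting the sketches together, a minimal loop assuming the `gp_posterior`, `expected_improvement`, and `maximize_acq` helpers above; `f` is the noisy black box:

```python
def bayesopt(f, bounds, n_init=5, n_iter=25, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    X = rng.uniform(lo, hi, size=(n_init, len(bounds)))    # random initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        incumbent = y.max()                                # best observation so far
        def acq(x):
            mu, sd = gp_posterior(X, y, x[None, :])
            return expected_improvement(mu, sd, incumbent)[0]
        x_next = maximize_acq(acq, bounds)                 # inner optimization
        X, y = np.vstack([X, x_next]), np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()                        # best observed point
```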