Hamiltonian Monte Carlo (HMC) is a powerful Markov Chain Monte Carlo (MCMC) method that leverages Hamiltonian dynamics to efficiently explore complex probability distributions. Unlike simpler MCMC methods like Metropolis-Hastings or Gibbs sampling, HMC uses gradient information to guide its exploration, allowing it to navigate high-dimensional spaces more effectively and reduce random walk behavior. This tutorial provides a comprehensive guide to understanding and implementing HMC, suitable for both beginners and experienced practitioners.
Understanding the Basics of Hamiltonian Monte Carlo
At its core, Hamiltonian Monte Carlo relies on the principles of Hamiltonian dynamics, a framework from classical mechanics that describes the evolution of a system in terms of its position and momentum. To understand HMC, it's essential to grasp these fundamental concepts. The Hamiltonian, denoted as H, represents the total energy of the system, which is the sum of its potential energy (U) and kinetic energy (K). In the context of Bayesian inference, the potential energy is the negative logarithm of the (unnormalized) target density, such as the posterior, while the kinetic energy is defined through auxiliary momentum variables that are typically given a Gaussian distribution.
Hamiltonian Dynamics: A Quick Primer
In classical mechanics, the state of a system is described by its position (q) and momentum (p). The Hamiltonian H(q, p) governs the evolution of the system over time. The equations of motion, derived from Hamilton's equations, are:
- dq/dt = ∂H/∂p
- dp/dt = -∂H/∂q
These equations describe how the position and momentum change over time. In HMC, we simulate these dynamics to propose new states in our Markov chain. The key idea is that by following the Hamiltonian dynamics, we can efficiently explore the target distribution, especially in high-dimensional spaces. The gradient term ∂H/∂q = ∇U(q) guides the trajectory, allowing the sampler to make distant proposals with high acceptance probability instead of diffusing by a random walk.
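For example, for a one-dimensional harmonic oscillator with H(q, p) = q²/2 + p²/2, Hamilton's equations give dq/dt = p and dp/dt = -q, so the system traces out closed orbits of constant energy in (q, p) space; HMC exploits this kind of coherent, energy-preserving motion when proposing new states.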
Applying Hamiltonian Dynamics to Bayesian Inference
In the context of Bayesian inference, we want to sample from a target distribution π(θ), where θ represents the parameters of our model. We can define the potential energy U(θ) as the negative log of the target distribution:
U(θ) = -log π(θ)
To apply Hamiltonian dynamics, we introduce auxiliary momentum variables ρ, typically drawn from a Gaussian distribution with mean 0 and covariance matrix M:
p(ρ) = N(0, M)
The kinetic energy K(ρ) is then defined as:
K(ρ) = (1/2)ρᵀM⁻¹ρ
The Hamiltonian is the sum of the potential and kinetic energies:
H(θ, ρ) = U(θ) + K(ρ)
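To make these definitions concrete, here is a minimal, standalone Python sketch for a hypothetical two-dimensional Gaussian target; the precision matrix A, the identity mass matrix, and all names are illustrative choices rather than part of the running example that follows.
import numpy as np

# A hypothetical 2-D Gaussian target with precision matrix A: π(θ) ∝ exp(-(1/2) θᵀAθ)
A = np.linalg.inv(np.array([[1.0, 0.8],
                            [0.8, 1.0]]))
M = np.eye(2)                          # mass matrix for the momentum
M_inv = np.linalg.inv(M)

def U(theta):                          # potential energy: -log π(θ) up to a constant
    return 0.5 * theta @ A @ theta

def K(rho):                            # kinetic energy: (1/2) ρᵀM⁻¹ρ
    return 0.5 * rho @ M_inv @ rho

def H(theta, rho):                     # the Hamiltonian: total energy
    return U(theta) + K(rho)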
The Role of the Leapfrog Integrator
Since we cannot solve Hamilton's equations analytically for most real-world problems, we need to use a numerical integrator. The leapfrog integrator is a popular choice for HMC because it preserves volume and is time-reversible, properties that are crucial for maintaining the detailed balance of the Markov chain. The leapfrog integrator consists of the following steps:
- Update the momentum by a half step: ρ(t + ε/2) = ρ(t) - (ε/2) ∇U(θ(t))
- Update the position by a full step: θ(t + ε) = θ(t) + ε M⁻¹ ρ(t + ε/2)
- Update the momentum by the remaining half step: ρ(t + ε) = ρ(t + ε/2) - (ε/2) ∇U(θ(t + ε))
Here, ε is the step size. By repeating these steps L times, we can simulate the Hamiltonian dynamics for a longer period.
Implementing Hamiltonian Monte Carlo: A Step-by-Step Guide
Now that we have a solid understanding of the theory behind HMC, let's dive into the practical aspects of implementing it. Here’s a step-by-step guide to help you get started.
Step 1: Define the Target Distribution
The first step is to define the target distribution from which you want to sample. This typically involves specifying the likelihood function and the prior distribution over the parameters. For example, consider a simple Bayesian linear regression model:
y = Xθ + ε, ε ~ N(0, σ²I)
θ ~ N(0, Σ)
where y is the vector of observed responses, X is the design matrix, θ is the vector of parameters, ε is the error term, σ² is the error variance, and Σ is the prior covariance matrix. The target distribution is the posterior distribution p(θ|y), which is proportional to the product of the likelihood and the prior:
p(θ|y) ∝ p(y|θ) p(θ)
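To make the running example concrete, one might simulate data from this model as sketched below; the variable names (X, y, sigma2, Sigma_inv) and all numerical values are illustrative choices, reused in the code later in this tutorial.
import numpy as np

# Hypothetical synthetic data for the running example; names and values are illustrative
np.random.seed(0)
n, d = 100, 3
X = np.random.normal(size=(n, d))                    # design matrix
true_theta = np.array([1.0, -2.0, 0.5])              # parameters used to simulate y
sigma2 = 0.25                                        # error variance σ²
y = X @ true_theta + np.random.normal(scale=np.sqrt(sigma2), size=n)
Sigma_inv = np.eye(d)                                # prior precision Σ⁻¹ (here Σ = I)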
Step 2: Compute the Gradient of the Potential Energy
Since HMC relies on gradient information, we need to compute the gradient of the potential energy function with respect to the parameters. In our Bayesian linear regression example, the potential energy is:
U(θ) = -log p(θ|y) = -log p(y|θ) - log p(θ) + const
where the constant (log p(y)) does not depend on θ and can be ignored.
The gradient of U(θ) is:
∇U(θ) = -∇log p(y|θ) - ∇log p(θ)
For a Gaussian likelihood and prior, these gradients can be computed analytically.
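For the linear regression example above, with Gaussian likelihood y ~ N(Xθ, σ²I) and prior θ ~ N(0, Σ), dropping additive constants these quantities work out to:
U(θ) = (1/(2σ²)) (y - Xθ)ᵀ(y - Xθ) + (1/2) θᵀΣ⁻¹θ
∇U(θ) = -(1/σ²) Xᵀ(y - Xθ) + Σ⁻¹θ
These are the expressions used in the code later in this tutorial.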
Step 3: Implement the Leapfrog Integrator
As discussed earlier, the leapfrog integrator is used to simulate the Hamiltonian dynamics. Here’s a Python implementation of the leapfrog integrator:
def leapfrog_integrator(theta, rho, grad_U, step_size, M_inv):
    # Half step for the momentum
    rho = rho - (step_size / 2) * grad_U(theta)
    # Full step for the position
    theta = theta + step_size * M_inv @ rho
    # Second half step for the momentum
    rho = rho - (step_size / 2) * grad_U(theta)
    return theta, rho
This function takes the current position (theta), momentum (rho), gradient function (grad_U), step size (step_size), and inverse mass matrix (M_inv) as inputs and returns the updated position and momentum after one leapfrog step.
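As a quick sanity check, one can run several leapfrog steps on a simple target and confirm that the total energy H = U + K stays nearly constant. Here is a minimal sketch, assuming a standard normal target and an identity mass matrix (the names with a _test suffix are introduced only for this check):
import numpy as np

# Standard normal target: U(θ) = (1/2) θᵀθ, so ∇U(θ) = θ; identity mass matrix
U_test = lambda theta: 0.5 * theta @ theta
grad_U_test = lambda theta: theta
M_inv_test = np.eye(2)

theta, rho = np.array([1.0, 0.0]), np.array([0.0, 1.0])
H_start = U_test(theta) + 0.5 * rho @ M_inv_test @ rho
for _ in range(50):
    theta, rho = leapfrog_integrator(theta, rho, grad_U_test, 0.1, M_inv_test)
H_end = U_test(theta) + 0.5 * rho @ M_inv_test @ rho
print(abs(H_end - H_start))   # small for a stable step size; grows if the step size is too large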
Step 4: Implement the HMC Algorithm
Now we can put everything together to implement the HMC algorithm. Here’s a Python implementation:
import numpy as np

def hamiltonian_monte_carlo(U, grad_U, initial_theta, step_size, num_steps, num_samples, M_inv):
    samples = [initial_theta]
    theta = initial_theta
    M = np.linalg.inv(M_inv)  # mass matrix, used to draw the momentum
    for i in range(num_samples):
        # Resample the momentum from N(0, M)
        rho = np.random.multivariate_normal(np.zeros_like(theta), M)
        current_U = U(theta)
        current_K = 0.5 * rho @ M_inv @ rho
        # Simulate the Hamiltonian dynamics with num_steps leapfrog steps
        theta_new, rho_new = theta, rho
        for _ in range(num_steps):
            theta_new, rho_new = leapfrog_integrator(theta_new, rho_new, grad_U, step_size, M_inv)
        new_U = U(theta_new)
        new_K = 0.5 * rho_new @ M_inv @ rho_new
        # Metropolis acceptance step based on the change in total energy
        acceptance_prob = min(1.0, np.exp(current_U + current_K - new_U - new_K))
        if np.random.rand() < acceptance_prob:
            theta = theta_new
        samples.append(theta)
    return np.array(samples)
This function takes the potential energy function (U), gradient function (grad_U), initial position (initial_theta), step size (step_size), number of leapfrog steps (num_steps), number of samples (num_samples), and inverse mass matrix (M_inv) as inputs and returns an array of samples from the target distribution (including the initial position). Note that when a proposal is rejected, the current position is recorded again, as required for a valid Metropolis correction.
Step 5: Run the HMC Algorithm and Analyze the Results
Finally, we can run the HMC algorithm and analyze the results. Here’s an example of how to use the HMC implementation with the Bayesian linear regression model:
# Define the potential energy and its gradient analytically for the Gaussian
# likelihood and prior (constants dropped); assumes y, X, sigma2 and Sigma_inv are defined
def U(theta):
    residual = y - X @ theta
    return residual @ residual / (2 * sigma2) + theta @ Sigma_inv @ theta / 2

def grad_U(theta):
    return -X.T @ (y - X @ theta) / sigma2 + Sigma_inv @ theta
# Set the parameters
initial_theta = np.zeros(X.shape[1])
step_size = 0.01
num_steps = 10
num_samples = 1000
M_inv = np.eye(X.shape[1]) # Identity matrix as the inverse mass matrix
# Run HMC
samples = hamiltonian_monte_carlo(U, grad_U, initial_theta, step_size, num_steps, num_samples, M_inv)
# Analyze the results
import matplotlib.pyplot as plt
plt.plot(samples[:, 0])
plt.xlabel("Iteration")
plt.ylabel("Theta_0")
plt.title("HMC Samples for Theta_0")
plt.show()
This code runs the HMC algorithm and plots the samples for the first parameter (Theta_0). You can analyze the samples to estimate the posterior distribution of the parameters and assess the convergence of the algorithm.
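Beyond the trace plot, a few summary statistics give a quick sense of the posterior and of how well the chain mixed. The sketch below continues the script above (reusing the samples array and the existing numpy import); the burn-in fraction of 20% is an illustrative choice:
# Discard an initial burn-in period and summarize the remaining draws
burn_in = len(samples) // 5
posterior_draws = samples[burn_in:]
print("posterior mean:", posterior_draws.mean(axis=0))
print("posterior std: ", posterior_draws.std(axis=0))
# Fraction of iterations in which the chain actually moved (a rough proxy for the acceptance rate)
moved = np.mean(np.any(np.diff(posterior_draws, axis=0) != 0, axis=1))
print("fraction of accepted moves:", moved)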
Tuning HMC for Optimal Performance
While HMC is a powerful sampling method, its performance can be sensitive to the choice of hyperparameters, such as the step size and the number of leapfrog steps. Tuning these parameters is crucial for achieving optimal performance. In this section, we'll discuss some strategies for tuning HMC.
Step Size Adaptation
The step size determines how far we move along the trajectory in each leapfrog step. If the step size is too large, the integrator may become unstable, leading to high rejection rates. If the step size is too small, the algorithm may take a long time to explore the space, resulting in slow convergence. One popular approach for adapting the step size is to use dual averaging. Dual averaging adjusts the step size based on the acceptance rate of the Metropolis-Hastings step. If the acceptance rate is too high, the step size is increased; if it is too low, the step size is decreased. This helps to find a step size that balances exploration and stability.
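The sketch below shows one way such an adaptation can be implemented, in the style of the dual-averaging scheme popularized by the No-U-Turn Sampler; the class name, the constants, and the target acceptance rate of about 0.65 are common defaults assumed here, not values defined earlier in this tutorial.
import numpy as np

class DualAveragingStepSize:
    # Dual-averaging step-size adaptation; gamma, t0, kappa and target_accept are common defaults.
    def __init__(self, initial_step_size, target_accept=0.65,
                 gamma=0.05, t0=10.0, kappa=0.75):
        self.mu = np.log(10 * initial_step_size)   # bias the search toward larger steps initially
        self.target_accept = target_accept
        self.gamma, self.t0, self.kappa = gamma, t0, kappa
        self.error_sum = 0.0                       # running sum of (target - observed) acceptance
        self.log_averaged_step = 0.0
        self.m = 0

    def update(self, accept_prob):
        self.m += 1
        self.error_sum += self.target_accept - accept_prob
        # Primal iterate: shrink the step size when acceptance is below target, grow it otherwise
        log_step = self.mu - (np.sqrt(self.m) / self.gamma) * self.error_sum / (self.m + self.t0)
        # Smoothed (averaged) iterate, used as the final step size after warmup
        eta = self.m ** (-self.kappa)
        self.log_averaged_step = eta * log_step + (1 - eta) * self.log_averaged_step
        return np.exp(log_step), np.exp(self.log_averaged_step)
During warmup, each iteration's acceptance probability would be fed to update and the first returned value used as the next step size; after warmup, the second (averaged) value is fixed for sampling. Wiring this into the sampler above would require it to return or record its per-iteration acceptance probabilities, which the implementation shown earlier does not currently do.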
Number of Leapfrog Steps
The number of leapfrog steps determines the length of the trajectory. If the number of steps is too small, the algorithm may not be able to explore the space effectively. If the number of steps is too large, the algorithm may waste time exploring regions of low probability. A common strategy is to choose the number of steps such that the trajectory covers a significant portion of the space, but not so large that it becomes computationally expensive.
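One simple, commonly used safeguard, sketched below, is to jitter the number of leapfrog steps around a base value from iteration to iteration so that the trajectory length is not exactly the same every time; the base value of 20 is illustrative.
import numpy as np

# Draw the trajectory length uniformly around a base value at each iteration
base_num_steps = 20
num_steps = np.random.randint(int(0.8 * base_num_steps), int(1.2 * base_num_steps) + 1)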
Mass Matrix Selection
The mass matrix M determines the kinetic energy and affects the shape of the momentum distribution. A poorly chosen mass matrix can lead to inefficient exploration of the space. A common choice is the identity matrix, which corresponds to independent momentum variables with unit variance. However, if the parameters have different scales or correlations, it may be beneficial to use a different mass matrix. One approach is to estimate the covariance matrix of the parameters from an initial run of HMC and set the inverse mass matrix M⁻¹ to this estimate (equivalently, use its inverse as M). This helps to align the momentum distribution with the shape of the target distribution, leading to more efficient exploration.
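A minimal sketch of this idea, assuming warmup_samples holds an array of draws from a preliminary run (for example, the output of hamiltonian_monte_carlo above):
import numpy as np

# Set M⁻¹ to the covariance of the warmup draws (rows = draws), so the momentum
# distribution matches the scale and correlations of the parameters.
warmup_cov = np.cov(warmup_samples, rowvar=False)
M_inv = warmup_cov + 1e-6 * np.eye(warmup_cov.shape[0])   # small jitter for numerical stability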
Advanced Techniques in Hamiltonian Monte Carlo
Beyond the basic implementation, several advanced techniques can further enhance the performance and applicability of HMC. Here are a few notable ones.
No-U-Turn Sampler (NUTS)
The No-U-Turn Sampler (NUTS) is an extension of HMC that automatically tunes the number of leapfrog steps. NUTS builds a binary tree of trajectories and stops expanding the tree when it starts to make a "U-turn," indicating that it is retracing its steps. This allows NUTS to adaptively choose the trajectory length, leading to more efficient exploration of the space. NUTS is particularly useful for high-dimensional problems where it is difficult to manually tune the number of leapfrog steps.
Preconditioned HMC
Preconditioned HMC uses a preconditioning matrix to transform the parameter space, making it easier for HMC to explore. The preconditioning matrix is typically chosen to be an approximation of the inverse covariance matrix of the target distribution. This can help to reduce the correlations between parameters and improve the efficiency of HMC. Preconditioned HMC is particularly useful for problems with strong correlations between parameters.
Riemannian Manifold HMC
Riemannian Manifold HMC (RM-HMC) extends HMC to non-Euclidean spaces by incorporating the geometry of the target distribution. RM-HMC uses the Riemannian metric tensor to define the kinetic energy and the leapfrog integrator. This allows RM-HMC to adapt to the local geometry of the target distribution, leading to more efficient exploration. RM-HMC is particularly useful for problems where the target distribution has a complex geometry.
Conclusion
Hamiltonian Monte Carlo is a powerful tool for Bayesian inference that can efficiently explore complex probability distributions. By leveraging Hamiltonian dynamics and gradient information, HMC overcomes the limitations of simpler MCMC methods and provides accurate and reliable samples from the target distribution. This tutorial has provided a comprehensive guide to understanding and implementing HMC, covering the theoretical foundations, practical implementation, tuning strategies, and advanced techniques. Whether you are a beginner or an experienced practitioner, HMC can be a valuable addition to your toolkit for Bayesian inference. By understanding the principles behind HMC and mastering the techniques for tuning and extending it, you can unlock its full potential and tackle a wide range of challenging problems.