A Complete Recipe for Stochastic Gradient MCMC

1. Summary

This paper presents a complete framework for constructing continuous-dynamics MCMC samplers, and shows that the framework is both sufficient and necessary: every sampler of this kind is an instance of it. Since much recent work adopts stochastic gradient versions of MCMC samplers for computational reasons, the framework provides a systematic way to derive and validate SGMCMC methods. Based on this framework, a new MCMC sampler called stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC) is also introduced.

2. A Stochastic Gradient MCMC Framework

In general, we can write all continuous Markov processes as a stochastic differential equation (SDE) of the following form:

$$ \mathrm{d}z = f(z)\,\mathrm{d}t + \sqrt{2 D(z)}\,\mathrm{d}W(t), $$

where $f(z)$ denotes the deterministic drift, $W(t)$ is a $d$-dimensional Wiener process, and $D(z)$ is a positive semi-definite diffusion matrix. Notice that only some choices of $f(z)$ and $D(z)$ yield a stationary distribution with the form $p^s(z) \propto \exp(-H(z))$. Here $H(z)$ is the Hamiltonian, which relates this formulation to Hamiltonian Monte Carlo as well as other MCMC algorithms.

This paper proposes to write $f(z)$ in the following form:

$$ f(z) = -\left[ D(z) + Q(z) \right] \nabla H(z) + \Gamma(z), \qquad \Gamma_i(z) = \sum_{j=1}^{d} \frac{\partial}{\partial z_j} \left( D_{ij}(z) + Q_{ij}(z) \right). $$

Here $Q(z)$ is a skew-symmetric curl matrix representing the deterministic traversing effects seen in HMC procedures. In contrast, the diffusion matrix $D(z)$ determines the strength of the Wiener-process-driven diffusion. The matrices $D(z)$ and $Q(z)$ can be adjusted to attain faster convergence to the posterior distribution.

It is stated that if we write $f(z)$ in the above form, then the stationary distribution of the SDE has the form $p^s(z) \propto \exp(-H(z))$. Moreover, if a stationary distribution of this form uniquely exists for the SDE, then there exist a positive semi-definite $D(z)$ and a skew-symmetric curl matrix $Q(z)$ such that $f(z)$ in the SDE can be written in the specified form; in this sense the recipe is complete.
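As a quick sanity check (this is one of the castings given in the paper), classical Hamiltonian dynamics is recovered by choosing

$$ z = (\theta, r), \qquad H(\theta, r) = U(\theta) + \tfrac{1}{2} r^\top r, \qquad D(z) = 0, \qquad Q(z) = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}, $$

where $U(\theta) = -\log p(\theta \mid \mathcal{D})$ is the negative log-posterior. Since $D$ and $Q$ are constant, $\Gamma(z) = 0$, and the SDE reduces to the deterministic Hamiltonian equations $\mathrm{d}\theta = r\,\mathrm{d}t$, $\mathrm{d}r = -\nabla U(\theta)\,\mathrm{d}t$.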

3. Practical Algorithms and Discussions

3.1. $\epsilon$-Discretization Update

In practice, we choose a step size $\epsilon_t$ and update $z$ using the following discretized form of the SDE:

$$ z_{t+1} \leftarrow z_t + \epsilon_t \left[ -\left( D(z_t) + Q(z_t) \right) \nabla H(z_t) + \Gamma(z_t) \right] + \mathcal{N}\!\left( 0,\, 2 \epsilon_t D(z_t) \right). $$

In the above step, instead of evaluating the gradient on the full data set, we can use a subsample of the data to perform a stochastic-gradient-descent-style update for computational efficiency. Although this stochastic gradient is unbiased, it does introduce extra noise into the sampling procedure. It can be shown that this noise is asymptotically normal, and we can partly subtract it from the injected noise:

$$ z_{t+1} \leftarrow z_t + \epsilon_t \left[ -\left( D(z_t) + Q(z_t) \right) \nabla \tilde{H}(z_t) + \Gamma(z_t) \right] + \mathcal{N}\!\left( 0,\, \epsilon_t \left( 2 D(z_t) - \epsilon_t \hat{B}_t \right) \right), $$

where $\nabla \tilde{H}(z_t)$ is the subsampled (minibatch) version of the gradient and $\hat{B}_t$ is our estimate of the covariance of the stochastic-gradient noise. This noise-corrected procedure still yields the correct invariant distribution.
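To make the update concrete, here is a minimal NumPy sketch of one step of the generic recipe; the function name `recipe_step` and its argument layout are my own (hypothetical), and $D$, $Q$, $\Gamma$ are assumed to be evaluated at the current state by the caller:

```python
import numpy as np

def recipe_step(z, grad_H_hat, D, Q, Gamma, eps, B_hat=None, rng=np.random):
    """One discretized step of the recipe SDE dz = f(z) dt + sqrt(2 D(z)) dW(t).

    z          -- current state, shape (d,)
    grad_H_hat -- (minibatch) estimate of grad H(z), shape (d,)
    D, Q       -- diffusion (PSD) and curl (skew-symmetric) matrices at z, shape (d, d)
    Gamma      -- correction term Gamma_i = sum_j d(D_ij + Q_ij)/dz_j, shape (d,)
    eps        -- step size
    B_hat      -- optional estimate of the stochastic-gradient noise covariance
    """
    # Drift term: f(z) = -[D(z) + Q(z)] grad H(z) + Gamma(z)
    drift = -(D + Q) @ grad_H_hat + Gamma
    # Injected Gaussian noise: N(0, 2 eps D); if an estimate B_hat of the
    # stochastic-gradient noise covariance is available, subtract it as above.
    cov = 2.0 * eps * D if B_hat is None else eps * (2.0 * D - eps * B_hat)
    noise = rng.multivariate_normal(np.zeros_like(z), cov)
    return z + eps * drift + noise
```

For example, with $z = \theta$, $D = I$, $Q = 0$, and $\Gamma = 0$, this step reduces to SGLD: $\theta_{t+1} = \theta_t - \epsilon_t \nabla \tilde{U}(\theta_t) + \mathcal{N}(0, 2\epsilon_t I)$.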

3.2. Construct Samplers with Proposed Framework

The authors show that previous MCMC methods, e.g. Hamiltonian Monte Carlo (HMC), Stochastic Gradient Hamiltonian Monte Carlo (SGHMC), Stochastic Gradient Langevin Dynamics (SGLD), as well as Stochastic Gradient Riemannian Langevin Dynamics (SGRLD), all fit into this framework as specific choices of $D(z)$ and $Q(z)$; two of these castings are restated below.
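Restating two of these castings from the paper for concreteness (SGLD is the identity-diffusion case noted earlier): SGHMC augments the state to $z = (\theta, r)$ with $H(\theta, r) = U(\theta) + \frac{1}{2} r^\top r$ and chooses $D = \begin{pmatrix} 0 & 0 \\ 0 & C \end{pmatrix}$ (a friction matrix $C$ acting on the momentum) together with the HMC curl matrix $Q = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$; SGRLD keeps $z = \theta$ and $Q = 0$ but sets $D(\theta) = G(\theta)^{-1}$, the inverse Fisher information metric, in which case the $\Gamma$ term reproduces SGRLD's drift correction involving the derivatives of $G(\theta)^{-1}$.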

Since the matrices $D(z)$ and $Q(z)$ are free to choose, a new sampler called Stochastic Gradient Riemann Hamiltonian Monte Carlo (SGRHMC) is presented, which introduces the Fisher information metric as a way to exploit the geometry of the target distribution. For a detailed discussion of how to use geometric information in MCMC procedures, refer to Girolami, M., & Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2), 123-214.
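As I understand the construction from the paper, SGRHMC keeps $H(\theta, r) = U(\theta) + \frac{1}{2} r^\top r$ and chooses

$$ D(\theta, r) = \begin{pmatrix} 0 & 0 \\ 0 & G(\theta)^{-1} \end{pmatrix}, \qquad Q(\theta, r) = \begin{pmatrix} 0 & -G(\theta)^{-1/2} \\ G(\theta)^{-1/2} & 0 \end{pmatrix}, $$

so that the dynamics are preconditioned by the metric $G(\theta)$, while the $\Gamma$ term automatically supplies the correction drifts involving the derivatives of $G(\theta)^{-1}$ and $G(\theta)^{-1/2}$ that an ad hoc derivation would have to handle separately.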
