Importance sampling ⚖️⚖️ : The Jarzynski connection
March 2022
Let $V_t$ be a family of smooth potentials on $\mathbb{R}^n$, continuously evolving with time $t \in [0,1]$. Let $p_t$ be their associated Gibbs measures:
$$p_t(x) = \frac{e^{-V_t(x)}}{Z_t} \qquad \text{where} \qquad Z_t = \int_{\mathbb{R}^n} e^{-V_t(x)}\,dx.$$
It is not easy to find a continuous dynamics $X_t$ such that at each time $t$ the distribution of $X_t$ matches $p_t$. One might be tempted to use the overdamped Langevin dynamics,
$$dX_t = -\nabla V_t(X_t)\,dt + \sqrt{2}\,dB_t$$
with $(B_t)$ a Brownian motion. But the density $\rho_t$ of $X_t$ is not equal to $p_t$ in general. Indeed, $\rho_t$ solves the Fokker-Planck equation
$$\dot\rho_t = \Delta\rho_t + \nabla\cdot[\rho_t\,\nabla V_t].$$
If $\rho_t$ were equal to $p_t$, then $p_t$ would also satisfy this equation; but for $p_t$ one easily checks that the right-hand side is zero, resulting in $\dot p_t = 0$, which is false in general for an evolving family of potentials; it would only be true at equilibrium, when $V_t$ is constant in time.
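Spelled out, since $\nabla p_t = -p_t\,\nabla V_t$, the right-hand side evaluated at $p_t$ is
$$\Delta p_t + \nabla\cdot\big[p_t\,\nabla V_t\big] = \nabla\cdot\big(\nabla p_t + p_t\,\nabla V_t\big) = \nabla\cdot\big(-p_t\,\nabla V_t + p_t\,\nabla V_t\big) = 0.$$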
The Jarzynski trick makes it possible to reconnect the distribution of $X_t$ with $p_t$. It introduces a weight, the solution of an auxiliary ODE driven by $X_t$, such that the conditional expectation of this weight given $X_t = x$, multiplied by the density $\rho_t(x)$, gives $p_t(x)$, hence correcting the mismatch between $\rho_t$ and $p_t$.
(In the sequel we write $F_t = -\log Z_t$ for the free energy, and the dot in $\dot V_t$ or $\dot F_t$ always denotes a derivative with respect to time.)
Let $(X_t, W_t)$ be the solution of the following augmented system:
$$\begin{cases} dX_t = -\nabla V_t(X_t)\,dt + \sqrt{2}\,dB_t \\ dW_t = \big(\dot F_t - \dot V_t(X_t)\big)\,W_t\,dt \\ X_0 \sim p_0 \\ W_0 = 1. \end{cases}$$
Then, for any test function $\varphi$,
$$E[\varphi(X_t)\,W_t] = \int \varphi(x)\,\frac{e^{-V_t(x)}}{Z_t}\,dx.$$
In other words, the probability measure $P(A) = E[\mathbf{1}_A(X_t)\,W_t]$ is exactly $p_t$.
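To make the identity concrete, here is a minimal numerical sketch: an Euler-Maruyama discretization of the augmented system for a hypothetical one-dimensional family $V_t(x) = (1+t)\,x^2/2$, chosen because $Z_t = \sqrt{2\pi/(1+t)}$, hence $\dot F_t = \tfrac{1}{2(1+t)}$, and the target moment $E_{p_1}[x^2] = 1/2$ are all known in closed form (the variable names and parameters below are illustrative).

```python
import numpy as np

# Euler-Maruyama check of E[phi(X_1) W_1] = E_{p_1}[phi], for the hypothetical
# family V_t(x) = (1 + t) x^2 / 2, so grad V_t(x) = (1 + t) x, dot V_t(x) = x^2 / 2,
# Z_t = sqrt(2 pi / (1 + t)), F_t = -log Z_t and dot F_t = 1 / (2 (1 + t)).
rng = np.random.default_rng(0)
n_paths, n_steps = 200_000, 2_000
dt = 1.0 / n_steps

x = rng.normal(size=n_paths)        # X_0 ~ p_0 = N(0, 1) since V_0(x) = x^2 / 2
log_w = np.zeros(n_paths)           # log-weights, W_0 = 1

for k in range(n_steps):
    t = k * dt
    dot_V = 0.5 * x**2
    dot_F = 0.5 / (1.0 + t)
    log_w += (dot_F - dot_V) * dt   # weight ODE, integrated in log scale
    x += -(1.0 + t) * x * dt + np.sqrt(2.0 * dt) * rng.normal(size=n_paths)

w = np.exp(log_w)
phi = x**2
print(np.mean(phi * w))   # reweighted average: close to E_{p_1}[x^2] = 1/2
print(np.mean(phi))       # unweighted average over rho_1: larger than 1/2
```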
Jarzynski's goal was to compute, at least numerically, the partition functions $Z_t$, or equivalently the free energies $F_t = -\log Z_t$. The Jarzynski estimator, discovered in 1996, simply consists in applying the identity above to $\varphi = 1$, and noting that $W_t$ solves a simple linear ODE, whose solution is
$$W_t = \exp\left(F_t - F_0 - \int_0^t \dot V_s(X_s)\,ds\right),$$
giving rise to a natural estimator for the free energy difference $F_t - F_0$: one samples many paths from the Langevin dynamics above, computes the path integral $w_t = \int_0^t \dot V_s(X_s)\,ds$ along each path, averages the values of $e^{-w_t}$, and takes minus the logarithm, since the identity applied to $\varphi = 1$ reads $E\big[e^{-w_t}\big] = e^{-(F_t - F_0)}$.
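With the same hypothetical quadratic family as above, a sketch of this estimator reads as follows; note that, unlike the verification above, it does not use $\dot F_t$, and the exact answer is $F_1 - F_0 = \tfrac{1}{2}\log 2$.

```python
import numpy as np

# Jarzynski estimator of F_1 - F_0 for the same family V_t(x) = (1 + t) x^2 / 2.
# Exact value: F_1 - F_0 = -log(Z_1 / Z_0) = 0.5 * log(2) ~ 0.3466.
rng = np.random.default_rng(1)
n_paths, n_steps = 200_000, 2_000
dt = 1.0 / n_steps

x = rng.normal(size=n_paths)   # X_0 ~ p_0 = N(0, 1)
w_int = np.zeros(n_paths)      # accumulates w_t = int_0^t dot V_s(X_s) ds

for k in range(n_steps):
    t = k * dt
    w_int += 0.5 * x**2 * dt   # dot V_t(x) = x^2 / 2
    x += -(1.0 + t) * x * dt + np.sqrt(2.0 * dt) * rng.normal(size=n_paths)

print(-np.log(np.mean(np.exp(-w_int))))   # estimated F_1 - F_0
print(0.5 * np.log(2.0))                  # exact value
```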
In general, exponentials of path integrals are not easy to study. The classical way to prove Jarzynski's identity directly is to use the Feynman-Kac representation of PDEs, but this proof is quite involved; the trick above with the augmented system seems easier. It was shown to me by Eric Vanden-Eijnden.
The identity above can be reformulated as follows: if we denote by $\varrho_t(x,w)$ the density of $(X_t, W_t)$, then
$$p_t(x) = \int_0^\infty w\,\varrho_t(x,w)\,dw.$$
Note that the integral runs over the positive axis only, because $W_t$ is always positive, as the explicit formula for $W_t$ shows. We denote by $q_t(x)$ the right-hand side; we are going to prove that $p_t = q_t$ by showing that both satisfy the same parabolic equation, and then invoking a uniqueness result.
From now on, we'll set $U_t = V_t - F_t$, so that $p_t(x) = e^{-U_t(x)}$ is normalized. Clearly, $\nabla U_t = \nabla V_t$.
We consider the pair $Y_t = (X_t, W_t)$ from the augmented system above as a single SDE,
$$dY_t = f_t(Y_t)\,dt + \sigma\,dB_t$$
with
$$f_t(x,w) = \begin{pmatrix} -\nabla U_t(x) \\ -\dot U_t(x)\,w \end{pmatrix} \qquad\text{and}\qquad \sigma = \mathrm{diag}(\sqrt{2},\dots,\sqrt{2},0).$$
Let us write the Fokker-Planck equation satisfied by the density $\varrho_t(x,w)$ of $(X_t, W_t)$; here and in what follows, $\nabla$ is a gradient and $\nabla\cdot$ is a divergence, and for clarity, in this section the subscripts $x$ and $w$ on the differential operators indicate the variable with respect to which we differentiate. The equation reads
$$\partial_t \varrho_t = \Delta_x \varrho_t + \nabla_x\cdot\big[\varrho_t\,\nabla U_t\big] + \partial_w\big[\varrho_t\,\dot U_t(x)\,w\big].$$
We already set $q_t(x) = \int_0^\infty w\,\varrho_t(x,w)\,dw$, so that $E[W_t] = \int q_t(x)\,dx$ for example. Multiplying the last equation by $w$, integrating over $w$, and swapping integrals and derivatives, we get
$$\partial_t q_t = \Delta_x q_t + \nabla_x\cdot\big[q_t\,\nabla U_t\big] + \dot U_t(x)\int_0^\infty w\,\partial_w\big[w\,\varrho_t(x,w)\big]\,dw.$$
Since $w\,\partial_w[w\,\varrho_t] = w\,\varrho_t + w^2\,\partial_w\varrho_t$, an integration by parts shows that $\int_0^\infty w^2\,\partial_w\varrho_t(x,w)\,dw = -2\int_0^\infty w\,\varrho_t(x,w)\,dw = -2\,q_t(x)$, provided $w \mapsto w^2\varrho_t(x,w)$ decays sufficiently fast at infinity; the last integral above is therefore equal to $q_t(x) - 2\,q_t(x) = -q_t(x)$. We obtain the following equation:
$$\dot q_t = \Delta q_t + \nabla\cdot\big[q_t\,\nabla U_t\big] - \dot U_t\,q_t,$$
where I removed the subscript $x$ for clarity. This equation is a Fokker-Planck equation (first two terms on the right) but with a birth-death term added at the end.
It turns out that the density $p_t(x) = e^{-U_t(x)}$ is a solution of this equation. Indeed, $\dot p_t = -\dot U_t\,p_t$ and $\nabla p_t = -\nabla U_t\,p_t$, so that the equation above is easily checked.
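Spelled out, the check reads
$$\Delta p_t + \nabla\cdot\big[p_t\,\nabla U_t\big] - \dot U_t\,p_t = \nabla\cdot\big(\nabla p_t + p_t\,\nabla U_t\big) - \dot U_t\,p_t = 0 - \dot U_t\,p_t = \dot p_t,$$
using $\nabla p_t = -p_t\,\nabla U_t$ in the middle step.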
We're now left with checking that the solutions of
$$\dot u_t = \Delta u_t - \nabla\cdot[u_t\,B_t] + C_t\,u_t$$
are essentially unique if they start from the same initial condition, which is the case here; in our equation, $B_t = -\nabla U_t$ and $C_t = -\dot U_t$. Uniqueness for solutions of parabolic equations is well known to PDE theorists and can be found in Evans' book, chapter 7. We provide a proof for completeness. By linearity, it is sufficient to check that the only solution started at $u_0(x) = 0$ is identically zero. Set $\gamma(t) := \int |u_t(x)|^2\,dx$; we're going to bound its derivative using "energy estimates". We have
$$\frac{1}{2}\dot\gamma(t) = \int u_t\,\Delta u_t - u_t\,\nabla\cdot[u_t\,B_t] + |u_t|^2\,C_t.$$
The first term is $-\int|\nabla u_t|^2$ by integration by parts. The second term is equal to $+\int \nabla u_t\cdot u_t\,B_t$, again by integration by parts. Young's inequality $a\cdot b \leqslant \lambda|a|^2 + |b|^2/4\lambda$, valid for any $\lambda > 0$, yields
$$\int \nabla u_t\cdot u_t\,B_t \leqslant \lambda\int|\nabla u_t|^2 + \frac{1}{4\lambda}\int |u_t|^2\,|B_t|^2.$$
In particular, with $\lambda = 1$ this cancels the first term and we obtain
$$\frac{1}{2}\dot\gamma(t) \leqslant \int |u_t|^2\left(\frac{1}{4}|B_t|^2 + C_t\right).$$
Now, if $B_t$ and $C_t$ are uniformly bounded, then $\frac{1}{4}|B_t|^2 + C_t$ is bounded by some constant $c$, and we get $\dot\gamma(t) \leqslant 2c\,\gamma(t)$. Grönwall's lemma yields $\gamma(t) \leqslant \gamma(0)\,e^{2ct} = 0$, and since $\gamma(t)$ is always nonnegative, we get $\gamma(t) = 0$ and $u_t = 0$ for all $t$.