Gaussian conditioning

September 2023

Let $X$ be a Gaussian vector over $\mathbb{R}^n$ with mean $\mu$ and covariance matrix $\Sigma$. Split $X$ into two blocks, say $X = (X_1, X_2)$ with respective sizes $n_1, n_2$ (here $n_1 + n_2 = n$). What is the conditional distribution of $X_1$ given $X_2$? First, let us split the mean and covariance of $X$ into the corresponding blocks:

$$\mu = \begin{bmatrix}\mu_1 \\ \mu_2 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,2}\\ \Sigma_{2,1} & \Sigma_{2,2}\end{bmatrix}$$

so that for example $X_1$ is a Gaussian with mean $\mu_1$ and covariance $\Sigma_{1,1}$. Obviously, since $\Sigma$ is symmetric, $\Sigma_{2,1} = \Sigma_{1,2}^\top$.

Theorem. The distribution of $X_1$ conditionally on $X_2$ is a Gaussian random variable with mean
$$m = \mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(X_2 - \mu_2)$$
and with covariance
$$S = \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{1,2}^\top.$$
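In practice, the theorem is a two-line computation. Here is a minimal NumPy sketch (the helper name `condition_gaussian` and the toy numbers are mine, for illustration only):

```python
import numpy as np

def condition_gaussian(mu1, mu2, S11, S12, S22, x2):
    """Conditional mean and covariance of X1 given X2 = x2, as in the theorem."""
    # K = Sigma_{1,2} Sigma_{2,2}^{-1}, computed with a solve rather than an explicit inverse
    K = np.linalg.solve(S22, S12.T).T
    m = mu1 + K @ (x2 - mu2)   # conditional mean
    S = S11 - K @ S12.T        # conditional covariance (Schur complement)
    return m, S

# Example with n1 = 1, n2 = 2 (arbitrary numbers)
mu1, mu2 = np.array([0.0]), np.array([1.0, -1.0])
S11 = np.array([[2.0]])
S12 = np.array([[0.5, 0.3]])
S22 = np.array([[1.0, 0.2], [0.2, 1.5]])
m, S = condition_gaussian(mu1, mu2, S11, S12, S22, x2=np.array([2.0, 0.0]))
```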

Proof

By block-inversion

Let $f(x,y)$ be the joint density for $(X_1,X_2)$, namely

$$f(x,y) = \frac{\exp\left(- \frac{1}{2} \langle z - \mu, \Sigma^{-1} (z - \mu)\rangle \right)}{(2\pi)^{n/2}\sqrt{\det \Sigma}} \quad \text{where} \quad z = (x,y)^\top.$$

It is well known that the conditional density of $X_1$ given $X_2=y$ is

$$f(x\mid y)= \frac{f(x,y)}{\int f(x,y)\,dx}.$$

We could perform this computation exactly and recover the claim in the theorem. To proceed, we need an expression for the inverse of $\Sigma$. That is doable, and indeed the famous Schur formulas tell us that

$$\Sigma^{-1} = \begin{bmatrix}S^{-1} & - S^{-1}\Sigma_{1,2}\Sigma_{2,2}^{-1} \\ -\Sigma_{2,2}^{-1}\Sigma_{2,1}S^{-1} & \Sigma_{2,2}^{-1} + \Sigma_{2,2}^{-1}\Sigma_{2,1} S^{-1}\Sigma_{1,2}\Sigma_{2,2}^{-1}\end{bmatrix}$$

where $S$ is called the Schur complement of the block $\Sigma_{2,2}$ in $\Sigma$,

$$S = \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}.$$

We immediately recognize the covariance $S$ from the theorem. By carefully reorganizing the terms inside $f(x,y)$ we would readily find that $f(x\mid y)$ is proportional to

$$\exp\left( - \frac{1}{2}\langle x-m, S^{-1}(x-m)\rangle\right)$$

hence the theorem would be proved.
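If one does not trust the block-inversion formula, it is easy to check numerically. Here is a small NumPy sketch (random blocks and variable names of my own choosing, purely as a sanity check) comparing the right-hand side above with `np.linalg.inv`:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 2, 3

# Random symmetric positive definite covariance, split into blocks
A = rng.normal(size=(n1 + n2, n1 + n2))
Sigma = A @ A.T + (n1 + n2) * np.eye(n1 + n2)
S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]

# Schur complement of the block Sigma_{2,2}
S = S11 - S12 @ np.linalg.inv(S22) @ S21
Si, S22i = np.linalg.inv(S), np.linalg.inv(S22)

block_inverse = np.block([
    [Si,               -Si @ S12 @ S22i],
    [-S22i @ S21 @ Si,  S22i + S22i @ S21 @ Si @ S12 @ S22i],
])
print(np.allclose(block_inverse, np.linalg.inv(Sigma)))  # expect True
```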

I find this method computational, and I never remember the block-inversion formula above.

Instead, there is a simpler, more conceptual path: observe that $\log f(x,y)$ is a quadratic function in $(x,y)$, hence when $y$ is fixed, $\log f(x,y)$ is still a quadratic function in $x$. But obviously, log-quadratic probability densities are precisely Gaussian densities. We just proved that

$$\text{the conditional distribution of a Gaussian vector remains Gaussian.}$$
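To spell out the standard fact being invoked (with a generic symmetric positive definite matrix $A$ and vector $b$, not tied to the notation above): completing the square gives

$$-\tfrac{1}{2}\langle x, Ax\rangle + \langle b, x\rangle = -\tfrac{1}{2}\langle x - A^{-1}b,\, A(x - A^{-1}b)\rangle + \tfrac{1}{2}\langle b, A^{-1}b\rangle,$$

so any density proportional to $\exp\left(-\tfrac{1}{2}\langle x, Ax\rangle + \langle b, x\rangle\right)$ is the Gaussian density with mean $A^{-1}b$ and covariance $A^{-1}$.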

Hence, all we have to do is to compute the conditional mean and the conditional variance, namely

$$\mathbb{E}[X_1 \mid X_2] \qquad\text{and}\qquad \mathrm{Var}(X_1\mid X_2) = \mathbb{E}\big[(X_1 -\mathbb{E}[X_1 \mid X_2])(X_1 -\mathbb{E}[X_1 \mid X_2])^\top \mid X_2\big].$$

Decorrelating $X_1$ and $X_2$

To compute these conditional moments, there is a clever trick. The idea is to remove the part of $X_1$ which depends on $X_2$, so as to get something independent of $X_2$. Indeed, we want to find a matrix $M$ such that $Z = X_1 + MX_2$ is independent of $X_2$. Since $Z, X_2$ are jointly Gaussian, they only need to be decorrelated, that is $\mathrm{Cov}(Z, X_2)=0$, which translates into $\Sigma_{1,2} + M \Sigma_{2,2}=0$, hence

$$M = - \Sigma_{1,2}\Sigma_{2,2}^{-1}$$

and for future reference,

$$Z = X_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1} X_2 \qquad\text{and}\qquad X_1 = Z + \Sigma_{1,2}\Sigma_{2,2}^{-1}X_2.$$
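The decorrelation is easy to check by simulation. A minimal NumPy sketch (toy mean and covariance of my own choosing), in which the empirical cross-covariance of $Z$ and $X_2$ should come out close to zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy blocks: n1 = 1, n2 = 2 (arbitrary numbers, for illustration only)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
S12, S22 = Sigma[:1, 1:], Sigma[1:, 1:]

# Sample (X1, X2) jointly and form Z = X1 - Sigma_{1,2} Sigma_{2,2}^{-1} X2
X = rng.multivariate_normal(mu, Sigma, size=200_000)
X1, X2 = X[:, :1], X[:, 1:]
Z = X1 - X2 @ np.linalg.solve(S22, S12.T)

# Empirical cross-covariance of Z and X2: should be close to zero
cross_cov = np.cov(Z.T, X2.T)[:1, 1:]
print(np.round(cross_cov, 3))
```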

Conditional mean

Now, we can compute the conditional mean:

$$\begin{aligned}\mathbb{E}[X_1\mid X_2] &= \mathbb{E}[Z \mid X_2] + \Sigma_{1,2}\Sigma_{2,2}^{-1}\mathbb{E}[X_2 \mid X_2] \\&= \mathbb{E}[Z] + \Sigma_{1,2}\Sigma_{2,2}^{-1} X_2\\ &= \mathbb{E}[X_1] - \Sigma_{1,2}\Sigma_{2,2}^{-1}\mathbb{E}[X_2] + \Sigma_{1,2}\Sigma_{2,2}^{-1} X_2\\ &= \mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}\mu_2 + \Sigma_{1,2}\Sigma_{2,2}^{-1}X_2 \\ &= \mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(X_2 - \mu_2). \end{aligned}$$
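One empirical way to check this formula: since $\mathbb{E}[X_1\mid X_2]$ is affine in $X_2$, an ordinary least-squares regression of sampled $X_1$ on sampled $X_2$ should recover the slope $\Sigma_{1,2}\Sigma_{2,2}^{-1}$ and the intercept $\mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}\mu_2$. A sketch with the same toy numbers as above (not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
X = rng.multivariate_normal(mu, Sigma, size=500_000)
X1, X2 = X[:, 0], X[:, 1:]

# Least-squares fit of X1 against [1, X2]: recovers intercept and slope
design = np.column_stack([np.ones(len(X2)), X2])
coef, *_ = np.linalg.lstsq(design, X1, rcond=None)

# Theoretical values from the conditional-mean formula
K = Sigma[:1, 1:] @ np.linalg.inv(Sigma[1:, 1:])   # Sigma_{1,2} Sigma_{2,2}^{-1}
print(coef)                                        # empirical [intercept, slopes]
print(mu[0] - K @ mu[1:], K)                       # theoretical intercept and slopes
```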

Conditional variance

For the conditional variance, we note that

$$X_1 -\mathbb{E}[X_1 \mid X_2] = (Z - MX_2) - (\mathbb{E}[Z] - MX_2) = Z-\mathbb{E}[Z]$$

hence $X_1 -\mathbb{E}[X_1 \mid X_2]$ is independent of $X_2$, and in particular $\mathrm{Var}(X_1 \mid X_2) = \mathrm{Var}(Z)$ and

$$\begin{aligned} \mathrm{Var}(Z)&= \mathrm{Var}(X_1) + M\mathrm{Var}(X_2)M^\top + \mathrm{Cov}(X_1, MX_2) + \mathrm{Cov}(MX_2, X_1) \\ &= \Sigma_{1,1} + \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,2}(\Sigma_{1,2}\Sigma_{2,2}^{-1})^\top + \Sigma_{1,2}M^\top + M\Sigma_{2,1}\\ &= \Sigma_{1,1} + \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{1,2}^\top - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{1,2}^\top - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}\\ &= \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{1,2}^\top . \end{aligned}$$
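Closing the loop numerically: with the same toy blocks as before, the empirical variance of $Z$ should match the Schur complement $S$ (again a Monte Carlo sanity check, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
S11, S12, S22 = Sigma[:1, :1], Sigma[:1, 1:], Sigma[1:, 1:]

X = rng.multivariate_normal(mu, Sigma, size=500_000)
X1, X2 = X[:, :1], X[:, 1:]
Z = X1 - X2 @ np.linalg.solve(S22, S12.T)

# Empirical Var(Z) versus the Schur complement S = S11 - S12 S22^{-1} S12^T:
# the two printed values should agree up to Monte Carlo error.
print(np.cov(Z.T))
print(S11 - S12 @ np.linalg.solve(S22, S12.T))
```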