🏋🏼 Heavy tails III: Kesten's theorem

November 2023

Motivation: ARCH models

In 2003, Robert F. Engle won the Nobel Prize in economics for his innovative methods in time-series analysis; to put it sharply, Engle introduced ARCH models into economics. In his seminal 1982 paper (30k+ citations), he wanted to model the time series $(y_t)$ representing inflation in the UK. Most previous models used a simple autoregressive model of the form $y_{t+1} = \alpha y_t + \varepsilon_t$, where $\varepsilon_t$ is an external Gaussian noise with variance $\sigma^2$ and $\alpha<1$ is a "forgetting" parameter. The problem with these models is that the variance of the inflation at time $t+1$ given the inflation at time $t$, that is $\mathrm{Var}(y_{t+1}\mid y_t)$, is simply $\mathrm{Var}(\varepsilon_t)=\sigma^2$, which does not depend on $y_t$.

Engle wanted a model where the conditional variance would depend on yty_t: there are good reasons to think that volatility is sticky. The model he came up with (equations (1)-(3) in his paper) is simply

$$y_{t+1} = \varepsilon_t \times \sqrt{\alpha + \beta y_t^2}. \tag{1}$$

In other words, the variance of $y_{t+1}$ given $y_t$ is $\alpha + \beta y_t^2$. This is the simplest of Auto-Regressive Conditionally Heteroscedastic (ARCH) models. Upon squaring everything in (1), the equation becomes

$$y_{t+1}^2 = \alpha \varepsilon^2_t + \beta \varepsilon_t^2 y_t^2. \tag{2}$$

This is a linear recursion in $y_t^2$. In the paper, Engle introduced a few variations and crafted statistical methods to estimate the parameters and their significance.

A central question in this way of modelling things is: how does $y_n$ behave in the long term? Does $y_n$ stay stable, can it take extremely large values (crises, shocks and crashes), and if so, at which frequency?

If $y_t^2$ converges in distribution towards a random variable $Y$, then (2) shows that $Y$ and $\alpha \varepsilon^2 + \beta \varepsilon^2 Y$ must have the same probability distribution, where $\varepsilon$ is an $\mathscr{N}(0,\sigma^2)$ random variable, independent of $Y$. This is an instance of a very general kind of equation, called affine distributional equations: these are equations of the form

$$Y \stackrel{\mathrm{law}}{=} AY+B \tag{3}$$

where $A,B$ are random variables independent of $Y$. It turns out that these equations are generally impossible to solve explicitly. However, a theorem of Harry Kesten states, perhaps not so intuitively, that the law of any solution must have a heavy tail: in contrast with, say, Gaussian distributions, for which taking extremely large values (« shocks ») has an exponentially small probability, heavy-tailed distributions take large values with polynomially small probability, which is… not so rare at all!

Here is a small simulation over $10^6$ periods of time with parameters $\alpha=0.1$, $\beta=1$, $\sigma = 0.1$. The histogram indicates that $y_t$ seems to have heavy tails, that is, the probability of observing an unusually large value is polynomially small (and not exponentially small).
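If you want to reproduce this experiment, here is a minimal sketch of the simulation, assuming numpy; the parameters are the ones quoted above, and the printed empirical tail plays the role of the histogram (on log-log axes, a heavy tail would look like a straight line of slope $-s$, while a Gaussian tail bends down sharply).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 0.1, 1.0, 0.1
T = 10**6

# Simulate y_{t+1} = eps_t * sqrt(alpha + beta * y_t^2), with eps_t ~ N(0, sigma^2).
eps = rng.normal(0.0, sigma, size=T)
y = np.zeros(T)
for t in range(T - 1):
    y[t + 1] = eps[t] * np.sqrt(alpha + beta * y[t] ** 2)

# Empirical tail P(|y| > x) over a grid of thresholds.
xs = np.logspace(-1, np.log10(np.abs(y).max()), 40)
tail = np.array([(np.abs(y) > x).mean() for x in xs])
for x, p in zip(xs[::8], tail[::8]):
    print(f"P(|y| > {x:.3f}) ≈ {p:.2e}")
```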

The heavy-tail theorem behind this phenomenon is one of Kesten's most famous results, published in 1973 in Acta Mathematica.

Kesten's theorems

From now on, we'll study the solutions of the equation

$$X \stackrel{\mathrm{law}}{=} AX+B \tag{4}$$

where $X$ is a random variable over $\mathbb{R}^d$, $A$ is a random matrix and $B$ a random vector, both independent of $X$. The goal is to present Kesten's theorem, which states that under certain conditions on $A$ and $B$, the solution $X$ must have heavy tails; that is, $\mathbb{P}(X>x)$ decreases polynomially fast, and not exponentially fast as for Gaussian variables. This result is somehow mind-blowing in its precision, since we get access to the exact tail exponent.

A full, rigorous proof would be packed with technicalities; I'll only show a quasi-proof in dimension 1 which, although not complete, was sufficiently clear and idea-driven to convince me that Kesten's result holds. I'm also including a quasi-proof of the renewal theorem, which might be of independent interest.

One-dimensional case

For simplicity, we restrict to the case where $A,B$ are independent and have densities with respect to the Lebesgue measure. In general, it is not obvious that solutions to (4) exist, but an earlier result of Kesten states that this is the case as soon as $\mathbb{E}[\ln |A|]<0$ and $\mathbb{E}[(\ln |B|)_+]<\infty$.

Kesten's theorem (1973).

(i) Assume that $A>0$ almost surely and that there is an $s>0$ such that

$$\mathbb{E}[|A|^s]=1, \qquad \mathbb{E}[|A|^s (\ln |A|)_+]<\infty, \qquad \mathbb{E}[|B|^s]<\infty.$$

Then, there are two constants $c_\pm$ such that when $x\to \infty$,

$$\mathbb{P}(X>x)\sim c_+ x^{-s}, \qquad \mathbb{P}(X<-x) \sim c_- x^{-s}.$$

(ii) The same result holds if $A$ can take positive and negative values, and in this case $c_+ = c_-$.
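To see the precision of the theorem in action, here is a small hedged experiment. The lognormal model for $A$ is a hypothetical choice, picked because $\mathbb{E}[A^s]=1$ can then be solved by hand: if $A=e^Z$ with $Z\sim\mathscr{N}(\mu,\tau^2)$ and $\mu<0$, then $\mathbb{E}[A^s]=e^{s\mu+s^2\tau^2/2}$ equals $1$ exactly at $s=-2\mu/\tau^2$. The exponential $B$ is an arbitrary nonnegative choice with $\mathbb{E}[B^s]<\infty$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setting: A = exp(Z), Z ~ N(mu, tau^2), mu < 0.
mu, tau = -0.25, 1.0
s = -2 * mu / tau**2          # predicted tail exponent, here s = 0.5
N = 10**6

A = rng.lognormal(mu, tau, size=N)
B = rng.exponential(1.0, size=N)   # nonnegative, all moments finite

# Iterate X_{n+1} = A_n X_n + B_n: the trajectory samples the stationary law,
# i.e. the solution of X =law AX + B.
X = np.zeros(N)
for n in range(N - 1):
    X[n + 1] = A[n] * X[n] + B[n]

# Fit the slope of log P(X > x) against log x: it should be close to -s.
xs = np.logspace(1, 2.5, 12)
tail = np.array([(X > v).mean() for v in xs])
mask = tail > 0
slope, _ = np.polyfit(np.log(xs[mask]), np.log(tail[mask]), 1)
print(f"predicted s = {s:.2f}, empirical tail slope = {-slope:.2f}")
```

The slope estimate is noisy, since tail exceedances come in long correlated excursions along the trajectory, but it should hover around the predicted exponent.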

Sketch of proof in the 1d case

I'll only sketch the proof ideas in the subcase of (i) where, in addition, $B$ is nonnegative. In this case, we can safely assume that $X$ is nonnegative by conditioning on the event $\{X>0\}$. We set $f(x) = e^{sx}\mathbb{P}(X > e^x)$; our final goal is to prove that $f(x)$ converges towards some constant when $x\to+\infty$, the negative tail being handled identically.

The recursion (4) shows that $f(x) = e^{sx}\mathbb{P}(AX + B > e^x)$. However, if $x$ is very large, we could guess that $\mathbb{P}(AX + B > e^x)$ is close to $\mathbb{P}(AX > e^x)$. This is the origin of the first trick of the proof, which is to artificially write

$$f(x) = e^{sx}\left(\mathbb{P}(AX+B>e^x) - \mathbb{P}(AX>e^x)\right) + e^{sx}\,\mathbb{P}(AX>e^x).$$

Let us denote by $g$ the first term on the right:

$$g(x)=e^{sx}\left(\mathbb{P}(AX+B>e^x) - \mathbb{P}(AX>e^x)\right).$$

For the second term, we can express it in terms of $f$ by using $\mathbb{E}[A^s]=1$:

$$e^{sx}\,\mathbb{P}(AX>e^x) = e^{sx}\,\mathbb{P}(X>e^{x - \ln A}) = \mathbb{E}\left[A^s e^{s(x-\ln A)}\mathbf{1}_{X>e^{x - \ln A}}\right].$$

Now, here comes the second trick: the change of measure. We assumed that $\mathbb{E}[A^s]=1$, hence we can define a new probability measure $\mathbb{P}_s$ by $\mathbb{E}_s[\varphi] = \mathbb{E}[A^s \varphi]$, and the expression above becomes $\mathbb{E}_s[f(x- \ln A)]$. Overall, we obtain that $f$ is a solution of the following equation:

$$\forall x \geqslant 0, \qquad f(x) = g(x) + \mathbb{E}_s[f(x - \ln A)].$$

If $\mu_s$ denotes the distribution of $\ln A$ under the measure $\mathbb{P}_s$, this equation becomes $f(x) = g(x) + f\star \mu_s(x)$, where $\star$ denotes the convolution operator. Such equations are called convolution equations, and they can be studied using classical probability theory. The main result of renewal theory, presented below, shows that if $g$ is directly Riemann integrable and $\mu_s$ has a finite, positive mean, then there is only one solution to this equation, and more crucially it satisfies

$$\lim_{x\to \infty}f(x) = \frac{\int_0^\infty g(u)\,du}{\mathbb{E}_s[\ln A]} = \frac{\int_0^\infty g(u)\,du}{\mathbb{E}[A^s \ln A]}=:c.$$

Here $\mathbb{E}[A^s \ln A]>0$: the function $t\mapsto \mathbb{E}[A^t]$ is convex, equals $1$ at $t=0$ and at $t=s$, and has negative derivative $\mathbb{E}[\ln A]$ at zero, so its derivative at $s$, which is $\mathbb{E}[A^s\ln A]$, must be positive. Moreover $c>0$, because $g$ is a nonzero nonnegative function, so its integral is nonzero. Consequently,

$$\lim_{x\to \infty}e^{sx}\,\mathbb{P}(X>e^x) = c.$$

This is equivalent to $\mathbb{P}(X>x) \sim cx^{-s}$, as requested. There is, however, a serious catch: to apply the key renewal theorem we need to check the two conditions listed above. The second one (finite, positive mean) follows from our assumptions, as we just saw. However, the first one requires $g$ to be "directly Riemann integrable", which is indeed very difficult to check. I won't do this part at all[1].
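As a sanity check of the change of measure, here is a hedged numerical experiment, assuming numpy and scipy, and reusing the hypothetical lognormal example from above: if $A=e^Z$ with $Z\sim\mathscr{N}(\mu,\tau^2)$ and $s=-2\mu/\tau^2$, then under $\mathbb{P}_s$ the variable $\ln A$ is again Gaussian, with mean shifted to $\mu+s\tau^2=-\mu>0$; in particular $\mathbb{E}_s[\ln A]>0$, as the proof requires.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

mu, tau = -0.25, 1.0
s = -2 * mu / tau**2            # makes E[A^s] = 1, so P_s is a probability measure

Z = rng.normal(mu, tau, size=10**6)   # Z = ln A
w = np.exp(s * Z)                     # tilting weights A^s, of mean 1

# Compare E_s[1_{ln A <= t}] = E[A^s 1_{ln A <= t}] with the exact tilted CDF.
t = 0.4
tilted_cdf_mc = np.mean(w * (Z <= t))
tilted_cdf_exact = norm.cdf(t, loc=mu + s * tau**2, scale=tau)
print(tilted_cdf_mc, tilted_cdf_exact)   # the two values should be very close
```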

Multi-dimensional case

In the multi-dimensional case, a similar theorem holds. One can find in the literature a host of different settings regarding the random matrix $A$ and the random vector $B$; for simplicity, I'll stick to the case where $A$ is positive definite almost surely, $B$ has all entries nonnegative almost surely, and $A,B$ have densities with respect to the Lebesgue measure. The multi-dimensional analogue of $\mathbb{E}[|A|^s]$ will now be

$$h(a) = \lim_{n\to\infty}\mathbb{E}\left[\Vert A_1 \cdots A_n \Vert^a\right]^{\frac{1}{n}}$$

where $\Vert M \Vert$ is the operator norm. We also introduce the Lyapunov exponent, which is the analogue of $\mathbb{E}[\ln |A|]$:

$$\gamma = \lim_{n\to \infty}\frac{1}{n}\mathbb{E}[\ln \Vert A_1 \cdots A_n \Vert].$$
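There is no closed formula for $\gamma$ in general, but it is easy to estimate by Monte Carlo. Here is a minimal sketch; the matrix model (iid uniform entries, scaled down to be contractive) is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, trials = 2, 500, 200

def sample_matrix():
    # Hypothetical model: iid Uniform(0, 1) entries, scaled to make gamma < 0.
    return 0.8 * rng.uniform(0.0, 1.0, size=(d, d))

estimates = []
for _ in range(trials):
    M, log_norm = np.eye(d), 0.0
    for _ in range(n):
        M = sample_matrix() @ M
        nrm = np.linalg.norm(M, 2)   # operator norm of the partial product
        log_norm += np.log(nrm)
        M /= nrm                     # renormalize to avoid numerical underflow
    estimates.append(log_norm / n)   # one sample of (1/n) ln ||A_1 ... A_n||

print("gamma ≈", np.mean(estimates))
```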

Kesten's theorem in any dimension. Assume that $\gamma<0$ and that there is an $s>0$ such that

$$h(s)=1, \qquad \mathbb{E}[\Vert A \Vert^s (\ln \Vert A \Vert)_+]<\infty, \qquad \mathbb{E}[|B|^s]<\infty.$$

Then, solutions of (4) exist, and $X$ is heavy-tailed in every direction; in other words, for every nonzero $u\in \mathbb{R}^d$, there is a constant $c(u)>0$ such that

$$\mathbb{P}(\langle u, X\rangle > x) \sim c(u)\,x^{-s}.$$

The proof is much more technical, but it does not require genuinely new ideas.
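The statement itself, however, is easy to probe numerically. Here is a hedged two-dimensional illustration reusing the hypothetical matrix model from above; it only checks qualitatively that the projections $\langle u, X\rangle$ exhibit a polynomial-looking tail, without predicting the exponent.

```python
import numpy as np

rng = np.random.default_rng(4)
d, N = 2, 10**5

# Iterate X_{n+1} = A_n X_n + B_n with the same hypothetical matrix model.
X = np.zeros((N, d))
for n in range(N - 1):
    A = 0.8 * rng.uniform(0.0, 1.0, size=(d, d))  # contractive: gamma < 0
    B = rng.exponential(1.0, size=d)              # nonnegative entries
    X[n + 1] = A @ X[n] + B

u = np.array([1.0, 1.0]) / np.sqrt(2)  # any fixed nonzero direction
proj = X @ u

# Log-log fit of the empirical tail of <u, X>.
xs = np.logspace(0.5, np.log10(proj.max()) - 0.1, 10)
tail = np.array([(proj > v).mean() for v in xs])
mask = tail > 0
slope, _ = np.polyfit(np.log(xs[mask]), np.log(tail[mask]), 1)
print(f"empirical tail exponent along u ≈ {-slope:.2f}")
```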

Renewal theory in less than one page

Let $\mu$ be a continuous probability distribution on $\mathbb{R}_+$. Our goal is to show how renewal theory can be used to study convolution equations, represent their solutions with a probabilistic model, and draw consequences for their asymptotic behaviour. We are interested in finding $f$ such that

$$f(x) = g(x) + \int f(x-u)\, \mu(du) \tag{17}$$

where $g$ is some given function. Writing $f\star \mu(x) = \int f(x-u)\,\mu(du)$, we recognize a convolution between the function $f$ and the measure $\mu$, and the equation is simply $f - \mu \star f = g$. Writing $U$ for a random variable with distribution $\mu$, we can also represent this equation as

$$f(x) = g(x) + \mathbb{E}[f(x-U)].$$

But if this equation is satisfied for some $f$, then plugging it into $f(x-U)$ yields

$$\begin{aligned}f(x) &= g(x) + \mathbb{E}[g(x-U)] + \mathbb{E}[f(x - U - U')] \\ &= g(x) + \mathbb{E}[g(x-U)] + \mathbb{E}[g(x - U - U')] + \mathbb{E}[f(x-U-U'-U'')] \end{aligned}$$

and so on, with $U,U',U'',\dotsc$ iid copies of $U$. The following theorem follows by iterating this trick indefinitely.

Representation theorem. Let $U_i$ be iid random variables with distribution $\mu$, let $S_0=0$ and let $S_n = U_1 + \dotsb + U_n$ be the associated random walk. If $g$ is nonnegative, then the unique solution of (17) is given by

$$f(x) = \mathbb{E}\left[ \sum_{n=0}^\infty g(x-S_n)\right]. \tag{20}$$
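The representation is easy to test by simulation. In the sketch below, the choices $U\sim\mathrm{Exp}(1)$ and $g(x)=e^{-x}\mathbf{1}_{x\geqslant 0}$ are hypothetical, picked because everything is then explicit: one can check by hand that $f\equiv 1$ on $\mathbb{R}_+$, which also matches the renewal limit $\int_0^\infty g(u)\,du / \mathbb{E}[U] = 1$ appearing below.

```python
import numpy as np

rng = np.random.default_rng(5)

def g(x):
    # Nonnegative test function with integral 1 over [0, infinity).
    return np.where(x >= 0, np.exp(-x), 0.0)

def f_mc(x, trials=20000):
    """Monte Carlo estimate of f(x) = E[sum_{n>=0} g(x - S_n)] with U ~ Exp(1)."""
    total = 0.0
    for _ in range(trials):
        S, acc = 0.0, g(x)          # n = 0 term: S_0 = 0
        while S <= x:               # g vanishes on negatives, later terms are 0
            S += rng.exponential(1.0)
            acc += g(x - S)
        total += acc
    return total / trials

for x in [0.5, 2.0, 8.0]:
    print(x, float(f_mc(x)))        # all values close to 1
```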

The name "renewal process" comes from the fact that the SnS_n are considered as occurence times of events. The UiU_i are thus waiting times between the events. Renewal theory is interested in processes like Nt=N_t= the number of events before time tt.

Proof of the representation theorem. First, note that (20) is well-defined since $g$ is nonnegative (the sum can possibly be $\infty$). Now, for uniqueness, consider another solution $f'$ of (17): the difference $\delta = f-f'$ is a solution of $\delta = \delta \star \mu$. But then the Laplace transforms would satisfy $\hat\delta = \hat\delta\, \hat\mu = \hat\delta\, \hat\mu^2 = \dotsb = \hat\delta\, \hat\mu^n$ for every $n$. This forces $\hat\delta$ to vanish: since $\hat \mu(x) = \int_0^\infty e^{-xu}\,\mu(du)<1$ for every $x>0$ (as $\mu$ is diffuse), letting $n\to\infty$ gives $\hat\delta(x) = 0$ for every $x>0$, hence $\delta=0$.

It is quite rare that the representation theorem yields an explicit solution $f$: one would need to compute the expectation of the series, which in most cases is tedious. But this representation gives access to the whole toolbox of limit theorems in probability, and in particular laws of large numbers, which translate into exact asymptotics for $f$. That is the content of the key renewal theorem.

Renewal theorem I (Blackwell's version). For any $a>0$,

$$\lim_{t\to \infty}\big(\mathbb{E}[N_{t+a}] - \mathbb{E}[N_t]\big) = \frac{a}{\mathbb{E}[U]}.$$

Renewal theorem II (Ultimate version). If $g$ is « directly Riemann integrable », then the solution of (17) satisfies

$$\lim_{x\to\infty}f(x) = \frac{\int_0^\infty g(u)\,du}{\mathbb{E}[U]},$$

where $U$ has distribution $\mu$.
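Both statements can be checked by simulation. Here is a minimal sketch of Blackwell's version, with the hypothetical choice $U\sim\mathrm{Uniform}(0,2)$, so that $\mathbb{E}[U]=1$ and the window $(t, t+a]$ should contain about $a$ events on average once $t$ is large.

```python
import numpy as np

rng = np.random.default_rng(6)

def events_in_window(t, a):
    """Count the renewal events S_1, S_2, ... falling in the window (t, t+a]."""
    S, count = 0.0, 0
    while True:
        S += rng.uniform(0.0, 2.0)   # U ~ Uniform(0, 2), so E[U] = 1
        if S > t + a:
            return count
        if S > t:
            count += 1

t, a, trials = 100.0, 3.0, 20000
avg = np.mean([events_in_window(t, a) for _ in range(trials)])
print(avg)   # estimate of E[N_{t+a}] - E[N_t], close to a / E[U] = 3
```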

References

[1]  Indeed, there's a catch: it is not possible to directly prove that $g$ is directly Riemann integrable. Instead, Goldie mollified the problem by convolving $g$ with a mollifier $\rho_\delta$, proved the equivalent statement for the mollified version, and then sent $\delta$ to zero.