πŸ‹πŸΌ Heavy tails III: Kesten's theorem

November 2023

Motivation: ARCH models

In 2003, Robert F. Engle won the Nobel prize in economics for his innovative methods in time-series analysis; to put it sharply, Engle introduced ARCH models into economics. In his seminal 1982 paper (30k+ citations), he wanted to model the time series $(y_t)$ representing inflation in the UK. Most previous models used a simple autoregressive model of the form $y_{t+1} = \alpha y_t + \varepsilon_t$, where $\varepsilon_t$ is an external Gaussian noise with variance $\sigma^2$ and $\alpha<1$ is a "forgetting" parameter. The problem with these models is that the variance of the inflation at time $t+1$ given the inflation at time $t$, that is $\mathrm{Var}(y_{t+1} \mid y_t)$, is simply the variance of the noise, $\mathrm{Var}(\varepsilon_t)=\sigma^2$, which does not depend on $y_t$.

Engle wanted a model where the conditional variance would depend on $y_t$: there are good reasons to think that volatility is sticky. The model he came up with (equations (1)-(3) in his paper) is simply

$$y_{t+1} = \varepsilon_t \times \sqrt{\alpha + \beta y_t^2}. \tag{1}$$

In other words, the variance of $y_{t+1}$ given $y_t$ is $\sigma^2(\alpha + \beta y_t^2)$, which does depend on $y_t$. This is the simplest of the Auto-Regressive Conditionally Heteroscedastic (ARCH) models. Upon squaring everything in (1), the equation becomes

$$y_{t+1}^2 = \alpha \varepsilon^2_t + \beta \varepsilon_t^2 y_t^2. \tag{2}$$

This is a linear recursion in $y_t^2$. In the paper, Engle introduced a few variations and crafted statistical methods to estimate the parameters and their significance.

A central question in this way of modelling things is: how does $y_t$ behave in the long term? Does $y_t$ stay stable, can it take extremely large values (crises, shocks and crashes), and if so, how frequently?

If $y_t^2$ converges in distribution towards a random variable $Y$, then (2) shows that $Y$ and $\alpha \varepsilon^2 + \beta \varepsilon^2 Y$ must have the same probability distribution, where $\varepsilon$ is an $\mathscr{N}(0,\sigma^2)$ random variable, independent of $Y$. This is an instance of a very general kind of equation, called affine distributional equations: they are equations of the form

$$Y \stackrel{\mathrm{law}}{=} AY+B \tag{3}$$

where $A,B$ are random variables independent of $Y$. It turns out that these equations are generally impossible to solve explicitly. However, a theorem of Harry Kesten states, perhaps not so intuitively, that the law of any solution must have a heavy tail: in contrast with, say, Gaussian distributions, for which taking extremely large values ("shocks") has an exponentially small probability, heavy-tailed distributions can take large values with polynomially small probability, which is… not so rare at all!

Here is a small simulation over $10^6$ periods of time with parameters $\alpha=0.1$, $\beta=1$, $\sigma = 0.1$. The histogram indicates that $y_t$ seems to have heavy tails, that is, the probability of observing an unusually large value is polynomially small (and not exponentially small).
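
For reference, here is a minimal sketch of such a simulation in Python (NumPy and Matplotlib assumed); a straight line in the log-log survival plot is the signature of a polynomial tail.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Parameters from the text: alpha = 0.1, beta = 1, sigma = 0.1, 10^6 periods.
alpha, beta, sigma, T = 0.1, 1.0, 0.1, 10**6

eps = rng.normal(0.0, sigma, size=T)
y = np.zeros(T)
for t in range(T - 1):
    # ARCH(1) recursion: y_{t+1} = eps_t * sqrt(alpha + beta * y_t^2).
    y[t + 1] = eps[t] * np.sqrt(alpha + beta * y[t] ** 2)

# Empirical survival function of |y_t| on log-log axes.
x = np.sort(np.abs(y))
survival = np.arange(T, 0, -1) / T   # P(|y| >= x), never exactly zero
mask = x > 0                         # avoid log(0) for the initial value
plt.loglog(x[mask], survival[mask])
plt.xlabel("x")
plt.ylabel("P(|y| > x)")
plt.show()
```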

This is one of Kesten's most famous results, published in 1973 in Acta Mathematica.

Kesten's theorems

From now on, we'll study the solutions of the equation

$$X \stackrel{\mathrm{law}}{=} AX+B \tag{4}$$

where $X$ is a random variable over $\mathbb{R}^d$, $A$ is a random matrix and $B$ a random vector, both independent of $X$. The goal is to present Kesten's theorem, which states that under certain conditions on $A$ and $B$, the solution $X$ must have heavy tails; that is, $\mathbb{P}(|X|>x)$ decreases polynomially fast, and not exponentially fast as for Gaussian variables. This result is somewhat mind-blowing in its precision, since we have access to the exact tail exponent.

A full, rigorous proof would be packed with technicalities; I'll only show a quasi-proof in dimension $1$ which, although not complete, was sufficiently clear and idea-driven to convince me that Kesten's result holds. I'm also including a quasi-proof of the renewal theorem, which might be of independent interest.

One-dimensional case

For simplicity, we restrict to the case where $A,B$ are independent and have a density with respect to the Lebesgue measure. In general, it is not obvious that solutions to (4) exist, but an earlier result of Kesten states that this is the case if $\mathbb{E}[\ln |A|]<0$.

Kesten's theorem (1973).

(i) Assume that $A>0$ almost surely and that there is an $s>0$ such that
$$\mathbb{E}[|A|^s]=1, \qquad \mathbb{E}[|A|^s (\ln |A|)_+]<\infty, \qquad \mathbb{E}[|B|^s]<\infty.$$
Then there are two constants $c_\pm$ such that when $x\to \infty$,
$$\mathbb{P}(X>x)\sim c_+ x^{-s}, \qquad \mathbb{P}(X<-x) \sim c_- x^{-s}.$$
(ii) The same result holds if $A$ can take positive and negative values, and in this case $c_+ = c_-$.
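
The exponent can be probed numerically. Below is a hedged sketch with a concrete choice that is mine, not Kesten's: take $A = e^G$ with $G \sim \mathscr{N}(m, v)$, so that $\mathbb{E}[A^s] = e^{sm + s^2 v/2}$ equals $1$ exactly for $s = -2m/v$; with $m=-0.5$ and $v=1$ the theorem predicts $\mathbb{P}(X>x) \sim c\,x^{-1}$. A Hill estimator on a long trajectory should roughly recover this exponent (only roughly: the samples along the chain are correlated).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative choice: A = exp(N(m, v)) with m = -0.5, v = 1, so that
# E[A^s] = exp(s*m + s^2*v/2) = 1 for s = -2m/v = 1; B ~ Exp(1), E[B^s] < inf.
m, v, T = -0.5, 1.0, 10**6
A = np.exp(rng.normal(m, np.sqrt(v), size=T))
B = rng.exponential(1.0, size=T)

X = np.zeros(T)
for t in range(T - 1):
    # Kesten recursion: X_{t+1} = A_t X_t + B_t.
    X[t + 1] = A[t] * X[t] + B[t]

# Hill estimator of the tail index, computed on the k largest values.
k = 1000
order = np.sort(X)
s_hat = k / np.sum(np.log(order[-k:] / order[-k - 1]))
print(f"Hill estimate of s: {s_hat:.2f}  (predicted: 1.0)")
```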

Sketch of proof in the 1d case

I'll only sketch the proof ideas in the subcase of (i) where, in addition, $B$ is nonnegative. In this case, we can safely assume that $X$ is nonnegative by conditioning on the event $\{X>0\}$. We set $f(x) = e^{sx}\mathbb{P}(X > e^x)$; our final goal is to prove that $f(x)$ converges towards some constant when $x\to+\infty$, the case of the negative tail being identical.

The recursion (4) shows that $f(x) = e^{sx}\mathbb{P}(AX + B > e^x)$. However, if $x$ is very large, we can guess that $\mathbb{P}(AX + B > e^x)$ is close to $\mathbb{P}(AX > e^x)$. This is the origin of the first trick of the proof, which is to artificially write

$$f(x) = e^{sx}\bigl(\mathbb{P}(AX+B>e^x) - \mathbb{P}(AX>e^x)\bigr) + e^{sx}\,\mathbb{P}(AX>e^x).$$

Let us denote by $g$ the first term on the right:

$$g(x)=e^{sx}\bigl(\mathbb{P}(AX+B>e^x) - \mathbb{P}(AX>e^x)\bigr).$$

The second term can be expressed in terms of $f$ by using $\mathbb{E}[A^s]=1$: writing $AX > e^x$ as $X > e^{x-\ln A}$ (recall $A>0$), we get

$$e^{sx}\,\mathbb{P}(X>e^{x - \ln A}) = \mathbb{E}\bigl[A^s e^{s(x-\ln A)}\mathbf{1}_{X>e^{x - \ln A}}\bigr].$$

Now, here comes the second trick: the change of measure. We assumed that $\mathbb{E}[A^s]=1$, hence we can define a probability measure $\mathbb{P}_s$ by $\mathbb{E}_s[\varphi] = \mathbb{E}[A^s \varphi]$, and the expression above becomes $\mathbb{E}_s[f(x- \ln A)]$. Overall, we obtain that $f$ is a solution of the following equation:

$$\forall x \geqslant 0, \qquad f(x) = g(x) + \mathbb{E}_s[f(x - \ln A)].$$

If $\mu_s$ is the distribution of $\ln A$ under the measure $\mathbb{P}_s$, this equation becomes $f(x) = g(x) + f\star \mu_s(x)$, where $\star$ denotes the convolution operator. Such equations are called convolution equations and they can be studied using classical probability theory. The main result of renewal theory, presented below, shows that if

- $g$ is "directly Riemann integrable", and
- $\mathbb{E}_s[\ln A] = \mathbb{E}[A^s \ln A]$ is finite and positive,

then there is only one solution to this equation, and more crucially it satisfies

$$\lim_{x\to \infty}f(x) = \frac{\int_0^\infty g(u)\,du}{\mathbb{E}_s[\ln A]} = \frac{\int_0^\infty g(u)\,du}{\mathbb{E}[A^s \ln A]}=:c.$$

If $\mathbb{E}[A^s \ln A]>0$, as we assumed, then $c>0$, because $g$ is a nonnegative function which is not identically zero, so its integral is positive. Consequently,

$$\lim_{x\to \infty}e^{sx}\,\mathbb{P}(X>e^x) = c.$$

This is equivalent to $\mathbb{P}(X>x) \sim c\,x^{-s}$, as requested. There is, however, a serious catch: to apply the renewal theorem we need to check the two conditions listed above. The second one is nothing but one of our assumptions. The first one, however, requires $g$ to be "directly Riemann integrable", which is genuinely difficult to check. I won't do this part at all[1].
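
As a side note, the change of measure above is a classical exponential tilting, and it is easy to check numerically. Here is a small sketch, reusing the illustrative lognormal $A$ from the earlier simulation ($\ln A \sim \mathscr{N}(m,v)$ with $m=-0.5$, $v=1$, $s=1$): tilting by $A^s$ shifts the Gaussian mean from $m$ to $m+sv$.

```python
import numpy as np

rng = np.random.default_rng(2)

# ln A ~ N(m, v) with m = -0.5, v = 1; s = 1 solves E[A^s] = 1.
m, v, s, N = -0.5, 1.0, 1.0, 10**6
G = rng.normal(m, np.sqrt(v), size=N)   # samples of ln A
w = np.exp(s * G)                       # tilting weights A^s

# E_s[phi(ln A)] = E[A^s * phi(ln A)]; tilting a Gaussian shifts its mean.
print("E[A^s]    :", w.mean())          # ~ 1 (this is the normalisation)
print("E_s[ln A] :", (w * G).mean())    # ~ m + s*v = 0.5
```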

Multi-dimensional case

In the multi-dimensional case, a similar theorem holds. One can find in the literature a host of different settings regarding the random matrix $A$ and the random vector $B$; for simplicity I'll stick to the case where $A$ is positive definite almost surely, $B$ has all entries nonnegative almost surely, and $A,B$ have densities with respect to the Lebesgue measure. The multi-dimensional analogue of $\mathbb{E}[|A|^s]$ will now be

$$h(a) = \lim_{n\to\infty}\mathbb{E}\bigl[\Vert A_1 \times \dots \times A_n \Vert^a\bigr]^{\frac{1}{n}}$$

where $\Vert M \Vert$ is the operator norm and the $A_i$ are i.i.d. copies of $A$. We also introduce the Lyapunov exponent, which is the analogue of $\mathbb{E}[\ln |A|]$:

$$\gamma = \lim_{n\to \infty}\frac{1}{n}\mathbb{E}[\ln \Vert A_1 \times \dots \times A_n \Vert].$$

Kesten's theorem in any dimension. Assume that $\gamma<0$ and that there is an $s>0$ such that
$$h(s)=1, \qquad \mathbb{E}[\Vert A \Vert^s (\ln \Vert A \Vert)_+]<\infty, \qquad \mathbb{E}[|B|^s]<\infty.$$
Then solutions of (4) exist, and $X$ is heavy-tailed in any direction; in other words, for any nonzero $u\in \mathbb{R}^d$, there is a constant $c(u)>0$ such that

$$\mathbb{P}(\langle u, X\rangle > x) \sim c(u)\,x^{-s}.$$

The proof is much more technical, but without any really new ideas.
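
Neither $h$ nor $\gamma$ is explicit in general, but both can be estimated by Monte Carlo. Here is a minimal sketch for $\gamma$ (the Wishart-type matrices below are an arbitrary illustrative choice of mine, and the running product is renormalised at each step to avoid overflow):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative ensemble: A_i = c * G G^T / d with G a d x d standard Gaussian
# matrix; A_i is positive definite a.s., and c < 1 makes contraction likely.
d, n, reps, c = 2, 200, 100, 0.3

def log_norm_of_product(rng):
    """Accumulate ln ||A_1 ... A_n||, renormalising the running product."""
    M = np.eye(d)
    log_norm = 0.0
    for _ in range(n):
        G = rng.normal(size=(d, d))
        A = c * (G @ G.T) / d
        M = A @ M
        scale = np.linalg.norm(M, 2)   # operator norm (largest singular value)
        log_norm += np.log(scale)
        M /= scale
    return log_norm

samples = [log_norm_of_product(rng) for _ in range(reps)]
print("gamma estimate:", np.mean(samples) / n)
```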

Renewal theory in less than one page

Let $\mu$ be a continuous probability distribution on $\mathbb{R}_+$. Our goal is to show how renewal theory can be used to study convolution equations, represent their solutions with a probabilistic model, and draw consequences on their asymptotic behaviour. We are interested in finding $f$ such that

$$f(x) = g(x) + \int f(x-u)\, \mu(du) \tag{17}$$

where $g$ is some given function. Writing $f\star \mu(x) = \int f(x-u)\,\mu(du)$, we recognize a convolution between the function $f$ and the measure $\mu$, and the equation is simply $f - \mu \star f = g$. Writing $U$ for a random variable with distribution $\mu$, we can also represent this equation as

$$f(x) = g(x) + \mathbb{E}[f(x-U)].$$

But if this equation is satisfied for some $f$, then plugging it into $f(x-U)$ yields

$$\begin{aligned}f(x) &= g(x) + \mathbb{E}\bigl[g(x-U) + \mathbb{E}[f(x - U - U')]\bigr] \\ &= g(x) + \mathbb{E}[g(x-U)] + \mathbb{E}[g(x - U - U')] + \mathbb{E}[f(x-U-U'-U'')] \end{aligned}$$

and so on, with $U,U',U'',\dots$ i.i.d. copies of $U$. The following theorem follows by iterating this trick infinitely.

Representation theorem. Let $U_i$ be i.i.d. random variables with distribution $\mu$, and let $S_0=0$ and $S_n = U_1 + \dotsb + U_n$ be the associated random walk. Then, if $g$ is nonnegative, the unique solution of (17) is given by
$$f(x) = \mathbb{E}\left[ \sum_{n=0}^\infty g(x-S_n)\right]. \tag{20}$$
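
The representation can be tested numerically in a case where everything is explicit. Here is a sketch under assumptions of my choosing: if $U$ is exponential with rate $\lambda$, the renewal measure $\sum_n \mathbb{P}(S_n \in du)$ is $\delta_0(du) + \lambda\, du$, so for $g(u) = e^{-u}\mathbf{1}_{u \geq 0}$ the series (20) equals $f(x) = e^{-x} + \lambda(1-e^{-x})$.

```python
import numpy as np

rng = np.random.default_rng(4)

# U ~ Exp(rate lam), g(u) = exp(-u) for u >= 0; then
# f(x) = g(x) + lam * integral_0^x g(u) du = exp(-x) + lam * (1 - exp(-x)).
lam, x, reps = 2.0, 3.0, 10**5

def series_sample(rng):
    """One sample of sum_n g(x - S_n); terms vanish once S_n > x."""
    total, S = 0.0, 0.0
    while S <= x:
        total += np.exp(-(x - S))
        S += rng.exponential(1.0 / lam)
    return total

estimate = np.mean([series_sample(rng) for _ in range(reps)])
exact = np.exp(-x) + lam * (1.0 - np.exp(-x))
print(f"Monte Carlo: {estimate:.4f}   exact: {exact:.4f}")
```

Note that in this example $f(x) \to \lambda = \int_0^\infty g(u)\,du / \mathbb{E}[U]$ as $x \to \infty$, anticipating the renewal theorem stated below.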

The name "renewal process" comes from the fact that the SnS_n are considered as occurence times of events. The UiU_i are thus waiting times between the events. Renewal theory is interested in processes like Nt=N_t= the number of events before time tt.

Proof of the representation theorem. First, note that (20) is well-defined since $g$ is nonnegative, although it can possibly be $\infty$. Now, for uniqueness, we consider another solution $f'$ of (17); the difference $\delta = f-f'$ is a solution of $\delta = \delta \star \mu$. But then the Laplace transforms would satisfy $\hat\delta = \hat\delta \hat\mu = \hat\delta \hat\mu^2 = \dots = \hat\delta \hat\mu^n$ and so on. Since $\hat \mu(x) = \int_0^\infty e^{-xu}\mu(du)<1$ (as long as $x>0$ and $\mu$ is diffuse), letting $n\to\infty$ gives $\hat\delta (x) = 0$ for any $x>0$, hence $\delta = 0$.

It is quite rare that the representation theorem yields an explicit solution $f$: one would need to compute the expectation of the series, which in most cases is tedious. But this representation gives access to the whole toolbox of limit theorems in probability, and in particular laws of large numbers, which translate into exact asymptotics for $f$. That's the key renewal theorem.

Renewal theorem I (Blackwell's version). For any $a>0$,
$$\lim_{t\to \infty}\mathbb{E}[N_{t+a}] - \mathbb{E}[N_t] = \frac{a}{\mathbb{E}[U]}.$$

Renewal theorem II (ultimate version). If $g$ is "directly Riemann integrable", then the solution of (17) satisfies
$$\lim_{x\to\infty}f(x) = \frac{\int_0^\infty g(u)\,du}{\mathbb{E}[U]}$$
where $U$ has distribution $\mu$.
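
Blackwell's version is easy to check by simulation; here is a small sketch with $U$ uniform on $(0,1)$, an arbitrary non-exponential choice of mine, so that $\mathbb{E}[U] = 1/2$ and the limit should be $2a$.

```python
import numpy as np

rng = np.random.default_rng(5)

# Blackwell's theorem for U ~ Uniform(0, 1): the expected number of renewal
# times S_n in the window (t, t+a] should approach a / E[U] = 2a for large t.
t, a, reps = 50.0, 1.0, 10**4

def renewals_in_window(rng):
    """Count the renewal times S_n falling in the window (t, t + a]."""
    S, count = 0.0, 0
    while S <= t + a:
        S += rng.uniform()
        if t < S <= t + a:
            count += 1
    return count

estimate = np.mean([renewals_in_window(rng) for _ in range(reps)])
print(f"E[N_(t+a)] - E[N_t] ~ {estimate:.3f}   (limit: {a / 0.5:.1f})")
```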

References

[1] Indeed, there's a catch: it is not possible to directly prove that $g$ is directly Riemann integrable. Instead, what Goldie did is to mollify the problem by convolving $g$ with a mollifier $\rho_\delta$, prove the equivalent statement for this mollified version, then send $\delta$ to zero.