Heavy-tailed distributions: how they appear

November 2023

Heavy tails are ubiquitous in statistical modelling, yet only a handful of mechanisms give rise to them. Here is a non-exhaustive list.

## Simple transformations

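Suppose that $X$ is a random variable with a continuous density $f$ with $f(0) \neq 0$. Then $1/X$ is heavy-tailed: for large $t$, $\mathbb{P}(|1/X| > t) = \mathbb{P}(|X| < 1/t) \approx 2f(0)/t$, which decays only polynomially.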
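As a quick sanity check on one concrete case, the sketch below takes $X$ to be standard normal (an assumption made purely for illustration, with density $1/\sqrt{2\pi} > 0$ at the origin). For this choice the tail of $1/X$ is available exactly through the error function, so no simulation is needed; the helper name `inverse_tail` is hypothetical.

```python
import math

def inverse_tail(t: float) -> float:
    """Exact P(|1/X| > t) for X ~ N(0, 1), i.e. P(|X| < 1/t)."""
    return math.erf(1.0 / (t * math.sqrt(2.0)))

# Density of the standard normal at 0.
f0 = 1.0 / math.sqrt(2.0 * math.pi)

for t in (10.0, 100.0, 1000.0):
    # t * P(|1/X| > t) should approach 2 * f(0) as t grows,
    # which is the signature of a tail decaying like 1/t.
    print(t, t * inverse_tail(t), 2.0 * f0)
```

The product `t * inverse_tail(t)` stabilising near `2 * f0` is what a tail of order $1/t$ looks like numerically.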

## Random recursions

Kesten's theorem is an absolute gem in mathematics and probability. Roughly speaking, it says that if a sequence of random numbers is defined by the recursion $X_{t+1} = A_t X_t + B_t$, where $A_t, B_t$ are random variables independent of $X_t$, then the limit of $X_t$ has a heavy tail with index $s$ given by the equation $\mathbb{E}[|A|^s]=1$.
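To make the equation for the tail index concrete, here is a minimal sketch that solves $\mathbb{E}[|A|^s] = 1$ by bisection. The function name `kesten_index` and the log-normal choice for $A$ are assumptions made for illustration; the log-normal case is convenient because the moment has a closed form, so the numerical root can be compared with $-2\mu/\sigma^2$.

```python
import math

def kesten_index(moment, lo=1e-6, hi=10.0, tol=1e-10):
    """Solve moment(s) = 1 for s > 0 by bisection.

    `moment` is the map s -> E[|A|^s]. The root must be bracketed:
    moment(lo) < 1 < moment(hi), which is the typical picture when
    E[log|A|] < 0 and |A| exceeds 1 with positive probability.
    """
    if not (moment(lo) < 1.0 < moment(hi)):
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative choice: log A ~ N(mu, sigma^2), so
# E[A^s] = exp(s * mu + s^2 * sigma^2 / 2).
mu, sigma = -0.5, 1.0

def lognormal_moment(s: float) -> float:
    return math.exp(s * mu + 0.5 * (s * sigma) ** 2)

s = kesten_index(lognormal_moment)
print(s, -2.0 * mu / sigma ** 2)  # numerical root vs closed form
```

For a distribution of $A$ without a closed-form moment, `moment` could be replaced by a Monte Carlo estimate, at the cost of some noise in the root.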

## Maxima of random variables

The Fisher-Tippett-Gnedenko theorem says that if $X_i$ are iid random variables and if there are numbers $a_n, b_n$ such that $a_n^{-1}(\max(X_1, \dotsc, X_n) - b_n)$ converges in distribution to a non-degenerate limit, then the limit is a Gumbel, Fréchet, or (reversed) Weibull distribution. Of these three, the Fréchet limit is the heavy-tailed one; the reversed Weibull has a bounded upper tail.
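A small worked case, under assumptions chosen for illustration: take the $X_i$ to be Pareto with index $\alpha$ on $[1, \infty)$, with $a_n = n^{1/\alpha}$ and $b_n = 0$. The distribution of the normalized maximum is then known exactly, and the sketch below compares it with the Fréchet distribution function $\exp(-x^{-\alpha})$. The function names are hypothetical.

```python
import math

def pareto_max_cdf(x: float, n: int, alpha: float) -> float:
    """Exact P(max(X_1..X_n) / n**(1/alpha) <= x) for iid Pareto(alpha) on [1, inf)."""
    threshold = x * n ** (1.0 / alpha)
    if threshold < 1.0:
        return 0.0
    # Each X_i satisfies P(X_i <= y) = 1 - y**(-alpha) for y >= 1.
    return (1.0 - threshold ** (-alpha)) ** n

def frechet_cdf(x: float, alpha: float) -> float:
    """Frechet distribution function exp(-x**(-alpha)) for x > 0."""
    return math.exp(-(x ** (-alpha))) if x > 0 else 0.0

alpha = 2.0
for n in (10, 1000, 10 ** 6):
    # The gap between the two columns shrinks as n grows.
    print(n, pareto_max_cdf(1.5, n, alpha), frechet_cdf(1.5, alpha))
```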

Extremes of random walks are usually heavy-tailed, which is not so surprising given Kesten's theorem.

## Non-exponential random growth: log-normality, Gibrat's law

Log-normal distributions can arise in financial models; under the hypothesis that returns follow a drifted Brownian motion, Itô's formula says that the price at time $T$ follows a log-normal distribution. More generally, if a time-varying quantity $X_t$ has a growth rate which does not depend on $X_t$, then it is heavy-tailed: for example, if $X_{t+1} = (1+ r_t)X_t$ with the $r_t$ being iid and $X_0 = 1$, then $X_t = \prod_{s<t} (1 + r_s) = e^{\sum_{s<t} \ln(1+r_s)} \approx e^{\sum_{s<t} r_s}$ for small $r_s$. By the CLT, this is approximately $e^{t\xi}$ with $\xi \sim \mathscr{N}(\mathbb{E}r, \mathrm{Var}(r)/t)$, which is log-normal.
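The multiplicative recursion is easy to simulate. The sketch below is only an illustration under assumed parameters: the $r_s$ are taken uniform on $[-0.1, 0.1]$, and the function name, horizon, and number of paths are arbitrary choices. It compares the spread of $\ln X_t$ with the CLT prediction $\sqrt{t\,\mathrm{Var}(r)}$ and shows the right skew of $X_t$ itself.

```python
import math
import random
import statistics

def simulate_log_growth(t: int, n_paths: int, half_width: float, seed: int = 0):
    """Return log X_t for n_paths runs of X_{s+1} = (1 + r_s) X_s with X_0 = 1.

    The r_s are iid uniform on [-half_width, half_width] (illustrative choice).
    """
    rng = random.Random(seed)
    logs = []
    for _ in range(n_paths):
        log_x = 0.0
        for _ in range(t):
            # Accumulate ln(1 + r_s) rather than multiplying, for stability.
            log_x += math.log1p(rng.uniform(-half_width, half_width))
        logs.append(log_x)
    return logs

t, half_width = 200, 0.1
logs = simulate_log_growth(t, n_paths=2000, half_width=half_width)

# CLT prediction: log X_t is roughly normal with variance t * Var[ln(1 + r)].
# For small r, Var[ln(1 + r)] is close to Var(r) = half_width**2 / 3.
predicted_sd = math.sqrt(t * half_width ** 2 / 3)
print(statistics.mean(logs), statistics.stdev(logs), predicted_sd)

# Right skew of X_t itself: the mean sits above the median.
values = [math.exp(v) for v in logs]
print(statistics.mean(values), statistics.median(values))
```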


Zipf's law is one of the most famous heavy-tailed distributions "from the real world".

## Scale-free networks

Real-world graphs often have heavy-tailed degree sequences. A very beautiful explanation of this phenomenon lies in the famous scale-free property of preferential attachment mechanisms. In PA models, new elements (people, requests, molecules) arrive at each time step; when a new element arrives, it connects to (say) $m$ older elements, but it favors elements which already have many connections. Remco van der Hofstad's book has a whole chapter explaining why the degrees of elements in such a model are asymptotically heavy-tailed.
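The mechanism described above can be sketched in a few lines. This is one simple variant among several in the literature, not necessarily the one analysed in the book. It assumes the graph starts as a complete graph on $m + 1$ nodes and that each newcomer picks $m$ distinct targets with probability proportional to their current degree. The function name is hypothetical.

```python
import random
from collections import Counter

def preferential_attachment_degrees(n: int, m: int, seed: int = 0) -> list:
    """Degrees of an n-node graph grown by preferential attachment.

    Starts from a complete graph on m + 1 nodes; each new node then links to
    m distinct older nodes chosen with probability proportional to degree.
    """
    rng = random.Random(seed)
    degrees = [m] * (m + 1)
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        degrees.append(m)  # the newcomer starts with its m outgoing links
        for old in targets:
            degrees[old] += 1
            endpoints.extend((old, new))
    return degrees

degrees = preferential_attachment_degrees(n=5000, m=2)
counts = Counter(degrees)
for k in sorted(counts)[:10]:
    print(k, counts[k])  # how many nodes have degree k
print("max degree:", max(degrees))
```

Most nodes keep a degree close to $m$, while a few early nodes accumulate far more links, which is the heavy tail in the degree sequence.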

## Newman's paper

## The 80-20 rule