πŸ‹πŸΌ Heavy tails IV: cascades of events

June 2026

Sixty stocks jump in the same minute: coincidence or not? In many fields where we study the occurrence of simultaneous events in time, it turns out that the number of simultaneous events is usually small, but can also be dramatically large, with a typically heavy-tailed distribution. There is a nice explanation for this phenomenon, coming from the theory of random avalanches, a fancy name for Galton-Watson processes. This post explains the connection.

Price jumps in financial markets

Look at the prices of stocks (say, in the S&P 500) and examine how many of them experience a jump at the same time (say, within the same hour or minute). This is called co-jump analysis. Here are two plots taken from an insightful paper:

On the top, you see the number of minutes, in every year, where there was a co-jump. For example, the green line counts the minutes where more than 60 stocks jumped at the same time. The most interesting plot is the second one, showing the distribution of the sizes of the co-jumps. The scale is logarithmic, and this distribution clearly looks heavy-tailed, perhaps with an index of $\alpha=0.5$, which would mean that the number $N(k)$ of co-jumps of $k$ stocks is roughly of order $O(k^{-1.5})$. As we’ll see, other papers rather estimate the tail to be $O(k^{-2})$.
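For concreteness, here is a minimal Python sketch of the counting step only. It assumes the jumps have already been detected upstream and come as hypothetical `(ticker, timestamp)` pairs; the jump-detection test itself, which is the delicate part in the literature, is not shown, and the toy input is made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def cojump_sizes(jumps):
    """Map each minute to the number of distinct tickers that jumped in it.

    `jumps` is a hypothetical list of (ticker, datetime) pairs coming from
    some upstream jump detector, which is not shown here.
    """
    by_minute = defaultdict(set)
    for ticker, ts in jumps:
        minute = ts.replace(second=0, microsecond=0)  # bucket timestamps by minute
        by_minute[minute].add(ticker)                 # a ticker counts once per minute
    return {minute: len(tickers) for minute, tickers in by_minute.items()}


def size_histogram(sizes):
    """N(k): number of minutes in which exactly k stocks jumped."""
    return Counter(sizes.values())


# Made-up toy input, for illustration only
jumps = [
    ("AAPL", datetime(2020, 3, 9, 14, 30, 5)),
    ("MSFT", datetime(2020, 3, 9, 14, 30, 41)),
    ("AAPL", datetime(2020, 3, 9, 14, 30, 59)),
    ("AMZN", datetime(2020, 3, 9, 15, 2, 12)),
]
print(size_histogram(cojump_sizes(jumps)))
```

Plotting the resulting $N(k)$ on log-log axes is what produces the second plot described above.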

Why should these types of events exhibit heavy-tailed behaviour? There is a very simple answer: there is a cascade happening behind the jumps. One jump occurs somewhere: say, the price of $\textsf{AAPL}$ goes up first. This jump then triggers a few jumps in related companies, like $\textsf{GOOGL}$ and $\textsf{MSFT}$. The jump of $\textsf{GOOGL}$ in turn triggers another jump of $\textsf{AMZN}$. Maybe the jump of Microsoft didn’t trigger anything, but the jump of $\textsf{AMZN}$ triggers a jump of $\textsf{NFLX}$. And so on: the initial event cascades through the network until the contagion stops.

This kind of cascading effect («random avalanches») naturally leads to a heavy-tailed distribution for the total number of events that happen in the end, at least in the regime where cascades are on the verge of exploding. This is what we are going to show.

Galton-Watson processes and the total population law

The total number of events in a cascade described as earlier is well captured by Galton-Watson processes. In a GW process, one starts with a random initial number of events, say $Z_1\in\mathbb{N}$. Then each of these events triggers a random number of extra events: if $Y_i^{(1)}$ is the number of events triggered by element $i$, then the total number of events that were triggered by the initial generation is

$$Z_2 = Y^{(1)}_1 + \dotsb + Y^{(1)}_{Z_1}.$$

This is the size of the second generation, and the process goes on: if at generation $n$ there are $Z_n$ events, then at the next generation one has

$$Z_{n+1} = Y^{(n)}_1 + \dotsb + Y^{(n)}_{Z_n}$$

where the $Y^{(n)}_i$ are all iid with the same distribution $p_k = \mathbb{P}(Y^{(n)}_i = k)$, called the progeny distribution. In the end, the total number of events which happened is

$$X = Z_1 + Z_2 + \dotsb$$

where it is perfectly possible that this is infinite! For example, if $p_0 = 0$, then every event triggers at least one other event, so that $X=\infty$ as soon as $Z_1\geqslant 1$.
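The recursion above is easy to simulate. The following Python sketch assumes, purely for illustration, a Poisson progeny distribution with mean `mu`; `poisson` and `cascade_size` are hypothetical helpers, not taken from any library. Since $X$ may be infinite, the simulation gives up and returns `None` once the running total exceeds a cap.

```python
import math
import random


def poisson(lam, rng):
    """Sample Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def cascade_size(mu, rng, z1=1, cap=10_000):
    """Total size X = Z_1 + Z_2 + ... of a GW cascade with Poisson(mu) progeny.

    Returns None when the total exceeds `cap`, used as a proxy for X = infinity.
    """
    total, z = 0, z1
    while z > 0:
        total += z
        if total > cap:
            return None
        # Z_{n+1} = Y_1 + ... + Y_{Z_n}, with Y_i iid Poisson(mu)
        z = sum(poisson(mu, rng) for _ in range(z))
    return total


rng = random.Random(0)
sizes = [cascade_size(0.5, rng) for _ in range(20_000)]
print(sum(sizes) / len(sizes))
```

Since $\mathbb{E}[Z_{n+1}] = \mu\,\mathbb{E}[Z_n]$, with one initial event the mean total size is $\sum_{n\geqslant 1}\mu^{n-1} = 1/(1-\mu)$, that is $2$ for $\mu = 0.5$, and the empirical average should land close to it. With `mu` above 1, a large fraction of runs return `None`.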

There’s an infinite literature on GW processes, but we’ll be interested in the distribution of $X$, namely $\mathbb{P}(X = k)$ for every $k\in \mathbb{N}\cup \{\infty\}$. We’re going to see that, near criticality, this distribution is indeed heavy-tailed.

Dwass’s representation of the total population law

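Assume the cascade starts from a single event ($Z_1 = 1$) and let $Y_1, Y_2, \dotsc$ be iid with the progeny distribution. Then, for every $k\geqslant 1$,

$$\mathbb{P}(X=k) = \frac{1}{k}\mathbb{P}(Y_1 + \dotsb + Y_k = k-1).$$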

Proof. We assume $Z_1=1$ for simplicity (one initial event); the general case is the same up to a convolution with the law of $Z_1$.

Take a cascade with total size $X=k$. List the $k$ events in the order they were discovered in the cascade: first the initial event, then the events it triggered, then the events triggered by those, and so on. If $y_i$ is the number of events triggered by the $i$-th event in this list, we get a sequence $(y_1,\dotsc,y_k)$ of nonnegative integers. Clearly $\sum_{i=1}^k y_i = k-1$: every event except the initial one was triggered by exactly one other event. Conversely, any sequence with this sum encodes a cascade if and only if it never "runs out of events" before time $k$: after examining the first $j$ events, $1 + y_1 + \dotsb + y_j$ events have been discovered, and this number must exceed $j$ as long as $j<k$. Writing $S_j = y_1 + \dotsb + y_j$, this means

$$S_j \geqslant j \qquad \text{for } j=1,\dotsc,k-1.$$

For instance, if $y_1=0$, then the initial event triggers nothing and the cascade stops at size $1$, so $k$ cannot be larger than $1$.
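Here is this encoding on the $\textsf{AAPL}$ example from the introduction, as a small Python sketch (the helper names are hypothetical). The cascade is stored as a dictionary mapping each event to the events it triggered; `bfs_code` lists the events in discovery order together with the number of events each one triggered, and `is_cascade_code` checks the two conditions $\sum_i y_i = k-1$ and $S_j\geqslant j$ for $j<k$.

```python
from collections import deque


def bfs_code(children, root):
    """Return (order, y): events in discovery order and how many events each triggered."""
    order, y = [], []
    queue = deque([root])
    while queue:
        event = queue.popleft()
        order.append(event)
        triggered = children.get(event, [])  # events with no entry triggered nothing
        y.append(len(triggered))
        queue.extend(triggered)              # newly discovered events wait their turn
    return order, y


def is_cascade_code(y):
    """True iff sum(y) == k - 1 and S_j >= j for every j < k."""
    k = len(y)
    if sum(y) != k - 1:
        return False
    s = 0
    for j in range(1, k):
        s += y[j - 1]
        if s < j:  # the cascade would have run out of events before step k
            return False
    return True


# The AAPL example from the introduction (MSFT triggers nothing)
children = {"AAPL": ["GOOGL", "MSFT"], "GOOGL": ["AMZN"], "AMZN": ["NFLX"]}
order, y = bfs_code(children, "AAPL")
print(order, y)  # ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'NFLX'] [2, 1, 0, 1, 0]
print(is_cascade_code(y), is_cascade_code([0, 1]))  # True False
```

The sequence `[0, 1]` has the right sum but fails $S_1 \geqslant 1$, exactly as in the example above.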

Thus $\mathbb{P}(X=k)$ is the probability that $(Y_1,\dotsc,Y_k)$ satisfies these conditions. Now comes the trick. For fixed $y_1,\dotsc,y_k$ with $\sum y_i = k-1$, consider all $k$ cyclic rotations of the sequence.

Cyclic lemma. If $a_1,\dotsc,a_n$ are integers with $\sum_{i=1}^n a_i = n-1$, then exactly one cyclic rotation $(a_i,a_{i+1},\dotsc,a_{i+n-1})$ (indices taken mod $n$) satisfies $a_i + \dotsb + a_{i+j-1} \geqslant j$ for every $j=1,\dotsc,n-1$.

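The lemma is purely combinatorial, and the proof is short. Write $S_j = a_1 + \dotsb + a_j$ and let $i^*\in\{1,\dotsc,n\}$ be the first index at which $S_j - j$ reaches its minimum. The rotation starting right after $i^*$ (the original sequence if $i^* = n$) stays on or above the diagonal, while any other rotation drops strictly below it when its partial sums reach position $i^*$ (cyclically). Since the total sum is $n-1$, exactly one rotation stays above.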
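A brute-force check of the cyclic lemma, as a Python sketch with hypothetical function names: it enumerates every sequence of nonnegative integers of length $n\leqslant 6$ summing to $n-1$ (the cascade case; the lemma itself allows arbitrary integers), finds all good rotations by trial, and compares them with the rotation predicted by the first minimum of $S_j - j$.

```python
from itertools import product


def stays_above(seq):
    """True iff seq[0] + ... + seq[j-1] >= j for every j = 1, ..., n-1."""
    s = 0
    for j, a in enumerate(seq[:-1], start=1):
        s += a
        if s < j:
            return False
    return True


def good_rotations(a):
    """All shifts r such that the rotation a[r:] + a[:r] stays above the diagonal."""
    return [r for r in range(len(a)) if stays_above(a[r:] + a[:r])]


def predicted_shift(a):
    """Shift of the rotation starting right after the first minimiser i* of S_j - j."""
    n, s = len(a), 0
    best_i, best_t = None, None
    for i in range(1, n + 1):
        s += a[i - 1]
        if best_t is None or s - i < best_t:  # strict '<' keeps the FIRST minimiser
            best_i, best_t = i, s - i
    return best_i % n  # i* = n corresponds to the identity rotation (shift 0)


def check_cyclic_lemma(n_max):
    """Check the lemma on every nonnegative sequence of length n <= n_max summing to n - 1."""
    checked = 0
    for n in range(1, n_max + 1):
        for t in product(range(n), repeat=n):
            if sum(t) != n - 1:
                continue
            a = list(t)
            assert good_rotations(a) == [predicted_shift(a)], a
            checked += 1
    return checked


print(check_cyclic_lemma(6))
```

It should print 351, the number of sequences checked ($\binom{2n-2}{n-1}$ for each $n$, summed over $n\leqslant 6$), without any assertion failing.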

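Back to the cascade. If $Y_1+\dotsb+Y_k \neq k-1$, no rotation of $(Y_1,\dotsc,Y_k)$ can satisfy the conditions above, so $X\neq k$. If $Y_1+\dotsb+Y_k = k-1$, the cyclic lemma says that exactly one rotation does satisfy them. Because the $Y_i$ are iid, all $k$ rotations of $(Y_1,\dotsc,Y_k)$ have the same law, so the $k$ events "the $r$-th rotation is the good one" have the same probability; since they partition the event $\{Y_1+\dotsb+Y_k = k-1\}$, each of them has probability $\frac1k\mathbb{P}(Y_1+\dotsb+Y_k = k-1)$. Taking the identity rotation,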

$$\mathbb{P}(X=k) = \frac{1}{k}\,\mathbb{P}(Y_1 + \dotsb + Y_k = k-1).$$
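Dwass’s formula turns the law of $X$ into a computation on sums of iid variables. Here is a minimal Python sketch with a hypothetical `dwass_pmf` helper: it takes the progeny law as a list `p`, with `p[j]` $=\mathbb{P}(Y=j)$, and computes the $k$-fold convolutions numerically. As a sanity check, for Poisson($\mu$) progeny (an illustrative assumption), $Y_1+\dotsb+Y_k$ is Poisson($k\mu$), so the formula has the closed form $e^{-k\mu}(k\mu)^{k-1}/k!$, known as the Borel distribution.

```python
import math


def dwass_pmf(p, kmax):
    """Return [P(X = k) for k = 1..kmax] using P(X = k) = P(Y_1 + ... + Y_k = k - 1) / k.

    p[j] = P(Y = j). Only p[0], ..., p[kmax - 1] matter, because the sums we
    need never exceed kmax - 1; the truncated convolutions are therefore exact.
    """
    p = (list(p) + [0.0] * kmax)[:kmax]
    law = [1.0] + [0.0] * (kmax - 1)  # law of S_0 = 0, on {0, ..., kmax - 1}
    out = []
    for k in range(1, kmax + 1):
        # law of S_k = (law of S_{k-1}) convolved with p
        law = [sum(law[i] * p[m - i] for i in range(m + 1)) for m in range(kmax)]
        out.append(law[k - 1] / k)
    return out


def borel_pmf(k, mu):
    """Closed form for Poisson(mu) progeny (mu > 0): exp(-k mu) (k mu)^(k-1) / k!."""
    return math.exp(-k * mu + (k - 1) * math.log(k * mu) - math.lgamma(k + 1))


mu, kmax = 0.7, 40
poisson_law = [math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1)) for j in range(kmax)]
numeric = dwass_pmf(poisson_law, kmax)
print(max(abs(numeric[k - 1] - borel_pmf(k, mu)) for k in range(1, kmax + 1)))
```

The printed discrepancy should be at the level of floating-point rounding. The generic version works for any progeny law, for instance binary branching `p = [0.5, 0.0, 0.5]`.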

The phase transition

The transition happening in GW trees is a classical topic in probability courses. I’m not going to prove it, only to state it, because we will see the transition appear in the sequel. Let $\mu = \mathbb{E}[Y]$ be the mean of the progeny distribution. It is intuitive that if $\mu<1$, then, since every event triggers on average fewer than one extra event, the cascade should stop at some point; and if $\mu>1$, then there is a serious chance that the cascade never stops and goes on forever. This is indeed a theorem.

  • Extinction. If $\mu<1$, then $\mathbb{P}(X<\infty)=1$ and $\mathbb{E}[X]<\infty$.

  • Critical case. If $\mu=1$ (and $Y$ is not identically equal to $1$), then $\mathbb{P}(X<\infty)=1$ but $\mathbb{E}[X] = \infty$.

  • Survival. If $\mu>1$, then $\mathbb{P}(X<\infty)<1$.
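The three regimes can be seen numerically through Dwass’s formula. The Python sketch below assumes Poisson($\mu$) progeny, for which the formula has the closed form $\mathbb{P}(X=k) = e^{-k\mu}(k\mu)^{k-1}/k!$; it computes the mass $\sum_{k\leqslant K}\mathbb{P}(X=k)$ and the partial mean $\sum_{k\leqslant K}k\,\mathbb{P}(X=k)$ for a subcritical, a critical and a supercritical value of $\mu$. The helper names are hypothetical.

```python
import math


def borel_pmf(k, mu):
    """P(X = k) for Poisson(mu) progeny and Z_1 = 1, via Dwass's formula (mu > 0)."""
    return math.exp(-k * mu + (k - 1) * math.log(k * mu) - math.lgamma(k + 1))


def partial_mass_and_mean(mu, kmax):
    """Return (sum_{k <= kmax} P(X = k), sum_{k <= kmax} k P(X = k))."""
    probs = [borel_pmf(k, mu) for k in range(1, kmax + 1)]
    return sum(probs), sum(k * q for k, q in enumerate(probs, start=1))


for mu in (0.5, 1.0, 1.5):
    for kmax in (400, 1600):
        mass, mean = partial_mass_and_mean(mu, kmax)
        print(f"mu={mu}  K={kmax}  P(X<=K)={mass:.4f}  partial mean={mean:.2f}")
```

One should see a mass of 1 and a mean of $1/(1-\mu) = 2$ for $\mu = 0.5$; for $\mu = 1$ the mass creeps towards 1 while the partial mean keeps growing with $K$; for $\mu = 1.5$ the mass stalls strictly below 1 (around 0.417), which is $\mathbb{P}(X<\infty)$.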

In the survival case, the Kesten-Stigum theorem additionally states that

  • Exponential growth. If $\mathbb{E}[Y (\ln Y)_+]<\infty$, then either the population dies, or it survives, and then there is a (random) constant $C>0$ such that $Z_n \sim C \mu^n$.

  • Slower growth. Otherwise, if $\mathbb{E}[Y (\ln Y)_+]=\infty$, then $Z_n = o(\mu^n)$ almost surely.

We will not use the Kesten-Stigum result; I only stated it here because I like it. But we’ll see that the critical transition at $\mu=1$ has its own importance when studying the total population size.

The CLT approximation

So now, we can estimate how heavy the tail of $X$ is. By Dwass’s formula, everything boils down to $S_n := Y_1 + \dotsb + Y_n$, which is just a random walk, a sum of iid random variables, and we know how to deal with these. If $\mu = \mathbb{E}[Y]$ and $\sigma^2 = \mathrm{Var}(Y)$, the CLT says that

$$S_n - n\mu \approx N(0,\sigma^2 n).$$

Let us approximate $\mathbb{P}(S_n = n-1)$. Clearly, we have

$$\begin{aligned}\mathbb{P}(S_n = n-1)&= \mathbb{P}\left(S_n - n\mu = n(1 - \mu) - 1\right)\\ &\approx \mathbb{P}\left(S_n - n\mu = \phi n\right) \end{aligned}$$

where we set $\phi = 1-\mu$ (the $-1$ is negligible at the scales we care about). How could we approximate this probability? Well, there are two cases. If $\mu>1$, then there is a nonzero probability of the cascade going on forever. In many real-world situations, this would be called apocalyptic; it falls into a totally different category of rare events. We are rather interested in the case $\mu\leqslant 1$. In this regime, we see that

$$\mathbb{P}(S_n = n-1) \approx \mathbb{P}(S_n - n\mu \approx n\phi),$$

but $S_n - n\mu \approx N(0,\sigma^2 n)$. A Gaussian with standard deviation $d$ falls with high probability in the interval $[-3d, 3d]$. So if $n\phi$ is of order $\sqrt{n}$, we can approximate $\mathbb{P}(S_n - n\mu \approx n\phi)$ with the CLT; this happens when $\mu$ is very close to being critical, i.e. when $1-\mu$ is of order $1/\sqrt{n}$. Otherwise, if $\mu<1$ is fixed and $n$ is large, then $\mathbb{P}(S_n - n\mu \approx n\phi)$ falls within the regime of rare events and large deviations.

The critical case and the CLT

If $\phi\approx 0$, then we are examining the probability that $N(0,\sigma^2 n)$ is close to $n\phi$, which lies within a few standard deviations of $0$, well in the bulk of the distribution. This is roughly the Gaussian density,

$$\mathbb{P}(S_n - n\mu \approx n\phi) \approx \frac{1}{\sqrt{2\pi\sigma^2 n}}e^{-\frac{n^2\phi^2}{2\sigma^2 n}} \asymp \frac{e^{-\frac{n\phi^2}{2\sigma^2}}}{\sqrt{n}}.$$

This approximation can be made rigorous using a variant of the CLT called the local limit theorem (provided the support of $Y$ is not contained in a set of the form $a + d\mathbb{Z}$ with $d\geqslant 2$).
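A quick numerical check of this local Gaussian approximation, as a Python sketch assuming Poisson($\mu$) progeny: then $S_n$ is Poisson($n\mu$), $\sigma^2 = \mu$, and $\mathbb{P}(S_n = n-1)$ is explicit. The ratio between the exact value and $e^{-n\phi^2/(2\sigma^2)}/\sqrt{2\pi\sigma^2 n}$ should be close to 1 when $n\phi$ is of order $\sqrt{n}$. The function names are hypothetical.

```python
import math


def exact_prob(n, mu):
    """P(S_n = n - 1) when Y ~ Poisson(mu), so that S_n ~ Poisson(n mu)."""
    return math.exp(-n * mu + (n - 1) * math.log(n * mu) - math.lgamma(n))


def gaussian_approx(n, mu):
    """exp(-n phi^2 / (2 sigma^2)) / sqrt(2 pi sigma^2 n), with phi = 1 - mu."""
    phi, sigma2 = 1 - mu, mu  # for Poisson progeny, Var(Y) = E[Y] = mu
    return math.exp(-n * phi ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2 * n)


for n, mu in [(1_000, 1.0), (10_000, 0.99), (10_000, 0.98)]:
    print(n, mu, exact_prob(n, mu) / gaussian_approx(n, mu))
```

Trying instead a fixed $\mu$ such as 0.8 with $n = 10\,000$ gives a ratio far from 1: that is the large-deviation regime discussed next.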

The non-critical case and large deviations

If $\phi > 0$ is fixed, then we are examining the probability that a variable with fluctuations of order $\sqrt{n}$ takes values of order $n$. This lies in the large-deviation regime, where we study the occurrence of very unlikely events. Cramér’s theorem says that

$$\mathbb{P}(S_n = n-1) \approx \mathbb{P}(S_n / n \approx 1) \approx \exp\left( - n I(1)\right)$$

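where $I(x) = \sup_{\lambda\in\mathbb{R}}\{\lambda x - \ln \mathbb{E}[e^{\lambda Y}]\}$ is the rate function. Provided $\mathbb{E}[e^{\lambda Y}]<\infty$ for some $\lambda>0$, we have $I(1)>0$ (recall that $\mu<1$), so by Dwass’s formula $\mathbb{P}(X=n)$ decays exponentially fast in $n$: in this regime the tail of $X$ is light.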
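For a given progeny law, $I$ can be computed numerically. The Python sketch below (with a hypothetical helper `rate_function`) maximises the concave function $\lambda \mapsto \lambda x - \ln\mathbb{E}[e^{\lambda Y}]$ by ternary search over a fixed window, assuming the maximiser lies inside it. With Poisson($\mu$) progeny, $\ln\mathbb{E}[e^{\lambda Y}] = \mu(e^\lambda - 1)$, so one can compare with the closed form $I(1) = \mu - 1 - \ln\mu$ and with $-\frac1n\ln\mathbb{P}(S_n = n-1)$.

```python
import math


def rate_function(x, log_mgf, lo=-20.0, hi=20.0, iters=200):
    """Cramér rate function I(x) = sup_lam [lam * x - log_mgf(lam)].

    The objective is concave in lam, so ternary search on [lo, hi] finds the
    supremum, assuming the maximiser lies inside that window.
    """
    def objective(lam):
        return lam * x - log_mgf(lam)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1  # the maximiser cannot lie left of m1
        else:
            hi = m2  # the maximiser cannot lie right of m2
    return objective((lo + hi) / 2)


mu = 0.5


def poisson_log_mgf(lam):
    """log E[exp(lam * Y)] for Y ~ Poisson(mu)."""
    return mu * (math.exp(lam) - 1)


I1 = rate_function(1.0, poisson_log_mgf)
print(I1, mu - 1 - math.log(mu))  # numeric vs closed form

# -(1/n) log P(S_n = n - 1), with S_n ~ Poisson(n mu), should approach I(1)
for n in (100, 1_000, 10_000):
    log_p = -n * mu + (n - 1) * math.log(n * mu) - math.lgamma(n)
    print(n, -log_p / n)
```

The gap between $-\frac1n\ln\mathbb{P}(S_n=n-1)$ and $I(1)$ shrinks like $(\ln n)/n$: it is the polynomial prefactor that Cramér’s theorem ignores.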

Conclusion

Going back to Dwass’s formula, we obtain an approximation for $\mathbb{P}(X=n)$ in the near-critical regime $\mu\approx 1$. Writing $c = 1/(2\sigma^2)$, it can be framed as follows.

$$\mathbb{P}(X = n) \asymp \frac{1}{n^{3/2}}e^{- c n (1-\mu)^2}.$$

This is not heavy-tailed in the classical sense, since it eventually decays exponentially in $n$; but its tails are already much fatter than Gaussian ones, and its profile (a power law times an exponential) is that of a Gamma-type density. When the mean progeny $\mu$ attains the critical value $1$, then $\phi=0$ and one obtains a typical heavy-tailed behaviour, $\mathbb{P}(X=n)\asymp n^{-1.5}$, with tail index $\alpha = 1/2$, as observed for the financial price co-jumps!
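To check the $\asymp$ above numerically, here is a Python sketch with Poisson($\mu$) progeny (an illustrative assumption), for which $\sigma^2 = \mu$ and Dwass’s formula gives $\mathbb{P}(X=n)$ exactly. If the approximation is right, the ratio between the exact probability and $n^{-3/2}e^{-cn(1-\mu)^2}$ should stay roughly constant in $n$. The function names are hypothetical.

```python
import math


def borel_pmf(n, mu):
    """Exact P(X = n) for Poisson(mu) progeny (Dwass's formula), mu > 0."""
    return math.exp(-n * mu + (n - 1) * math.log(n * mu) - math.lgamma(n + 1))


def near_critical_shape(n, mu):
    """n^{-3/2} exp(-c n (1 - mu)^2) with c = 1 / (2 sigma^2) and sigma^2 = mu."""
    c = 1 / (2 * mu)
    return n ** -1.5 * math.exp(-c * n * (1 - mu) ** 2)


for mu in (1.0, 0.99):
    ratios = [borel_pmf(n, mu) / near_critical_shape(n, mu) for n in (100, 1_000, 10_000)]
    print(mu, [round(r, 4) for r in ratios])
```

For Poisson progeny the constant comes out close to $1/(\mu\sqrt{2\pi})$, about $0.40$, for both values of $\mu$.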

As a conclusion, the total number of events in a cascade is heavy-tailed at criticality, and nearly so close to it: as long as $n$ is much smaller than $1/(c(1-\mu)^2)$, the exponential factor is close to $1$ and $\mathbb{P}(X=n)$ behaves like $n^{-3/2}$. This is exactly the point: when the mean progeny is well below $1$, nothing special happens, in the sense that the total number of events has exponentially decaying tails. But as the mean progeny approaches the critical point, the tails become fatter, until at $\mu = 1$ they become Pareto with tail index $\alpha = 0.5$.

You don’t need to be critical to have heavy tails

It can feel a little bit unsatisfying that the near-critical approximation of $\mathbb{P}(X=n)$ is not, strictly speaking, heavy-tailed for, say, $\mu = 0.99$. There is however a nice argument, which I found in Appendix E1 of this excellent paper by Rudy Morel, which justifies heavy-tailedness. It consists in supposing that the mean progeny parameter is itself random and can come arbitrarily close to the critical value $1$: say that $\phi = 1-\mu$ is uniformly distributed on an interval $[\tau, 1]$, with $\tau\geqslant 0$ small. In this case,

$$\mathbb{P}(X=n) \approx \frac{1}{1-\tau}\int_\tau^1 n^{-3/2}e^{-c n \phi^2}\,d\phi.$$

We can perform this integral easily: by the change of variables $u = \phi\sqrt{n}$, we see that

$$\mathbb{P}(X=n) \approx \frac{1}{1-\tau}\frac{1}{n^{3/2}}\frac{1}{\sqrt{n}}\int_{\tau\sqrt{n}}^{\sqrt{n}}e^{-c u^2}\,du.$$

If $\tau$ is small, say $\tau \sqrt{n}\approx 0$ (with $n$ large), then the integral is close to $\int_0^\infty e^{-cu^2}\,du = \sqrt{\pi\sigma^2 / 2}$, a constant, and overall we get

$$\mathbb{P}(X=n) \asymp \frac{1}{n^2}.$$

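This gives a tail index $\alpha=1$, in line with the $O(k^{-2})$ estimates mentioned at the beginning. By choosing other distributions for $\phi$, one can obtain other tail indices: if $\phi$ has a density behaving like $\phi^{\beta-1}$ near $0$ (with $\beta>0$), the same change of variables gives $\mathbb{P}(X=n)\asymp n^{-(3+\beta)/2}$, that is, a tail index $\alpha = (1+\beta)/2$, which can be any value above $1/2$.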
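The $n^{-2}$ law can also be checked without the Gaussian approximation. This Python sketch assumes Poisson($\mu$) progeny, so that $\mathbb{P}(X=n\mid\mu)$ is exact, and $\phi = 1-\mu$ uniform on $[0,1]$ (that is, $\tau = 0$); it averages over $\phi$ with a midpoint rule. Near criticality $\sigma^2\approx 1$, and the constants above then predict $n^2\,\mathbb{P}(X=n)\to \frac{1}{\sqrt{2\pi}}\sqrt{\pi/2} = \frac12$. The function names are hypothetical.

```python
import math


def borel_pmf(n, mu):
    """Exact P(X = n) for Poisson(mu) progeny, 0 < mu <= 1."""
    return math.exp(-n * mu + (n - 1) * math.log(n * mu) - math.lgamma(n + 1))


def mixed_pmf(n, tau=0.0, steps=20_000):
    """P(X = n) when phi = 1 - mu is uniform on [tau, 1], by the midpoint rule in phi."""
    h = (1 - tau) / steps
    total = 0.0
    for i in range(steps):
        phi = tau + (i + 0.5) * h  # midpoints never hit phi = 1, so mu stays > 0
        total += borel_pmf(n, 1 - phi)
    return total * h / (1 - tau)


for n in (100, 1_000):
    print(n, n ** 2 * mixed_pmf(n))
```

Both values should sit close to $1/2$, with the agreement improving as $n$ grows, which is the $n^{-2}$ behaviour derived above.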

References