Double-yolk eggs, clustering and the financial crisis

I happened to be listening when Radio 4’s Today programme had a little debate about the probability of getting a pack of six double-yolk eggs.  Tim Harford, who they called in to help them sort it out, relates the story here.

So there are two thinking styles here. One is to solve the probability problem as posed. The other is to apply some common sense to figure out whether the probability problem makes any sense. We need both. Common sense can be misleading, but so can precise-sounding misspecifications of real world problems.

There are lessons here for the credit crunch. When the quants calculate that Goldman Sachs had seen 25 standard deviation events, several days in a row, we must conclude not that Goldman Sachs was unlucky, but that the models weren’t accurate depictions of reality.

One listener later solved the two-yolk problem. Apparently workers in egg-packing plants sort out twin-yolk eggs for themselves. If there are too many, they pack the leftovers into cartons. In other words, twin-yolk eggs cluster together. No wonder so many Today listeners have experienced bountiful cartons.

Mortgage backed securities experienced clustered losses in much the same unexpected way. If only more bankers had pondered the fable of the eggs.

The link Tim gives in the middle of my quote is to this piece, also by Tim, at the FT.  Here’s the bit that Tim is referring to (emphasis at the end is mine):

What really screws up a forecast is a “structural break”, which means that some underlying parameter has changed in a way that wasn’t anticipated in the forecaster’s model.

These breaks happen with alarming frequency, but the real problem is that conventional forecasting approaches do not recognise them even after they have happened. [Snip some examples]

In all these cases, the forecasts were wrong because they had an inbuilt view of the “equilibrium” … In each case, the equilibrium changed to something new, and in each case, the forecasters wrongly predicted a return to business as usual, again and again. The lesson is that a forecasting technique that cannot deal with structural breaks is a forecasting technique that can misfire almost indefinitely.

Hendry’s ultimate goal is to forecast structural breaks. That is almost impossible: it requires a parallel model (or models) of external forces – anything from a technological breakthrough to a legislative change to a war.

Some of these structural breaks will never be predictable, although Hendry believes forecasters can and should do more to try to anticipate them.

But even if structural breaks cannot be predicted, that is no excuse for nihilism. Hendry’s methodology has already produced something worth having: the ability to spot structural breaks as they are happening. Even if Hendry cannot predict when the world will change, his computer-automated techniques can quickly spot the change after the fact.

That might sound pointless.

In fact, given that traditional economic forecasts miss structural breaks all the time, it is both difficult to achieve and useful.

Talking to Hendry, I was reminded of one of the most famous laments to be heard when the credit crisis broke in the summer. “We were seeing things that were 25-standard deviation moves, several days in a row,” said Goldman Sachs’ chief financial officer. One day should have been enough to realise that the world had changed.

That’s pretty hard-core.  Imagine if, under your maintained hypothesis, what just happened was a 25-standard deviation event.  That’s a “holy fuck” moment.  David Viniar, the GS CFO, then suggests that they occurred for several days in a row.  A variety of people (for example, Brad DeLong, Felix Salmon and Chris Dillow) have pointed out that a 25-standard deviation event is so staggeringly unlikely that the universe isn’t old enough for us to seriously believe that one has ever occurred.  It is therefore absurd to propose that even a single such event occurred.  The idea that several of them happened in the space of a few days is beyond imagining.
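To put a number on it: if daily returns really were normally distributed, as the models effectively assumed, the chance of even one 25-standard deviation move is on the order of 10^-138.  Here is a back-of-the-envelope check in Python (a toy calculation, nothing more):

    from math import erfc, sqrt

    # One-sided tail probability of a 25-standard-deviation move under a
    # normal distribution: P(Z > 25) = erfc(25 / sqrt(2)) / 2
    p = erfc(25 / sqrt(2)) / 2
    print(p)                   # roughly 3e-138

    # At one observation per day, the expected wait for a single such move
    # is about 1/p days -- around 1e135 years, against a universe roughly
    # 1.4e10 years old.
    print((1 / p) / 365.25)    # expected waiting time in years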

Which is why Tim Harford pointed out that even after the first day where, according to their models, it appeared as though a 25-standard deviation event had just occurred, it should have been obvious to anyone with the slightest understanding of probability and statistics that they were staring at a structural break.
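For what it’s worth, here is a toy sketch of what “spotting the break as it happens” can look like.  It uses a one-sided CUSUM chart, a textbook change-detection tool rather than Hendry’s own methodology, on simulated returns with an invented break at day 500, and it reports how long a model fitted to the calm regime takes to notice that the world has changed:

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy daily returns: a calm regime for 500 days, then a structural
    # break to a volatile, negatively drifting regime (numbers invented).
    returns = np.concatenate([
        rng.normal(0.0, 0.01, 500),
        rng.normal(-0.01, 0.04, 100),
    ])

    # Fit the "old world" on an early window, as a naive model would.
    mu, sigma = returns[:250].mean(), returns[:250].std()

    # One-sided CUSUM on standardized losses; k and h are rule-of-thumb
    # choices for the reference value and decision threshold.
    k, h = 0.5, 10.0
    cusum = 0.0
    for day, r in enumerate(returns):
        z = (mu - r) / sigma   # standardized loss under the old model
        cusum = max(0.0, cusum + z - k)
        if cusum > h:
            print(f"structural break flagged on day {day}")
            break
    else:
        print("no break flagged")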

In particular, as we now know, asset returns have thicker tails than previously thought and, possibly more importantly, the correlation of asset returns varies with the magnitude of those returns.  For exceptionally bad outcomes, asset returns are significantly more correlated than they are in normal times.
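A small simulation makes the point about tail correlation.  Below, two hypothetical “assets” are given the same ordinary correlation, once with jointly normal returns and once with fat-tailed Student-t returns; conditioning on the first asset’s worst days, the fat-tailed pair co-crashes much more often.  All the parameters are made up purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n, rho = 500_000, 0.5
    cov = [[1.0, rho], [rho, 1.0]]

    # Two toy "assets" with the same correlation matrix: jointly normal
    # returns versus fat-tailed multivariate Student-t (3 d.o.f.) returns.
    # Purely illustrative -- not calibrated to any real market.
    normal_rets = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    t_rets = normal_rets / np.sqrt(rng.chisquare(3, size=n) / 3)[:, None]

    def crash_comovement(x, q=0.01):
        """P(asset 2 has a worst-1% day, given asset 1 has a worst-1% day)."""
        c1, c2 = np.quantile(x[:, 0], q), np.quantile(x[:, 1], q)
        return np.mean(x[x[:, 0] < c1, 1] < c2)

    print("normal returns:    ", crash_comovement(normal_rets))
    print("fat-tailed returns:", crash_comovement(t_rets))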

The likelihood-ratio threshold is the shadow price of statistical power

Cosma Shalizi, an associate professor of statistics at Carnegie Mellon University, interprets the threshold in a likelihood-ratio (LR) test as the shadow price of statistical power:

[…]

Suppose we know that the probability density of the noise is p and that of the signal is q. The Neyman-Pearson lemma, as many though not all schoolchildren know, says that then, among all tests of a given size s, the one with the smallest miss probability, or highest power, has the form “say ‘signal’ if q(x)/p(x) > t(s), otherwise say ‘noise’,” and that the threshold t varies inversely with s. The quantity q(x)/p(x) is the likelihood ratio; the Neyman-Pearson lemma says that to maximize power, we should say “signal” if it’s sufficiently more likely than noise.

The likelihood ratio indicates how different the two distributions — the two hypotheses — are at x, the data-point we observed. It makes sense that the outcome of the hypothesis test should depend on this sort of discrepancy between the hypotheses. But why the ratio, rather than, say, the difference q(x) – p(x), or a signed squared difference, etc.? Can we make this intuitive?

Start with the fact that we have an optimization problem under a constraint. Call the region where we proclaim “signal” R. We want to maximize its probability when we are seeing a signal, Q(R), while constraining the false-alarm probability, P(R) = s. Lagrange tells us that the way to do this is to maximize Q(R) – t[P(R) – s] over R and t jointly. So far the usual story; the next turn is usually “as you remember from the calculus of variations…”

Rather than actually doing math, let’s think like economists. Picking the set R gives us a certain benefit, in the form of the power Q(R), and a cost, tP(R). (The ts term is the same for all R.) Economists, of course, tell us to equate marginal costs and benefits. What is the marginal benefit of expanding R to include a small neighborhood around the point x? Just, by the definition of “probability density”, q(x). The marginal cost is likewise tp(x). We should include x in R if q(x) > tp(x), or q(x)/p(x) > t. The boundary of R is where marginal benefit equals marginal cost, and that is why we need the likelihood ratio and not the likelihood difference, or anything else. (Except for a monotone transformation of the ratio, e.g. the log ratio.) The likelihood ratio threshold t is, in fact, the shadow price of statistical power.

It seems sensible to me.
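As a sanity check on the shadow-price reading, here is a made-up numerical example (mine, not Shalizi’s: Gaussian noise, Gaussian signal, a 5% size).  Relaxing the permitted false-alarm rate from s to s + ds should buy roughly t times ds of extra power, and it does:

    from scipy.stats import norm

    # Made-up example: noise ~ N(0, 1), signal ~ N(2, 1), size s = 5%.
    p, q = norm(0, 1), norm(2, 1)
    s = 0.05

    # Here the likelihood ratio q(x)/p(x) = exp(2x - 2) is increasing in x,
    # so "q(x)/p(x) > t" is the same as "x > c" with P(X > c | noise) = s.
    c = p.ppf(1 - s)           # rejection boundary
    t = q.pdf(c) / p.pdf(c)    # the likelihood-ratio threshold
    power = q.sf(c)            # detection probability under the signal
    print(t, power)

    # Shadow-price check: a numerical derivative of power with respect to
    # the size s comes out very close to t.
    ds = 1e-4
    print((q.sf(p.ppf(1 - (s + ds))) - power) / ds)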

Going global

Warning: own-trumpet blowing below.

I had a look at my blog’s statistics this morning and discovered that I’m globally popular! Here are the origins of my last 100 page-loads:

[Image: going_global.jpg, a breakdown of the origins of my last 100 page-loads (click on the image for a better view)]

Yes, I’m sure that any blog worth its salt gets visitors from all over the place, but it’s still pretty cool.

A request for help: wordpress stats

In case any of my viewers knows anything about wordpress … I just posted this support request over at wordpress.org:

I am using v1.1.1 of the WordPress.com stats plugin. Since I installed it (on the 30th of January), the statistics I see on wordpress.com have been odd, to say the least.

I realise that what I see on wordpress.com stats does not include my own page views, so those stats ought to be lower than the total views.

Here are the stats thus far for February via wordpress.com: http://john.barrdear.com/stuff/wordpress_stats.jpg

Here are the stats for the same period from my host: http://john.barrdear.com/stuff/site_stats.jpg

For example, wordpress.com thinks that my “Beaten to the punch” post has seen 13 hits, but my host reports 3 views (1 entry, 1 exit).

What appears (to me) to be happening is that wordpress.com is recording hits to several posts against just one post.

As another example, my site gets aggregated here: http://ozpolitics.info/feeds. When someone clicks on the link on that page, they come through to the post-specific page on my site. WordPress.com stats are recording ozpolitics.info/feeds as the referrer, but are not registering the post-specific page as a hit.

Here is a specific example: http://john.barrdear.com/stuff/stat_inconsistency.jpg

Notice that yesterday I got two referrals from ozpolitics.info/feeds. I only had two articles listed on the ozpolitics feed yesterday: “Idle Curiosity” and “Sweating the small stuff”, neither of which is listed as getting a hit in yesterday’s posts.

If anyone out there has any clue what might be happening, please let me know, either here or on the wordpress.org site.  Thanks.