Small variations in figures get politicians and commentators excited, but we may be wrong to read patterns into them

What do all these numbers mean? “‘Worrying’ jobless rise needs urgent action – Labour” was the BBC headline. It explained the problem in its own words: “The number of people out of work rose by 38,000 to 2.49 million in the three months to June, official figures show.”

Now there are dozens of different ways to quantify the jobs market – I’m not going to summarise them all here. The claimant count and the labour force survey are commonly used, and the number of hours worked is informative, too: you can fight among yourselves over which is best, and get distracted by party politics to your heart’s content. But in claiming that this figure shows the number of people out of work has risen, the BBC is just wrong.

Here’s why. The “labour market” figures come through the Office for National Statistics, and it has published the latest numbers in a PDF document. On page 13, top table, 4th row, you will find the figures the BBC is citing. Unemployment aged 16 and above is at 2,494,000, and has risen by 38,000 in a quarter (32,000 in a year). But you will also see some other figures, after the symbol “±”, in a column marked “sampling variability of change”.

Those figures are called “95% confidence intervals”, and are one of the most useful inventions of modern life.

We can’t do a full census of the whole population every time we want some data, because that’s too expensive and time-consuming. Instead, we take what we hope is a representative sample.

This can fail in two interesting ways. Firstly, a sample can be *systematically* unrepresentative: if you want to know about the health of the population as a whole, but you survey people in a GP’s waiting room, then you’re an idiot.

But a sample can also be unrepresentative by chance, via sampling error. This is not caused by idiocy. Imagine a large bubblegum vending machine containing thousands of blue and yellow bubblegum balls. You know that exactly 40% of those balls are yellow. When you take a sample of 100 balls, you might get 40 yellow ones, but in fact, as you intuitively know already, sometimes you get 32, sometimes 48, or 37, or 43, or whatever. This is sampling error.
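You can watch sampling error happen for yourself. This is a minimal sketch, not anything from the ONS: it simulates drawing 100 balls from a machine that we stipulate is exactly 40% yellow, and counts the yellows each time.

```python
import random

random.seed(1)          # fixed seed so the run is reproducible
TRUE_YELLOW = 0.40      # we stipulate the machine is exactly 40% yellow

def sample_yellow(n=100):
    """Draw n balls (the machine is huge, so treat draws as independent)
    and return how many came out yellow."""
    return sum(random.random() < TRUE_YELLOW for _ in range(n))

counts = [sample_yellow() for _ in range(10)]
print(counts)  # ten different counts, scattered around 40 but rarely exactly 40
```

Each run of 100 gives a slightly different answer, purely by chance – that scatter is sampling error, with nobody being an idiot anywhere.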

Now, normally, you’re at the other end of the telescope. You take your sample of 100 balls, but you don’t know the true proportion of yellow balls in the jar – you’re trying to estimate that – so you calculate a 95% confidence interval around whatever proportion of yellow you get in your sample of 100 balls, using a formula (in this case, 1.96 × √((0.4 × 0.6) ÷ 100)).
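That formula is short enough to check by hand, but here it is spelled out as a sketch, assuming you happened to observe 40 yellow balls in your sample of 100:

```python
import math

p_hat = 0.40   # observed proportion of yellow in the sample
n = 100        # sample size

# standard error of a sample proportion, then the 95% margin
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = 1.96 * se
low, high = p_hat - margin, p_hat + margin
print(f"{p_hat:.2f} ± {margin:.3f} -> ({low:.3f}, {high:.3f})")
# roughly 0.40 ± 0.096, i.e. about (0.304, 0.496)
```

So even a perfectly honest sample of 100 only pins the true proportion down to within about ten percentage points either way.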

What does this mean? Strictly (it still makes my head hurt), this means that if you repeatedly took samples of 100, then on 95% of those attempts, the true proportion in the jar would lie somewhere between the upper and lower limits of the 95% confidence intervals of your samples. That’s all we can say.

So, if we look at these employment figures, you can see that the changes reported are clearly not statistically significant: the estimated change over the past quarter is 38,000, but the 95% confidence interval is ± 87,000, running from -49,000 to 125,000. That wide range clearly includes zero, no change at all. The annual change is 32,000, but again, that’s ± 111,000.
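The arithmetic behind that conclusion fits in a few lines, using just the two published figures:

```python
change, margin = 38_000, 87_000   # ONS quarterly change and its ± sampling variability

low, high = change - margin, change + margin
print(low, high)                  # -49000 125000
print(low <= 0 <= high)           # True: "no change at all" sits inside the interval
```

Because zero lies comfortably inside the range, the headline figure is consistent with unemployment having risen, fallen, or not moved at all.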

I don’t know what’s happening to the economy; it’s probably not great. But these specific numbers tell us nothing, and there is an equally important problem arising from that, one which is, frankly, a more enduring obstacle to meaningful political engagement. We are barraged, every day, with a vast quantity of numerical data, presented with absolute certainty and fetishistic precision. In reality, many of these numbers amount to nothing more than statistical noise, the gentle static fuzz of random variation and sampling error, making figures drift up and down, following no pattern at all, like the changing roll of a dice. This, I confidently predict, will never change.