Separating the Signal from the Statistical Noise

One of the consequences of the advance of sporting analytics is that perceived sporting wisdom is often overturned or at least becomes less certain and dogmatic.

“A 2-0 lead is the most dangerous in football” has no basis. The large majority of teams holding such an advantage go on to win.

They very rarely lose, although these rare occasions are inevitably well remembered, leading to an overestimation of how likely they are to recur.

Similarly, “the table never lies” is not wholly accurate. The table often contains a side that is in an unaccustomed, elevated position, either through the quirks of an easy run of matches or a string of often narrow wins.

Newcastle’s fifth-place finish in 2011/12 was achieved with just such a run, and their goal difference of +5 would more typically have yielded nine fewer points than they actually won.

A season later they finished in 16th spot.

A typical Premier League season would need to last around four times its current 38-match length before the best team was more likely than not to finish at the top of the table.

Much of the disconnection between ability and reward over relatively short timeframes is down to the influence of randomness.

Newcastle may have been fortunate in the way in which they scored and conceded goals in 2011/12.

They won many matches by a single goal margin and lost a few much more heavily. This pattern was likely to become less extreme in the future, leading to poorer results, as narrow wins turned into draws or even losses.

Expected goals models look at the process of chance creation and quantify each chance with a likelihood of success.

This makes it possible to compare a side’s expected goals totals, both created and allowed, with their actual record to see whether the underlying statistical indicators, which are based on a larger sample of attempts, tally with the results achieved.

It is often possible to select narrative-driven reasons for a team’s apparent overachievement.

Phrases such as “heart” and “will to win” often contribute to an appealing storyline. More often than not, however, we are merely witnessing a short-term run of results fuelled by nothing more than randomness, and when the random element returns to less extreme levels, the results also falter.
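As a minimal sketch of that comparison, the snippet below sums per-chance scoring probabilities into an expected goals total and sets it against the goals actually scored. The chance values and the goal count are hypothetical numbers chosen for illustration, not real match data, and the function name `expected_goals` is an assumption rather than part of any published model.

```python
# Hypothetical scoring probabilities for one team's chances (illustrative only).
chances_created = [0.08, 0.31, 0.05, 0.12, 0.76, 0.03]
actual_goals = 3  # hypothetical number of goals actually scored


def expected_goals(chance_probabilities):
    """Sum the scoring probability of every chance to get an expected goals total."""
    return sum(chance_probabilities)


xg = expected_goals(chances_created)
# A positive difference means the team scored more than its chances suggested.
difference = actual_goals - xg
print(f"xG {xg:.2f}, actual {actual_goals}, difference {difference:+.2f}")
```

The same sum applied to chances allowed gives the defensive side of the comparison.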

Huddersfield were such a case in the early weeks of the 2016/17 Championship season.

They had a new, young and charismatic manager and a new style of play. They were also early pacesetters in the table, winning eight of their first 11 games. All of their wins were by a single-goal margin.

However, Timeform’s expected goals model ranked the side more modestly.

Ten of their first 15 points had been gained from matches where Huddersfield’s expected goals totals were inferior to those of their opponents. In simulations of all matches played prior to their meeting with Brighton in September, Huddersfield were most likely to occupy a position just below mid-table, rather than top.

Their 2-1 win at title favourites Newcastle illustrates how expected goals figures can shed a different light on individual matches.

Newcastle outshot Huddersfield and created better-quality chances; the respective expected goals totals were 1.92 for Newcastle and just 0.33 for their visitors.
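The source does not describe how these simulations are run. One common approach, assumed here, is to replay each match many times by treating every chance as an independent trial that succeeds with its expected goals probability, then averaging the points won. The chance lists, the function name `simulate_expected_points` and its parameters are all hypothetical.

```python
import random


def simulate_expected_points(team_chances, opponent_chances, n_sims=20000, seed=1):
    """Estimate average points per match by replaying it with random chance outcomes."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total_points = 0
    for _ in range(n_sims):
        # Each chance is converted with its own probability, independently of the others.
        goals_for = sum(rng.random() < p for p in team_chances)
        goals_against = sum(rng.random() < p for p in opponent_chances)
        if goals_for > goals_against:
            total_points += 3
        elif goals_for == goals_against:
            total_points += 1
    return total_points / n_sims


# Hypothetical chance probabilities for a side creating little against a stronger opponent.
weaker_side = [0.05, 0.10, 0.08, 0.10]
stronger_side = [0.30, 0.25, 0.40, 0.15, 0.35, 0.20, 0.27]

print(simulate_expected_points(weaker_side, stronger_side))
```

Summing these per-match averages over a run of fixtures gives a points total that can be set against the real table.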

A team faced with such an expected goals deficit would typically win just 5% of such contests, yet Huddersfield on the day managed a 2-1 away win.
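A rough check on that figure is possible if each side's goals are assumed to follow independent Poisson distributions with means equal to the expected goals totals. This is a simplifying assumption made for illustration, not necessarily how Timeform's model derives its number, and the function names are hypothetical. Under it, the chance of the side with 0.33 expected goals winning comes out a little over 5%.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals are Poisson with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)


def win_probability(xg_for, xg_against, max_goals=15):
    """Probability of outscoring the opponent, assuming independent Poisson goal counts."""
    probability = 0.0
    for goals_for in range(1, max_goals + 1):
        # Only scorelines where this side scores strictly more count as a win.
        for goals_against in range(goals_for):
            probability += poisson_pmf(goals_for, xg_for) * poisson_pmf(goals_against, xg_against)
    return probability


# Expected goals totals quoted for the Newcastle v Huddersfield match.
huddersfield_win = win_probability(0.33, 1.92)
print(f"Huddersfield win probability: {huddersfield_win:.3f}")
```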

Huddersfield’s relatively poor underlying expected goals statistics suggested that their position at the top of the table flattered them as it was partly due to unsustainable luck.

Having gained 16 out of a possible 18 points during their hot start, they then took just 12 of the next 27 points available, culminating in a 5-0 defeat by Fulham.

By taking notice of underlying statistics that draw from a larger sample base, teams that are producing results that are inconsistent with their likely true abilities can be identified.

A return to less extreme results is likely, although care should be taken not to expect an immediate run of bad luck to even up the recently experienced good luck.

That expectation is merely the gambler’s fallacy, which holds that short-term runs of good or bad fortune are immediately compensated by a reversal of fortune.

The statistical models around which apps such as Infogol are built aim to identify what is and what is not sustainable in a team’s past record.
