Can we really come up with a statistical model for projecting Olympic medals?

I’ve often joked that I want to be the Nate Silver of Olympic medal projections. But Nate knows a lot more about stats than I do — I never took a single class in the subject, and I just hack my way through spreadsheets on the basis of some self-teaching and the occasional journalists’ seminar. (Did I just ruin my chances of getting hired to consult at the new FiveThirtyEight?)

Since Sochi, I’ve embarked on a bit more self-teaching in spreadsheets and stats. At the same time, I firmly believe I’m hitting the limits of what stats can tell us about Olympic performance.

Check out this prototype I’ve made for the Rio 2016 medal projections:

Spreadsheet (PDF): https://duresport.files.wordpress.com/2014/03/2016-track-and-field-sheet1.pdf

A bit of explanation:

– Columns C through I are results from the Olympics, World Championships and the Diamond League. Obviously, we have a long way to go in this cycle, so many spaces are x’d out.

– Column J is each athlete’s personal best. Column K is each athlete’s best from the past season — for now, from 2013.

– I’ve assigned points to each of these columns, as you’ll see in the lower half of the spreadsheet. These can be adjusted without redoing a whole lot of work! If I decide to count the 2013-14 Diamond Leagues a little bit less, I simply change the point values in that chart. And I can duplicate this spreadsheet for use in sports that have different competitions.

So the point system is already bringing some subjectivity into the mix. I’ve decided to weight the 2012 Olympics, the 2015 World Championships and the 2016 Diamond League more heavily than other competitions. Then I’ve made a judgment call to assign points to times.

Then add another bit of subjectivity: Column N is an adjustment value. I can use this to account for any competitions missed through injury (Yohan Blake, Asafa Powell) or suspension (Tyson Gay).

Add it all up, and you have three columns that look scientific. Column O is “PI” or Predictive Index. (Yes, I said “Performance Index” on the spreadsheet – please ignore that.) Column P shows the percentage of possible points: an athlete’s PI divided by the “Max predictive index” in the middle of the spreadsheet. That maximum will rise throughout the cycle. When the 2014 Diamond League is complete, we’ll add 15 (the maximum value for Diamond League standings) for a total of 105.
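For anyone who prefers code to spreadsheet cells, here is a rough sketch of that arithmetic in Python. The athlete names and point values are made up for illustration, and the function names are my own. The current maximum of 90 is inferred from the numbers above (105 minus the 15 still to come); the real values live in the spreadsheet.

```python
# Current "Max predictive index". Assumption: inferred as 105 - 15 from the post.
MAX_PI = 90


def predictive_index(points, adjustment=0):
    """Sum the points from each results column, plus the manual adjustment."""
    return sum(points) + adjustment


def pct_of_max(pi, max_pi=MAX_PI):
    """Share of the possible points an athlete has earned so far (0.0 to 1.0)."""
    return pi / max_pi


# Hypothetical rows: (points per column, adjustment for injury or suspension)
athletes = {
    "Athlete A": ([30, 20, 15, 10], 0),
    "Athlete B": ([10, 20, 5, 10], 10),
}

for name, (points, adjustment) in athletes.items():
    pi = predictive_index(points, adjustment)
    print(f"{name}: PI={pi}, %max={pct_of_max(pi):.1%}")
```

Raising `MAX_PI` to 105 once the 2014 Diamond League is done would lower everyone's percentage unless they pick up points there, which is the intended behavior.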

Then Column Q is “odds”, a simple percentage chance of earning a medal in this event. I tinkered with a couple of possible formulas for this column. Perhaps I simply apply the %max column and adjust it so the numbers add up to reality. You wouldn’t want four people to have an 80% chance at winning a medal, for example. Or perhaps I calculate how many standard deviations an athlete’s PI is away from the other contenders.
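Here is what those two candidate formulas might look like, as a sketch only. Neither is what the spreadsheet uses. The first scales the %max values so they sum to three, since an event hands out three medals, and caps each athlete at 100% (which means the total can fall a bit short of three if someone hits the cap). The second just measures how far each PI sits from the field average in standard deviations; turning that into odds would need a further step I haven't picked. The function names and the sample numbers are hypothetical.

```python
from statistics import mean, pstdev

MEDALS_PER_EVENT = 3  # gold, silver, bronze


def scaled_odds(pct_max):
    """Rescale %max values so the medal chances sum to 3, capped at 1.0 each."""
    total = sum(pct_max.values())
    return {
        name: min(1.0, MEDALS_PER_EVENT * value / total)
        for name, value in pct_max.items()
    }


def z_scores(pi):
    """Standard deviations between each athlete's PI and the field average."""
    values = list(pi.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Everyone has the same PI, so nobody stands out.
        return {name: 0.0 for name in pi}
    return {name: (value - mu) / sigma for name, value in pi.items()}


# Four athletes who each "look like" 80% medal chances on %max alone
field = {"A": 0.8, "B": 0.8, "C": 0.8, "D": 0.8}
print(scaled_odds(field))

# Hypothetical PI values for a three-athlete field
print(z_scores({"A": 10, "B": 20, "C": 30}))
```

In the four-athlete case, the raw numbers add up to 3.2 medals, which can't happen; scaling pulls each athlete down to 75%.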

What formula is in there now? None. I eyeballed it.

That’s not a final decision. Perhaps I’ll figure out a statistically sound way to convert the PI into actual odds. But I’m not sure it’s really necessary.

We know Usain Bolt will win a medal unless he (A) isn’t healthy or (B) has a serious problem at the start, including a false start. The 100 meters, more so than most events, is all about raw speed. Work up to 1,500 meters, and tactics become an issue: in a slower race, finishing speed is more important than a personal best over the whole distance.

I haven’t taken age into account, though I would expect the 2015 and 2016 results to catch anyone on the decline. But for now, I’m skeptical that Justin Gatlin will be in 2012 form in 2016.

So to make the 2016 projections, I’ll compile a lot of numbers. That helps, of course. If Nickel Ashmeade doesn’t improve his personal best of 9.90, it’s ridiculous to declare him a medal favorite. Yet when all is said and done, I’m going to leave some space for a gut feeling.

This isn’t a 162-game baseball season, where weather conditions and other factors tend to even out over time. This isn’t a presidential election, where substantial polling points to clear trends, and Nate’s success has shut up the pundits who didn’t get the math. This is a projection of who is going to run the fastest in one 10-second race.

I do hope to add some probability this time around. Usain Bolt (if healthy) will be much more likely to win a medal for Jamaica than my gold medal pick in some random judo weight class in which 10 people have a legitimate shot to win, and I hope my medal count projection will reflect this.
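One simple way to do that, sketched below, is to stop counting each pick as a full medal and instead add up the medal chances. A country's expected medal count is the sum of the probabilities across all of its contenders. The country codes and probabilities here are invented for illustration, and `expected_medals` is my own name for the helper.

```python
def expected_medals(picks):
    """Sum medal probabilities per country.

    picks is a list of (country, probability_of_a_medal) pairs,
    one pair per athlete per event.
    """
    totals = {}
    for country, probability in picks:
        totals[country] = totals.get(country, 0.0) + probability
    return totals


# Hypothetical picks: one near-lock sprinter and two wide-open judo classes
picks = [
    ("JAM", 0.9),  # heavy favorite, if healthy
    ("JPN", 0.3),  # one of 10 legitimate contenders
    ("JPN", 0.3),
]

for country, total in expected_medals(picks).items():
    print(f"{country}: {total:.1f} expected medals")
```

Under this approach the sprinter counts for almost a full medal, while two coin-flip judo picks together count for less than one.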

So I’m not afraid of a little math. I’m just looking for a healthy balance between the calculated world and the real world.