Analytics.
Perhaps the single most divisive word in the entire sporting lexicon. Hailed as revolutionary, embraced as a game-changer, or dismissed as a mere adjunct to the game, fancy maths have been adopted in some measure by just about every sport and every franchise; so much so that MIT’s Sloan School consistently cites sports as the vanguard of the data revolution (even above and beyond Wall Street).
The people in this camp believe that sports, as with most things in the universe, are capable of being operationalized, defined, empirically observed, and analyzed to arrive at a quantifiable series of variables that tells us far more than “the odds” or “gut feelings” or “analysis” or “conventional wisdom” ever will. The proof is in the pudding, and that pudding is made with math, not milk. You will find me and Brent among this faction’s most strident defenders.
While you’re only as good as your record, analytics can, with some decent degree of accuracy, tell you what’s likely to happen before the tip-off, kickoff, or first pitch, but not what will happen. And the more events you can define, measure, and control for, the more accurate that “prediction” becomes (though, as we’ll discuss infra, these are not true predictions).
Did you hate Tommy Rees’s playcalling? Perhaps because it defied what we mathematically know about successful football — and we warned you beforehand exactly how it would play out.
Did you love Alabama’s run to the Final Four last season? Perhaps because Nate Oats is at the bleeding edge of basketball analytics, and we told you exactly what a three-and-rim offense produces, as well as its downsides (turnovers, etc.).
Is the math sexy? No. I don’t have groupies throwing panties at my server farm; there are no hotel keys placed atop my laptop. But it’s very effective. So effective that, in the right hands, the nerds literally built Las Vegas with it. The same math built Tua Tagovailoa’s offense under Mike McDaniel and allowed Andy Reid to design a scheme for Patrick Mahomes. It even paid my way through law school.
Then you have the old-school approach: people who couldn’t tell a slide rule from a sled. Detractors of “the numbers” rightly point to decades of success that created the conventional wisdom: Paul Bryant didn’t care about math. Rick Pitino didn’t build a career on his teams jacking up 35 perimeter shots a game. Tony La Russa wasn’t wedded to a WAR sheet. Nick Saban was famously disinterested in analytics. Brian Snitker won a World Series on the back of his notorious disdain for analytics (and has blown far more chances at a title because of it, I’d add).
The critics also point to cases where analytics are simply foolish, where their use has ruined or diminished sports: the mid-range jumper is a dying art; training for performance based solely on metrics is destroying athletes’ mental and physical health; swinging for Ks, BBs, and HRs is boring as hell unless someone is actually making contact; the shift has killed the base hit; a decade of the neutral zone trap almost destroyed hockey; Kevin Cash famously trashed the Rays’ World Series run by “going by the data”; Dan Lanning blew three wins against DeBoer’s Washington teams by blindly following analytics on fourth down (allegedly). It doesn’t look like the sport, ad infinitum.
No matter which faction you subscribe to, however, most fans still don’t have a firm grasp of what “the numbers” can and can’t tell us in the context of wagering, though these lessons apply to sports more broadly. So, we’re going to attempt to demystify them for you.
We can’t prove causation
It is axiomatic that all true experiments require a control group: “Do the same thing to X that you do to Y, but change one variable.” You learned this in 7th-grade science class. But we simply cannot do that in sports. As with macroeconomics, there are simply far too many moving parts, too many imperfect comparisons, too many variables that prove impossible to measure, or that are elusive or inconsistent. In macroeconomics, as in sports, we guess that, all things being equal, this is how a particular outcome should play out. Ceteris paribus.
But things aren’t equal. You can’t make Jake Coker morph into Tua Tagovailoa. You can’t make Tyler Booker’s footwork as nimble as Lester Cotton’s. You can’t control whether Jaeden Roberts’ back is going to spasm mid-play, or whether he will stay healthy for 65 snaps like Barrett Jones did. Thus, you’re never “holding all things equal.”
That’s why we don’t make predictions at GAM: We instead provide you with a likely outcome, in narrative form, based upon what the data are suggesting will happen.
That’s all any analytics can tell you, at the end of the day.
GIGO:
Modeling is only as good as the inputs that we use. Computer science folks have a term for it: GIGO, garbage in, garbage out. If your data set or code is cruddy, incomplete, or inaccurate, or your statistical methods are garbage, then whatever you put out is going to be absolutely useless. (Seen a Bethesda game lately?)
Want to know what that looks like in the real world? SP+’s (and thus FPI’s) reliance on “success rate” to predict point spreads. You may as well read tea leaves.
Want to know another one? Time of possession.
Another one? Hell, it’s one this site lived off of for several years: “Pythagorean wins.”
All of those tools can tell us many things, but they are descriptive indices, at the end of the day, not prescriptive ones.
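Take “Pythagorean wins.” Bill James’s formula fits in a few lines; the sketch below uses an exponent of 2.37, one commonly cited value for football (the “right” exponent is sport-dependent and contested). Note what it does and does not do: it tells you whether a team’s record matched its scoring margin, not what that team will do next Saturday.

```python
def pythagorean_win_pct(points_for: float, points_against: float,
                        exponent: float = 2.37) -> float:
    """Bill James's Pythagorean expectation: expected winning percentage
    from points scored and allowed. The exponent varies by sport."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

# A team that outscored opponents 450-300 "should" have won ~72% of its
# games. If it actually went 8-4 (.667), it underperformed its margin.
# Descriptive, not prescriptive.
print(round(pythagorean_win_pct(450, 300), 3))
```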
You have probably heard old timers phrase these intuitive concepts far more memorably: “You can’t make chicken salad out of chicken shit,” and “a tool for every task.”
Figuring out what is the garbage and what is the right tool is the whole point of analytics. Without “the stats,” you’ve just got blind faith. And if you don’t use the right ones, or you don’t produce the right models, then you may as well be hammering in a nail with a crescent wrench: it may sorta work, but it’s not right and you’ve done a half-assed job.
We don’t know what we don’t know, but we can measure some of it
Here at RBR we tend to take a dim view of Pro Football Focus’s grading scheme. But credit must be given where credit is due: They’re trying to fill a void in performance statistics.
See, among the many “GIGO” variables is that we simply don’t know whether an analytically correct playcall is going to be subject to execution errors. How would you know, unless you have the play sheet in front of you and know the keys, the hot routes, etc.? But PFF has tried to remove that uncertainty by basing player grades (and ultimately team grades) on assignment accuracy. Can they always tell that RB1 was supposed to hit the B-gap, and that the guard was supposed to block down? No, of course not. But they combine a hybrid scouting/film-review approach and try to do just that.
Is it useful in terms of a point spread analysis? No, not whatsoever. Nor are other mostly meaningless analyses like “power ratings” and “grades” (looking at you, Phil Steele). And the why of it is simple: we don’t know what every assignment is, or whether a given outcome was the product of unseen execution or communication error. We don’t always know if a player has a lingering injury, or if Star Linebacker’s dog died that week. These things all affect the outcome, but they can’t be operationalized, measured, defined, or subjected to analysis.
Thus, like the Three-Body Problem, we’re always going to be awash in uncertain chaos with no workable equation that gives us the precise answer. So we can never make a full prediction — it is not prospective, in other words.
But it’s not the three-body problem. We don’t want perfection; we want a general idea: we are dealing with whole numbers here (-7, over 52, etc.). And to that end, we can still make generalizations based on the outcomes. So it is more like computer modeling of Theia hitting the Earth to create the Moon: there are many, many ways each molten tendril of planetary matter could have been ejected and then accreted. But we don’t care, because we know the final result: there is a moon in the sky. And like that moon, we can work backwards with sports analytics and get a good general picture of what happened (even if we can never analyze data at that granular a level). Uncertainty is defined retrospectively, in other words. We can’t measure it, except by its absence.
And we call what is not known, that absence, the “unexplained variance.” You might call it “the human factor.”
Vegas has the best statisticians in the world; they’ve been able to operationalize a terrifying number of variables, predict about 84% of all the “stats” that generally go into an expected outcome, and then produce a line, close to that number, that will entice equal money on either side of the wager. But that’s still 16% that they don’t know, and can never know.
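To see why “equal money on either side” matters, here is a minimal sketch of the book’s hold at conventional -110/-110 pricing (the standard juice, not any particular book’s numbers):

```python
def book_hold(risked_per_side: float = 110.0, win_amount: float = 100.0) -> float:
    """The book's guaranteed margin when handle is split evenly across a
    standard -110/-110 spread bet (risk 110 to win 100)."""
    total_collected = 2 * risked_per_side           # both sides post their risk
    total_paid_out = risked_per_side + win_amount   # winner: stake back plus winnings
    return (total_collected - total_paid_out) / total_collected

print(f"{book_hold():.3%}")  # ~4.545% of the handle, whichever side covers
```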
After 24 years of this, I’ve been able to refine mine to a composite of three models that use between 137 and 151 variables, and “the Machine” can explain about 77.6% of all variance in a college football game (college basketball is even easier: You just need seventeen main stats to predict a point spread with almost 70% accuracy). But there’s still 22% that I’ll just never be able to find the time or creativity or prowess to overcome.
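If “explain about 77.6% of all variance” sounds abstract, here is a minimal sketch of what the number means mechanically. The data are entirely synthetic (random “stats” plus irreducible noise); they are not my variables and not my models:

```python
import numpy as np

rng = np.random.default_rng(0)

# A fake season: 1,000 games, 17 made-up team stats, plus noise standing
# in for the "human factor" that no model sees.
n_games, n_stats = 1000, 17
X = rng.normal(size=(n_games, n_stats))
true_weights = rng.normal(size=n_stats)
margin = X @ true_weights + rng.normal(scale=5.0, size=n_games)

# Ordinary least squares, then R^2 = 1 - SS_res / SS_tot: the share of
# the scoring margin's variance that the stats account for.
X1 = np.column_stack([np.ones(n_games), X])   # add an intercept column
beta, *_ = np.linalg.lstsq(X1, margin, rcond=None)
residuals = margin - X1 @ beta
r_squared = 1 - (residuals ** 2).sum() / ((margin - margin.mean()) ** 2).sum()
print(f"R^2 = {r_squared:.3f}")  # everything left over is "unexplained variance"
```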
That’s why we have a margin of error, on one hand; and why we can say — with some mathematical certainty — that “momentum” exists, on the other. Because we can define it, measure it, and post hoc show you when it occurred, why, and how.
And we can do that for almost any nebulous truism.
Finally, we don’t make predictions; we describe relationships.
I hope by this point that I’ve explained what we do here, why we do it, and the limits of any analytics. While those are important points, to be sure, they are in many respects mere prologue to the single most important thing you need to take away from this: any prognostication of point spreads or outcomes merely describes a complex series of relationships based upon past events. It can never be a definitive statement about the future, though it can be a probabilistic one.
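What does a probabilistic statement look like in practice? A minimal sketch, with wholly hypothetical numbers (a projected margin of 9.5 and a 13-point standard deviation of model error; nothing from our actual models):

```python
from math import erf, sqrt

def cover_probability(projected_margin: float, spread: float,
                      sd: float = 13.0) -> float:
    """P(actual margin beats the spread), assuming normally distributed
    model error. Both inputs here are hypothetical."""
    z = (projected_margin - spread) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# A model that likes Team A by 9.5 makes "A -7" roughly a 57/43 proposition:
# a lean grounded in past relationships, never a promise.
print(f"{cover_probability(9.5, 7.0):.1%}")
```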
“The odds” say going for it in plus territory is always the smart decision. Why? Because we have five-plus decades of post-Merger data showing what happens when teams do versus when they don’t. Therefore, we describe correlates of past conditions, and whether it happened four decades ago or four plays ago, all of those conditions, those “data points,” matter. It’s also why the “gambler’s fallacy” is precisely that: no one is ever “due.”
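A minimal simulation makes the point; these are plain coin flips, nothing sport-specific:

```python
import random

random.seed(42)

# One million independent 50/50 trials; then look only at the trials that
# immediately follow three straight losses.
flips = [random.random() < 0.5 for _ in range(1_000_000)]
after_three_losses = [flips[i] for i in range(3, len(flips))
                      if not any(flips[i - 3:i])]

print(f"P(win overall):            {sum(flips) / len(flips):.4f}")
print(f"P(win after three losses): {sum(after_three_losses) / len(after_three_losses):.4f}")
# Both hover around 0.5000: the coin has no memory, and nobody is ever "due."
```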
But this is a sports website, so we aim to entertain you here, not merely elucidate. So, when RBR spits numbers at you, it’s not to win you over with arguments from authority; it’s to help better tell a story.
I hope we succeed more often than not.
Well, that’s it.
That’s what analytics do, in the context of wagering specifically. The lessons are portable to other realms of sports (and life generally), though, and we hope you will carry them there. Becoming better-informed consumers of data drives clearer beliefs, better analysis, even better wagering, rather than so much of what we do in life: shooting an arrow and then painting a bullseye around it.
And those data can be, and are, beautiful in themselves. They describe a material reality…we just try to put that reality to paper.
Without getting too mystical or esoteric, a man far smarter than I once said that “mathematics is the language with which God hath written the Universe.” And, like Galileo, I figure that if God is going to bother speaking, we may as well listen, huh?
—
The author is a recovering trial lawyer, data scientist, and black metal musician. And he’s in remission on all of those terrible traits.
Want some more of these? I crank the data for (almost) every single game, every single week over at my companion site: (Almost) Giving Away Money. Check it out, and prosper.
Not even adjusted for inflation! Still just five bucks a month.