Starting work on Bayesian football power ratings

The more I learn about Bayesian statistics, the more I want to use the approach in my own sports research. I have found some other football rating systems that use Bayesian methodology, so most of what I'm doing here is not novel. Still, I think it's important to document new work as thoroughly as possible, because you can always learn something from seeing how someone else tackles the same problem, and I usually learn something by forcing myself to write about it. That, more or less, is the pedagogical motivation for this article. In this post, I'll introduce the framework and starting point for the model and show some initial predictive results from last season. I promise to keep this as high-level as possible.

Probability is about calculating how likely a given set of data is, given some known parameters. The classic example is flipping a coin, which we "know" is equally likely to come up heads or tails. Statistics is essentially the inverse problem (historically called "inverse probability"): estimating how likely it is that a parameter, or set of parameters, lies within a certain range of values, given a known set of data. If you didn't know whether a coin was "fair", how many flips would it take you to figure it out? The example I always give to people who know nothing about Bayesian statistics is to imagine flipping a coin 10 times and getting heads 8 times. Most people (hopefully everyone reading this) know intuitively that a coin that doesn't come up heads exactly 5 times out of 10 isn't necessarily unfair. The purpose of Bayesian statistics, then, is to combine the evidence in the current data (8 heads in 10 flips) with our "prior knowledge" of the phenomenon under study. Figuring out how to do that is where the math comes in, and until the early 1990s it was very difficult in all but a few well-studied "toy" problems, because there wasn't enough computing power to tackle the really complex models.
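The coin example can be made concrete in a few lines of Python. This is a minimal sketch using the conjugate Beta-binomial model (a textbook shortcut, not anything JAGS-specific): the forward question asks how likely 8 or more heads are for a fair coin, and the inverse question updates a Beta prior on the heads probability p. The two priors, a flat Beta(1, 1) and a "pretty sure it's fair" Beta(50, 50), are illustrative assumptions, as are the helper names.

```python
from fractions import Fraction
from math import comb

# Forward ("probability") question: how likely are 8 (or 8+) heads in 10 fair flips?
p_exactly_8 = Fraction(comb(10, 8), 2**10)
p_at_least_8 = sum(Fraction(comb(10, k), 2**10) for k in range(8, 11))

# Inverse ("statistics") question: what do 8 heads in 10 flips say about p?
# With a Beta(a, b) prior, the posterior is Beta(a + heads, b + tails).
def beta_posterior(a, b, heads, tails):
    return a + heads, b + tails

def beta_mean(a, b):
    return Fraction(a, a + b)

def beta_cdf(x, a, b):
    """P(X <= x) for X ~ Beta(a, b) with integer a, b (binomial-sum identity)."""
    n = a + b - 1
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(a, n + 1))

flat = beta_posterior(1, 1, heads=8, tails=2)         # "no prior knowledge"
skeptical = beta_posterior(50, 50, heads=8, tails=2)  # strong belief the coin is fair

print(float(p_exactly_8), float(p_at_least_8))
print(float(beta_mean(*flat)), float(1 - beta_cdf(Fraction(1, 2), *flat)))
print(float(beta_mean(*skeptical)))
```

A fair coin gives 8 or more heads about 5.5% of the time. With the flat prior the posterior mean of p is 0.75 and about 97% of the posterior mass lies above 0.5; with the skeptical prior the same 10 flips only nudge the estimate to about 0.53. That is the "combine the evidence with prior knowledge" step in miniature.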

Nowadays, any statistical hack like me can download the necessary software for free and run, on a PC or even a laptop (I'm using a MacBook Air), extremely sophisticated models that 20 years ago probably would have required a supercomputer (or two). Unquestionably, the computational tool that kick-started the Bayesian revolution in statistics over the past two decades (and one that anyone can download for free) is BUGS (Bayesian inference Using Gibbs Sampling). I'm using JAGS (Just Another Gibbs Sampler), which is essentially like BUGS but runs easily within R on a Mac via an R interface package.
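Gibbs sampling itself is simple to sketch. The following is a hand-rolled, minimal Python version for a toy problem, not what JAGS does internally (JAGS chooses its own samplers for each node). Game margins are treated as draws from one normal distribution with unknown mean mu and precision tau, with a vague normal prior on mu and the same kind of dexp(0.1) exponential prior on tau that the NFL model uses. Each step draws one parameter from its conditional distribution given the current value of the other. The margins and the function name `gibbs_normal` are hypothetical.

```python
import random
import statistics

def gibbs_normal(y, n_iter=5000, burn_in=500, tau0=1e-4, rate=0.1, seed=1):
    """Gibbs sampler for y[i] ~ Normal(mean mu, precision tau).

    Assumed priors: mu ~ Normal(0, precision tau0) and tau ~ Exponential(rate),
    i.e. Gamma(shape 1, rate). Both full conditionals are then standard
    distributions, so each step is an exact draw.
    """
    rng = random.Random(seed)
    n, total = len(y), sum(y)
    mu, tau = 0.0, 1.0                      # arbitrary starting values
    mu_draws, tau_draws = [], []
    for it in range(n_iter):
        # mu | tau, y: normal with precision tau0 + n*tau
        prec = tau0 + n * tau
        mu = rng.gauss(tau * total / prec, (1 / prec) ** 0.5)
        # tau | mu, y: Gamma(1 + n/2, rate + SS/2); gammavariate takes scale = 1/rate
        ss = sum((yi - mu) ** 2 for yi in y)
        tau = rng.gammavariate(1 + n / 2, 1 / (rate + ss / 2))
        if it >= burn_in:                   # discard burn-in draws
            mu_draws.append(mu)
            tau_draws.append(tau)
    return mu_draws, tau_draws

margins = [3, 7, -3, 10, 14, 6, -1, 4]      # hypothetical home margins of victory
mu_draws, tau_draws = gibbs_normal(margins)
sd_draws = [t ** -0.5 for t in tau_draws]   # precision -> standard deviation
print(round(statistics.mean(mu_draws), 2), round(statistics.mean(sd_draws), 2))
```

After the burn-in draws are discarded, the remaining draws of mu and tau approximate their joint posterior; summarizing them (means, intervals, histograms) is the same thing you do with JAGS output.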

The way these programs work is that you supply the data along with a data model and your prior knowledge, both typically expressed as probability distributions. Given all this information, the program runs many thousands of simulation steps (so-called Markov chain Monte Carlo, or MCMC, methods), and the resulting samples approximate a probability distribution for the model parameters you are interested in. This is called the "posterior" distribution. You can also ask the program to monitor derived quantities, such as predictions for games that haven't been played yet. Let me show you how it works with the NFL model I've been setting up.

model {
  # Likelihood: each game's margin of victory is normal around mov.hat
  for (i in 1:N) {
    mov[i] ~ dnorm(mov.hat[i], tau1)
    mov.hat[i] <- b0 + inprod(b[], x[i,])
  }

This part describes the model for the margin of victory (mov) in each game. It says that each observed margin is drawn from a normal distribution (not a standard normal) whose mean, mov.hat, is the difference between the two teams' power ratings plus home-field advantage (b0), and whose precision, tau1, will be estimated during the simulation. (JAGS parameterizes dnorm by precision, the inverse of the variance.) Think of the spread around mov.hat as the error that comes with any prediction.
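To make the inner product concrete, here is a minimal Python sketch of how the data might be laid out. It assumes (the post doesn't say) that each row x[i,] has +1 in the home team's column and -1 in the visitor's, so inprod(b[], x[i,]) equals the home rating minus the away rating. The helper names, teams, scores, and ratings are all hypothetical.

```python
# Hypothetical encoding: +1 for the home team, -1 for the visitor,
# so inprod(b[], x[i,]) reduces to b[home] - b[away].
def design_row(home, away, teams):
    row = [0] * len(teams)
    row[teams.index(home)] = 1
    row[teams.index(away)] = -1
    return row

def expected_margin(row, ratings, hfa=3.0):
    """mov.hat = b0 + inprod(b, x): home-field advantage plus the rating difference."""
    return hfa + sum(b * xj for b, xj in zip(ratings, row))

teams = ["A", "B", "C", "D"]                                          # toy league
games = [("A", "B", 24, 17), ("C", "D", 10, 13), ("B", "C", 27, 20)]  # (home, away, home pts, away pts)
ratings = [4.0, -1.0, 2.0, -5.0]                                      # made-up power ratings

x = [design_row(home, away, teams) for home, away, _, _ in games]
mov = [hp - ap for _, _, hp, ap in games]                # observed home margins
mov_hat = [expected_margin(row, ratings) for row in x]
residuals = [m - mh for m, mh in zip(mov, mov_hat)]      # the "error" that tau1 describes
print(mov, mov_hat, residuals)
```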

  # Prior: the 32 team power ratings are centered on zero
  for (i in 1:32) {
    b[i] ~ dnorm(0, tau2)
  }

Here, we set the prior distribution for the array of 32 coefficients that represent each team's power rating. It's again a normal distribution, but we specify the mean to be 0 (i.e. we "know" this) and a second precision parameter, tau2, which is also estimated by the simulation. Because tau2 is estimated rather than fixed, the data decide how spread out the team ratings are.

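  b0 <- 3           # home-field advantage, fixed at 3 points
  tau1 ~ dexp(0.1)  # precision of game margins around mov.hat
  tau2 ~ dexp(0.1)  # precision of team ratings around 0
}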

Here we set the home-field advantage to a "known" constant (as opposed to estimating it from the data), and we put exponential priors on the two precision parameters. Note that so far the only things we have explicitly declared as "known" are that the team ratings are centered on zero and that HFA = 3 points, neither of which is a controversial assumption. Everything else is left very general, and we let the simulation tell us what the parameter values turn out to be.
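Because dnorm in JAGS takes a precision (1/variance), it helps to translate the dexp(0.1) priors into points. A minimal sketch that relies only on standard properties of the exponential distribution: the prior mean of the precision is 1/0.1 = 10, its median is ln(2)/0.1 (about 6.93), and the probability that the implied standard deviation exceeds s points is P(tau < 1/s^2) = 1 - exp(-0.1/s^2). The helper names and the 13-point example value are illustrative.

```python
import math

RATE = 0.1  # dexp(0.1): exponential prior with rate 0.1, placed on a precision

def sd_from_precision(tau):
    """JAGS's dnorm takes a precision: tau = 1 / sd**2, so sd = 1 / sqrt(tau)."""
    return 1 / math.sqrt(tau)

def prior_prob_sd_above(s, rate=RATE):
    """P(sd > s) = P(tau < 1/s**2) = 1 - exp(-rate / s**2) under tau ~ Exponential(rate)."""
    return -math.expm1(-rate / s**2)

prior_mean_tau = 1 / RATE                          # mean of an exponential is 1/rate
median_sd = sd_from_precision(math.log(2) / RATE)  # sd is monotone in tau, so medians map
print(prior_mean_tau, round(median_sd, 3), prior_prob_sd_above(13.0))
```

On the points scale, this prior puts its median standard deviation under half a point and well under 0.1% probability on a standard deviation above 13 points. With a full regular season of games the data should dominate the estimate of tau1, but it is worth checking what a prior implies on the original scale.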

The data fed into the model consisted of the game results from the regular season. The ratings produced by the simulation were as follows:

Based on the regular season, that looks about right. A histogram of the team ratings looks roughly normal, as would be expected.

Distribution of team ratings.
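Rating tables and histograms like these are built from the raw MCMC draws. A minimal sketch of that summary step, assuming the draws for each team have already been pulled out of JAGS into plain Python lists; the helper names, the 95% equal-tailed interval, and the example draws are illustrative choices.

```python
import statistics

def summarize(draws):
    """Posterior mean and 95% equal-tailed credible interval from MCMC draws."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")  # 2.5%, 5%, ..., 97.5%
    return statistics.mean(draws), cuts[0], cuts[-1]

def rating_table(draws_by_team):
    """One row per team, (team, mean, lower, upper), sorted best to worst."""
    rows = [(team, *summarize(d)) for team, d in draws_by_team.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical draws for two teams
example = {"A": [-1.0, 0.5, 0.0, 1.0, -0.5], "B": [4.0, 5.0, 6.0, 5.5, 4.5]}
for team, mean, lower, upper in rating_table(example):
    print(f"{team}: {mean:+.2f} ({lower:+.2f} to {upper:+.2f})")
```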

Within the simulation itself, one can make predictions about future events simply by monitoring certain variables of interest. In this case, I set up the simulation to monitor "virtual" matchups between the teams that actually met in the post-season. In other words, I used the model fit to regular-season data to predict post-season results out of sample. For each game, we can look at the resulting probability distribution. Here's the prediction for SF vs. NYG as an example:

Bayesian prediction for Niners-Giants NFC Championship game.
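How close to "no chance" a plot like this implies depends on which quantity is monitored. A minimal Python sketch, under assumptions that are mine rather than the post's: if the monitored node is mov.hat (home-field advantage plus the rating difference), its spread reflects only uncertainty in the ratings, while a posterior predictive draw also adds game-to-game noise with standard deviation 1/sqrt(tau1). The rating draws are hypothetical, chosen so that mov.hat is centered near the 8.44 points quoted for SF (at home) vs. NYG, and the 13-point game standard deviation is an assumed value. The helper names are illustrative.

```python
import math
import random
import statistics

def expected_margins(b_home, b_away, hfa=3.0):
    """Draws of mov.hat for one matchup: HFA plus rating difference, no game noise."""
    return [hfa + bh - ba for bh, ba in zip(b_home, b_away)]

def predictive_margins(b_home, b_away, tau1, hfa=3.0, seed=0):
    """Posterior predictive draws: mov.hat plus game noise with sd 1/sqrt(tau1)."""
    rng = random.Random(seed)
    means = expected_margins(b_home, b_away, hfa)
    return [m + rng.gauss(0.0, 1 / math.sqrt(t)) for m, t in zip(means, tau1)]

def prob_home_loses(margins):
    """Share of simulated home margins below zero."""
    return sum(m < 0 for m in margins) / len(margins)

# Hypothetical posterior draws, centered so that mov.hat is about 8.44 points
rng = random.Random(42)
n = 20_000
b_home = [rng.gauss(5.44, 1.4) for _ in range(n)]  # home team rating draws
b_away = [rng.gauss(0.0, 1.4) for _ in range(n)]   # visiting team rating draws
tau1 = [1 / 13.0 ** 2] * n                          # assumed game-level sd of 13 points

exp_m = expected_margins(b_home, b_away)
pred_m = predictive_margins(b_home, b_away, tau1)
print(round(statistics.mean(exp_m), 2), prob_home_loses(exp_m), round(prob_home_loses(pred_m), 3))
```

With these made-up inputs, essentially none of the mov.hat draws fall below zero, while roughly a quarter of the predictive draws do. In JAGS, the predictive version corresponds to monitoring a node such as pred ~ dnorm(b0 + b[h] - b[a], tau1), where h and a are hypothetical index constants for the two teams.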

You can see that, according to the simulation, the Giants had almost no chance of winning the game: the probability lies almost entirely to the right of zero and is centered around 8.44 points. With hindsight, either the simulation was wrong (or at least incomplete), or the Giants got extremely lucky. A little luck was involved, to be sure, but I think it's safe to say the model is far from perfect at this point. Let's take a look at the predictions for all the post-season games:

So how did the model do overall? To be perfectly honest, not all that great. It beat the spread 6 out of 11 times, but lost badly to the Vegas spreads in terms of error (the least-squares columns). In non-Giants games, it beat the Vegas spreads 6 out of 8 times, but there always seems to be a "hot" team in January, and it doesn't help bettors to realize that only after the Super Bowl has been played. Still, this is just scratching the surface of what Bayesian models can do. As others have pointed out, we can try to build in the effects of turnovers and account for trends within and across seasons. Much remains to be done. Let's go!
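The two scoring rules above (beating the spread and squared error) can pull in opposite directions, which a few lines of Python make clear. A minimal sketch, assuming lines are quoted from the home team's perspective (-7.5 means the home team is favored by 7.5), so a line converts to a predicted home margin by flipping its sign. The BAL-HOU lines are the ones quoted in the comments (Vegas -7.5, model -4.3); the 7-point Baltimore margin is used for illustration, and any Baltimore win by less than 7.5 gives the same result against the spread. The helper names are hypothetical.

```python
def ats_pick(model_line, vegas_line):
    """Side to back against the spread; lines are home-team perspective (-7.5 = home by 7.5)."""
    if model_line == vegas_line:
        return None                     # no disagreement with Vegas, no pick
    return "home" if model_line < vegas_line else "away"

def ats_result(pick, home_margin, vegas_line):
    """'win', 'loss', 'push' or 'no pick', given the actual home margin of victory."""
    if pick is None:
        return "no pick"
    cover = home_margin + vegas_line    # > 0: home covered; < 0: visitor covered
    if cover == 0:
        return "push"
    return "win" if (cover > 0) == (pick == "home") else "loss"

def squared_error(lines, home_margins):
    """Sum of squared errors after converting each line to a predicted home margin (-line)."""
    return sum((margin + line) ** 2 for line, margin in zip(lines, home_margins))

pick = ats_pick(model_line=-4.3, vegas_line=-7.5)   # "away", i.e. Houston
print(pick, ats_result(pick, home_margin=7, vegas_line=-7.5))
print(squared_error([-7.5], [7]), squared_error([-4.3], [7]))
```

For that single game, the model's pick wins against the spread even though its squared error (2.7 squared, about 7.3) is far larger than the Vegas line's (0.5 squared, 0.25).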


Getting close to Vegas accuracy with something this simple is a great result. 


However, Bayes didn't beat Vegas on BAL-HOU.

moderator

creedofhubris The Vegas line on BAL-HOU was -7.5. If you picked Houston to beat the spread, you would have won. Bayes predicted the spread to be -4.3, i.e. "Houston beats the spread." That is the sense in which "beat" should be understood here.


 thecity2 Yeah, I realized that a few minutes ago. Makes perfect sense now!


I'm by no means an expert in NFL analysis, but another simple, significant number you could work with is the number of injured players on each team.

moderator

 creedofhubris Good idea!