Bayesian True Power Ratings for the NFL

In a recent post, I laid out the framework for developing a Bayesian power ratings model for the NFL using the BUGS/JAGS simulation software. That model was really simple, amounting to little more than a standard linear regression (or ridge regression). At the end of the article I suggested that one area of improvement would be to take turnovers into account. This is my first attempt to do that (at least, the first one that I'm writing about).

The basic idea with turnovers is that to a very large extent (but not entirely!) they occur randomly, and to the extent that a team's turnover differential is due to chance, its "apparent" power rating (based purely on total *point* differential) is deceiving (see on the subject and his technique for dealing with it).

What you'd like to have is a "true power rating" that reflects the team's actual strength including its "real" (i.e. persistent over time) turnover differential, but discounting the effect of any random turnover component on the team's apparent power rating. So with that as motivation, I adjusted the model by adding an additional term that attempts to account for turnover differential. I'll lay out the model below, comment on each section, and then discuss the ratings.

model {
  for (i in 1:num_games) {
    # margin of victory; the second argument of dnorm is a precision
    mov[i] ~ dnorm(mov.hat[i], tau1)
    mov.hat[i] <- HFA + inprod(true_power[], matchup[i,]) - tov_points*tod[i]
    # turnover differential for the same game
    tod[i] ~ dnorm(tod.hat[i], tau3)
    tod.hat[i] <- c0 + inprod(adj_tov_rate[], matchup[i,])
  }

Here I'm modeling the margin of victory (mov) for each game as a combination of home field advantage (HFA), the true power ratings (true_power) of the teams involved, and a turnover term: the game's turnover differential (tod) times a league-wide estimate of the points a turnover is worth (tov_points), which enters with a minus sign. The turnover differential is itself modeled from an "adjusted turnover rate" (adj_tov_rate) estimated for each team.
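To make the two regression lines concrete, here is a minimal Python sketch of the same arithmetic for a single game. The encoding of matchup (+1 for the home team, -1 for the away team, 0 otherwise) is my assumption, since the post doesn't show the data, and the three-team league and its numbers are made up purely for illustration. The helper names are hypothetical.

```python
# Assumption: matchup[i][t] is +1 if team t is home in game i, -1 if away, else 0.
def matchup_row(home, away, num_teams):
    row = [0] * num_teams
    row[home] = 1
    row[away] = -1
    return row

def expected_tod(row, adj_tov_rate, c0=0.0):
    # Mirrors: tod.hat[i] <- c0 + inprod(adj_tov_rate[], matchup[i,])
    return c0 + sum(r * m for r, m in zip(adj_tov_rate, row))

def expected_mov(row, true_power, tod, hfa=3.0, tov_points=4.0):
    # Mirrors: mov.hat[i] <- HFA + inprod(true_power[], matchup[i,]) - tov_points*tod[i]
    return hfa + sum(p * m for p, m in zip(true_power, row)) - tov_points * tod

# Made-up three-team league (not real ratings).
true_power = [2.0, 0.0, -2.0]
adj_tov_rate = [-0.5, 0.0, 0.5]

row = matchup_row(home=0, away=2, num_teams=3)
tod_hat = expected_tod(row, adj_tov_rate)          # -0.5 - 0.5 = -1.0
mov_hat = expected_mov(row, true_power, tod_hat)   # 3 + 4 - 4 * (-1.0) = 11.0
print(row, tod_hat, mov_hat)
```

Under this encoding, the minus sign in front of tov_points*tod means a positive tod hurts the home team, so a negative adj_tov_rate is the "good" direction.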

  for (i in 1:32) {
    true_power[i] ~ dnorm(0, tau2)
    adj_tov_rate[i] ~ dnorm(0, tau4)
    apparent_rating[i] <- true_power[i] - adj_tov_rate[i]*tov_points
  }

In this section of the code, I'm setting up the priors for true_power and adj_tov_rate (assuming both are normally distributed with a mean of zero) and setting up a monitor variable to track the apparent_rating for each team for comparison. You can see that the calculation of apparent rating now follows directly by subtracting the (rating) points due to turnovers from the true power rating.
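As a quick check on that relationship, the sketch below computes an apparent rating from a true power rating and, going the other way, backs out the adjusted turnover rate implied by a pair of ratings. The TP and AR values are San Francisco's 2011 numbers from the table further down. The value tov_points = 4.0 is only the prior mean from the model code, used here as a stand-in because the posterior estimate isn't reported, so the implied rate is illustrative rather than the model's actual output. The function names are hypothetical.

```python
def apparent_rating(true_power, adj_tov_rate, tov_points):
    # Mirrors: apparent_rating[i] <- true_power[i] - adj_tov_rate[i]*tov_points
    return true_power - adj_tov_rate * tov_points

def implied_adj_tov_rate(true_power, apparent, tov_points):
    # Solve the line above for adj_tov_rate.
    return (true_power - apparent) / tov_points

tp, ar = 0.63, 4.15      # San Francisco, 2011 (TP11, AR11)
tov_points = 4.0         # assumption: prior mean, not the fitted value

rate = implied_adj_tov_rate(tp, ar, tov_points)   # about -0.88
roundtrip = apparent_rating(tp, rate, tov_points) # recovers about 4.15
print(round(rate, 2), round(roundtrip, 2))
```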

  # priors; dnorm takes (mean, precision) and dexp takes a rate
  tov_points ~ dnorm(4, 0.01)
  HFA <- 3
  c0 ~ dnorm(0, tau5)
  tau1 ~ dexp(0.1)
  tau2 ~ dexp(0.1)
  tau3 ~ dexp(0.1)
  tau4 ~ dexp(0.1)
  tau5 ~ dexp(0.1)
}

Finally, I fix HFA at 3 points (alternatively, we could estimate it from the data as well) and define priors for tov_points, the intercept c0, and each of the precision parameters.
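Because BUGS/JAGS parameterizes dnorm by precision (1/variance) and dexp by rate, the priors are easier to read after a quick conversion. The sketch below just does that arithmetic. The helper names are hypothetical, and it says nothing about what the posteriors look like.

```python
import math

def sd_from_precision(tau):
    # Precision is 1 / variance, so sd = 1 / sqrt(tau).
    return 1.0 / math.sqrt(tau)

def exponential_mean(rate):
    # Mean of an exponential distribution with the given rate.
    return 1.0 / rate

tov_points_prior_sd = sd_from_precision(0.01)      # dnorm(4, 0.01): sd of about 10 points
tau_prior_mean = exponential_mean(0.1)             # dexp(0.1): prior mean precision of 10
sd_at_prior_mean = sd_from_precision(tau_prior_mean)  # about 0.32

print(tov_points_prior_sd, tau_prior_mean, round(sd_at_prior_mean, 2))
```

So the prior on tov_points is centered at 4 points per turnover but is quite wide, while the exponential priors on the precisions have a mean of 10.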

Now let's look at some data, starting with last year's ratings sorted by true power:

2011 Power Ratings

TP = True Power; AR = Apparent Rating; ∆11 = AR11 - TP11 

RK TEAM TP11 AR11 ∆11
1 New.Orleans 8.95 8.66 -0.29
2 Pittsburgh 6.67 5.08 -1.59
3 Philadelphia 6.03 4.30 -1.73
4 Baltimore 4.51 4.70 0.19
5 New.England 4.33 6.22 1.89
6 Green.Bay 3.30 6.46 3.16
7 Houston 2.73 3.46 0.73
8 San.Diego 2.25 1.38 -0.87
9 Miami 2.22 1.37 -0.85
10 New.York.Jets 1.61 1.12 -0.49
11 Detroit 1.44 3.08 1.64
12 Cincinnati 0.75 0.69 -0.06
13 Atlanta 0.73 1.78 1.05
14 Arizona 0.72 -0.85 -1.57
15 San.Francisco 0.63 4.15 3.52
16 Dallas 0.35 0.81 0.46
17 Chicago 0.16 0.55 0.39
18 Tennessee -0.24 -0.26 -0.02
19 Washington -0.43 -2.21 -1.78
20 New.York.Giants -0.72 0.20 0.92
21 Carolina -1.21 -1.10 0.11
22 Seattle -1.36 -0.27 1.09
23 Denver -1.56 -2.96 -1.40
24 Buffalo -2.17 -2.31 -0.14
25 Oakland -2.73 -3.26 -0.53
26 Cleveland -3.97 -3.91 0.06
27 Minnesota -4.12 -4.43 -0.31
28 Jacksonville -4.66 -4.17 0.49
29 Kansas.City -5.23 -5.59 -0.36
30 Tampa.Bay -5.60 -7.40 -1.80
31 Indianapolis -5.81 -7.34 -1.53
32 St..Louis -7.61 -8.10 -0.49

The last column (∆11) represents the component of the apparent rating (AR11) due to turnovers last season. You can see, for example, that a huge component of the 49ers' rating (usually interpreted as their on-field success) was due to turnover differential. This was cited by Football-Outsiders (and others) as one of the main reasons that the Niners should have been expected to regress significantly this year toward their "true power" rating. It seems that less concern was generated by Green Bay's ∆ component, which accounts for about half of their apparent rating last season (hmm...).

Ok, so then what about this season, now that we're through 5 games?

2012 Power Ratings

RK TEAM TP12 AR12 ∆12
1 San.Francisco 1.05 2.61 1.56
2 Denver 0.82 -0.62 -1.44
3 Houston 0.63 3.11 2.48
4 Chicago 0.62 3.75 3.13
5 Minnesota 0.53 0.62 0.09
6 Seattle 0.43 -0.01 -0.44
7 Miami 0.36 -1.00 -1.36
8 Philadelphia 0.31 -2.03 -2.34
9 New.York.Giants 0.30 1.26 0.96
10 Dallas 0.28 -2.13 -2.41
11 Kansas.City 0.28 -5.08 -5.36
12 Cincinnati 0.20 -1.14 -1.34
13 Baltimore 0.19 1.86 1.67
14 Atlanta 0.18 3.55 3.37
15 Green.Bay 0.17 0.02 -0.15
16 San.Diego 0.12 0.53 0.41
17 New.England 0.08 3.72 3.64
18 Detroit 0.05 -1.07 -1.12
19 Arizona -0.02 1.56 1.58
20 Indianapolis -0.09 -1.53 -1.44
21 Pittsburgh -0.12 0.21 0.33
22 St..Louis -0.15 1.19 1.34
23 Carolina -0.22 -1.66 -1.44
24 New.York.Jets -0.30 -1.36 -1.06
25 New.Orleans -0.40 -0.36 0.04
26 Cleveland -0.43 -1.42 -0.99
27 Tampa.Bay -0.47 0.83 1.30
28 Buffalo -0.58 -2.30 -1.72
29 Washington -0.71 2.39 3.10
30 Tennessee -0.98 -2.75 -1.77
31 Oakland -1.02 -1.23 -0.21
32 Jacksonville -1.02 -1.53 -0.51

Well, ain't that something? San Francisco ranks at the top of true power this season while still maintaining a hefty ∆ in turnovers. This is sort of a double whammy of goodness. Not only does SF appear to have improved, but their turnover differential has not regressed as much as people thought it should have (or could have or may have or what have you, you know?).

Conversely, look at Green Bay. Not only has their power rating regressed to the middle of the pack, but they are the team that has apparently experienced a huge regression to the mean in ∆. A double whammy in reverse!

Ok, now for the finale. I'm going to take the true power ratings for 2012 and add to them the average of the ∆ for 2011 and 2012, with the idea being that the "true" ∆ is somewhere in between. (There are better ways to do this than simple averaging, and eventually, I'll get the model to do this for me, but this shouldn't be too far off the mark.) This is what I am calling for now the "True Power Rating" for 2012.
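The averaging step is simple enough to sketch in a few lines of Python. The ∆ and TP12 values below are copied from the 2011 and 2012 tables above for three teams. The dictionary layout and function name are just illustrative choices.

```python
# ∆11, ∆12 and TP12 for three teams, taken from the tables above.
delta_2011 = {"San.Francisco": 3.52, "Denver": -1.40, "Kansas.City": -0.36}
delta_2012 = {"San.Francisco": 1.56, "Denver": -1.44, "Kansas.City": -5.36}
tp_2012 = {"San.Francisco": 1.05, "Denver": 0.82, "Kansas.City": 0.28}

def true_power_rating(team):
    # Simple average of the two seasons' turnover components, added to TP12.
    delta_avg = (delta_2011[team] + delta_2012[team]) / 2
    return tp_2012[team] + delta_avg

ratings = {team: round(true_power_rating(team), 2) for team in tp_2012}
for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(team, rating)
```

A weighted average (for example, weighting by games played in each season) would be one of the "better ways" mentioned above, but the simple mean keeps the calculation transparent.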

2012 True Power Ratings

∆avg = (∆11 + ∆12) / 2; TPR12 = TP12 + ∆avg (the "True Power Rating")

RK TEAM TP12 ∆avg TPR12
1 San.Francisco 1.05 2.54 3.59
2 New.England 0.08 2.77 2.85
3 Atlanta 0.18 2.21 2.39
4 Chicago 0.62 1.76 2.38
5 Houston 0.63 1.61 2.24
6 Green.Bay 0.17 1.51 1.68
7 New.York.Giants 0.30 0.94 1.24
8 Baltimore 0.19 0.93 1.12
9 Seattle 0.43 0.33 0.76
10 Minnesota 0.53 -0.11 0.42
11 Detroit 0.05 0.26 0.31
12 St..Louis -0.15 0.43 0.28
13 Arizona -0.02 0.01 -0.01
14 Washington -0.71 0.66 -0.05
15 San.Diego 0.12 -0.23 -0.11
16 Cincinnati 0.20 -0.70 -0.50
17 New.Orleans -0.40 -0.13 -0.53
18 Denver 0.82 -1.42 -0.60
19 Dallas 0.28 -0.98 -0.70
20 Tampa.Bay -0.47 -0.25 -0.72
21 Miami 0.36 -1.11 -0.75
22 Pittsburgh -0.12 -0.63 -0.75
23 Carolina -0.22 -0.67 -0.89
24 Cleveland -0.43 -0.47 -0.90
25 Jacksonville -1.02 -0.01 -1.03
26 New.York.Jets -0.30 -0.78 -1.08
27 Oakland -1.02 -0.37 -1.39
28 Buffalo -0.58 -0.93 -1.51
29 Indianapolis -0.09 -1.49 -1.58
30 Philadelphia 0.31 -2.04 -1.73
31 Tennessee -0.98 -0.90 -1.88
32 Kansas.City 0.28 -2.86 -2.58

Always more to do, but I kind of like this.

7 pts

The turnover effects are overwhelming the AR12 -- The delta has a much bigger range than the TP estimates. Yet, in the 2011 data the turnover effects seem more reasonable. Maybe the model should be more aggressive about regressing the turnovers?

9 pts moderator

 creedofhubris I don't know. I read somewhere that this has been one of the most unpredictable seasons in recent memory. I've seen models that are producing crazy results like Denver being #1.

7 pts

 thecity2 What happens if you run the data from the first few weeks of last year? 

9 pts moderator

 creedofhubris Good suggestion. I haven't done that yet, but I should. I would assume the TP11 ratings will be similarly shrunk, but it would be interesting to see the ∆ values.

7 pts

Those TP12 ratings are wayyyy too conservative. In real life, a team with an average margin of victory of <-10 at this point in the season averages a margin of ~-5.5 in the next game, and >10 gives you a margin of ~5.

9 pts moderator

 creedofhubris They are, and they won't come up until more games are played. It's the bias-variance tradeoff.