Regressing Point Differential on The "Four Factors" (Part 2)

There are four factors of an offense or defense that define its efficiency: shooting percentage, turnover rate, offensive rebounding percentage, and getting to the foul line. Striving to control those factors leads to a more successful team. (Dean Oliver, "Basketball on Paper")

How well do these four factors predict point differential (and thus, winning)? How important is each of the factors relative to the others? The first question was the subject of Part 1; now would be a convenient time to read Part 1 if you haven't already done so (don't worry, Part 2 will still be here when you get back, thanks to the magic of the interwebs). Today we will address the second question.

How important is each of the factors relative to the others? In Part 1, we found the following model for predicting point differential (p.d.) as a function of the four factors (well, eight factors, counting offense and defense):

$p.d. = 10.41 + 1.49 * eFG(own) - 1.63 * eFG(opp) + 0.187 * FTR(own) - 0.213 * FTR(opp) -1.51 * TOR(own)+ 1.37 * TOR(opp) + 0.327 * ORR(own) -0.365 * ORR(opp)$

where,

• effective FG% (eFG): $eFG=(FG+0.5 *3PT)/FGA$
• foul rate (FTR): $FTR = FTA/FGA$
• turnover rate (TOR): $TOR=TOV / (FGA + 0.44 * FTA + TOV)$
• offensive rebounding rate (ORR): $ORR=ORB / (ORB + Opp DRB)$
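As a quick illustration of the definitions above, here is a minimal Python sketch. The function name `four_factors` and the box-score totals are made up for this example, and multiplying by 100 is my assumption so that the outputs are in percent, matching the units used in the tables in this post.

```python
def four_factors(fg, fg3, fga, fta, tov, orb, opp_drb):
    """Return (eFG, FTR, TOR, ORR) in percent from box-score totals."""
    efg = 100.0 * (fg + 0.5 * fg3) / fga           # effective FG%
    ftr = 100.0 * fta / fga                        # foul rate
    tor = 100.0 * tov / (fga + 0.44 * fta + tov)   # turnover rate
    orr = 100.0 * orb / (orb + opp_drb)            # offensive rebounding rate
    return efg, ftr, tor, orr

# Hypothetical single-game totals, invented for illustration only.
efg, ftr, tor, orr = four_factors(
    fg=40, fg3=8, fga=85, fta=25, tov=14, orb=11, opp_drb=30
)
print(f"eFG={efg:.1f} FTR={ftr:.1f} TOR={tor:.1f} ORR={orr:.1f}")
```

The opponent's versions of the four factors come from calling the same function with the opponent's totals (and the team's own defensive rebounds in place of `opp_drb`).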

Recall that positive coefficients (own eFG%, own FTR, opp TOR, own ORR) mean that terms add to point differential, while negative coefficients (opp eFG%, opp FTR, own TOR, opp ORR or own DRR) subtract from point differential.
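For readers who want to plug in their own numbers, the model can be sketched in a few lines of Python. The names `COEF` and `predict_point_diff` are mine, the coefficients are the rounded ones printed above, and the league-average inputs are taken from the table further down in this post.

```python
# Coefficients exactly as printed in the model above (rounded in the post).
INTERCEPT = 10.41
COEF = {
    "eFG_own": 1.49,  "eFG_opp": -1.63,
    "FTR_own": 0.187, "FTR_opp": -0.213,
    "TOR_own": -1.51, "TOR_opp": 1.37,
    "ORR_own": 0.327, "ORR_opp": -0.365,
}

def predict_point_diff(team):
    """Predicted point differential from the eight factors (all in percent)."""
    return INTERCEPT + sum(COEF[name] * team[name] for name in COEF)

# League-average values for each factor, from the table later in the post.
league_avg = {
    "eFG_own": 49.6, "eFG_opp": 49.6,
    "FTR_own": 31.4, "FTR_opp": 31.4,
    "TOR_own": 13.8, "TOR_opp": 13.8,
    "ORR_own": 26.2, "ORR_opp": 26.3,
}
pd_avg = predict_point_diff(league_avg)
print(f"predicted p.d. for an average team: {pd_avg:.1f}")
```

With these rounded inputs an average team comes out at roughly -0.3 points rather than the 0.1 shown for VEG0 in the table below, presumably because the post prints rounded coefficients and averages.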

Looking at the model, one is perhaps tempted to conclude that the most important terms are the ones with the largest coefficients (in absolute value): eFG% and TOR. The problem with that logic is twofold. 1) The means of the stats span a wide range (e.g., eFG% is typically around 50%, whereas TOR is around 13%). Therefore, even though the coefficients for eFG% and TOR are similar, the eFG% term is larger overall and would appear to dominate. 2) The spread of each stat differs. In other words, even if a factor appears to be a large contributor based on its coefficient and mean, if it varies little between teams in practice, it will not have a large effect on winning.

Fortunately, there is a straightforward way to deal with both issues and get at the truth. Specifically, we can use the model, itself, to calculate the variation in total wins due to a normalized change in each parameter. Here's how it works. First, I will temporarily take over David Stern's role as NBA Commish (thank you, thank you), and create a new franchise in Las Vegas (VEG — Vegas, baby!). Next, we will magically skip forward to next pre-season — yes, there was an expansion draft, no, VEG did not get LeBron James, although I will not rule out James having taken his talents to Vegas on several occasions. Before the season begins, we would like to predict how many wins VEG might have. How do we do this?

Oh, right, the model! Let's start out by assuming (optimistically) that our new franchise is average in all eight of the four factors (you know what I mean). How many wins would such a team produce (if you've already guessed around 41, eat a cookie or something)? Take a look at the table below. I've calculated the NBA average value and standard deviation (STD) for each category. Next, I varied each parameter by one standard deviation (in a direction that increases wins for that category), and used the model to predict point differential (P.D.) and wins (uh, Wins). Wins are related to point differential by the following formula (see here for explanation):

$W = 2.54 * p.d. + 40.9$

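| Factor | Team | P.D. | eFG% Own | eFG% Opp | eFG% Diff | FTR Own | FTR Opp | FTR Diff | TOR Own | TOR Opp | TOR Diff | ORR Own | ORR Opp | ORR Diff | Wins |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | NBA | 0.2 | 49.6 | 49.6 | 0.0 | 31.4 | 31.4 | 0.0 | 13.8 | 13.8 | -0.0 | 26.2 | 26.3 | -0.1 | |
| | STD | 6.1 | 2.3 | 2.0 | 3.5 | 3.3 | 3.3 | 5.2 | 0.9 | 1.1 | 1.4 | 2.8 | 2.7 | 3.1 | |
| UNCH | VEG0 | 0.1 | 49.6 | 49.6 | 0.0 | 31.4 | 31.4 | 0.0 | 13.8 | 13.8 | 0.0 | 26.2 | 26.3 | -0.1 | 41.2 |
| eFG(own) | VEG1 | 3.5 | 51.9 | 49.6 | 2.3 | 31.4 | 31.4 | 0.0 | 13.8 | 13.8 | 0.0 | 26.2 | 26.3 | -0.1 | 49.9 |
| eFG(opp) | VEG2 | 3.3 | 49.6 | 47.6 | 2.0 | 31.4 | 31.4 | 0.0 | 13.8 | 13.8 | 0.0 | 26.2 | 26.3 | -0.1 | 49.5 |
| FTR(own) | VEG3 | 0.7 | 49.6 | 49.6 | 0.0 | 34.7 | 31.4 | 3.3 | 13.8 | 13.8 | 0.0 | 26.2 | 26.3 | -0.1 | 42.8 |
| FTR(opp) | VEG4 | 0.8 | 49.6 | 49.6 | 0.0 | 31.4 | 28.1 | 3.3 | 13.8 | 13.8 | 0.0 | 26.2 | 26.3 | -0.1 | 43.0 |
| TOR(own) | VEG5 | 1.4 | 49.6 | 49.6 | 0.0 | 31.4 | 31.4 | 0.0 | 12.9 | 13.8 | -0.9 | 26.2 | 26.3 | -0.1 | 44.7 |
| TOR(opp) | VEG6 | 1.6 | 49.6 | 49.6 | 0.0 | 31.4 | 31.4 | 0.0 | 13.8 | 14.9 | -1.1 | 26.2 | 26.3 | -0.1 | 45.0 |
| ORR(own) | VEG7 | 1.0 | 49.6 | 49.6 | 0.0 | 31.4 | 31.4 | 0.0 | 13.8 | 13.8 | 0.0 | 29.0 | 26.3 | 2.7 | 43.6 |
| ORR(opp) | VEG8 | 1.1 | 49.6 | 49.6 | 0.0 | 31.4 | 31.4 | 0.0 | 13.8 | 13.8 | 0.0 | 26.2 | 23.6 | 2.6 | 43.7 |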
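The one-STD-at-a-time procedure behind the table can be sketched as follows. This is a rough reconstruction under stated assumptions, not the spreadsheet I actually used: the names `FACTORS`, `point_diff`, and `wins` are introduced here, and the coefficients, averages, and STDs are the rounded values printed in this post.

```python
INTERCEPT = 10.41
# factor: (coefficient, league average, standard deviation), all from the post.
FACTORS = {
    "eFG(own)": (1.49, 49.6, 2.3),
    "eFG(opp)": (-1.63, 49.6, 2.0),
    "FTR(own)": (0.187, 31.4, 3.3),
    "FTR(opp)": (-0.213, 31.4, 3.3),
    "TOR(own)": (-1.51, 13.8, 0.9),
    "TOR(opp)": (1.37, 13.8, 1.1),
    "ORR(own)": (0.327, 26.2, 2.8),
    "ORR(opp)": (-0.365, 26.3, 2.7),
}

def point_diff(values):
    """Regression model: predicted point differential."""
    return INTERCEPT + sum(FACTORS[name][0] * v for name, v in values.items())

def wins(pd):
    """Wins as a linear function of point differential."""
    return 2.54 * pd + 40.9

# VEG0: average in every category.
baseline = {name: avg for name, (_, avg, _) in FACTORS.items()}
base_wins = wins(point_diff(baseline))

deltas = {}
for name, (coef, avg, std) in FACTORS.items():
    case = dict(baseline)
    # Move one STD in the helpful direction: up if the coefficient is
    # positive, down if it is negative.
    case[name] = avg + std if coef > 0 else avg - std
    deltas[name] = wins(point_diff(case)) - base_wins

ranking = sorted(deltas, key=deltas.get, reverse=True)
for name in ranking:
    print(f"{name:9s} +{deltas[name]:.1f} wins")
```

Because the printed coefficients are rounded, the baseline here lands near 40.1 wins instead of 41.2, but the wins gained per factor agree with the table to within about a tenth of a win.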

As expected, if VEG is totally average across-the-board (case VEG0), the model predicts 41.2 wins (no surprise, eh, that's just about 50%). And if you're complaining that the prediction is not exactly 41.0 wins, well, get a life. (And curl up with a good statistics book that can tell you about the nature of error and uncertainty in model predictions.)

Next, we change eFG%(own) by +1 STD from 49.6 to 51.9 (VEG1). The result is that VEG is now predicted to win 49.9 games. Wow! That's an increase of almost 9 wins (8.7, to be exact), just by varying eFG% by 1 STD. What happens when we do the same thing to the other categories? Ok, alright. You get it by now...just look at the table.

Having varied each factor by +(-) 1 STD, we can now rank the factors in terms of wins produced over average. We see that the ranking goes:

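| Rank | Factor | Case | Prediction (P.D.) | Wins | Delta | % |
|---|---|---|---|---|---|---|
| 1 | eFG(own) | VEG1 | 3.5 | 49.9 | 8.7 | 26.8% |
| 2 | eFG(opp) | VEG2 | 3.3 | 49.5 | 8.3 | 25.5% |
| 3 | TOR(opp) | VEG6 | 1.6 | 45.0 | 3.8 | 11.8% |
| 4 | TOR(own) | VEG5 | 1.4 | 44.7 | 3.5 | 10.6% |
| 5 | ORR(opp) | VEG8 | 1.1 | 43.7 | 2.5 | 7.6% |
| 6 | ORR(own) | VEG7 | 1.0 | 43.6 | 2.4 | 7.3% |
| 7 | FTR(opp) | VEG4 | 0.8 | 43.0 | 1.8 | 5.5% |
| 8 | FTR(own) | VEG3 | 0.7 | 42.8 | 1.6 | 4.9% |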

The last category (%) takes the wins produced above average (Delta) and divides that amount by the sum of the Deltas for each case. This is what we were looking for to begin with: the relative weight of each factor. Note that shooting efficiency (producing it and defending against it) accounts for about 52% of the extra wins. Shooting efficiency is followed by turnover rate, rebounding, and foul rate.
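Since both the regression and the wins formula are linear, the Delta column has a simple closed form. As a sketch (the symbols $\beta_i$ for a factor's coefficient and $\sigma_i$ for its STD are notation introduced just for this note):

$\Delta W_i = 2.54 * |\beta_i| * \sigma_i$

$\%_i = \Delta W_i / \sum_j \Delta W_j$

For example, for eFG(own): $\Delta W = 2.54 * 1.49 * 2.3 \approx 8.7$ wins, which matches the table. The eight Deltas in the table sum to 32.6, so the shooting share is $(8.7 + 8.3) / 32.6 \approx 52\%$. In other words, what matters is the coefficient times the spread of the stat, not the coefficient or the mean alone.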

To bring this back to reality a bit, now let's look at the current season and how teams at the top and bottom of the league are doing with respect to the four factors. The top and bottom rows represent hypothetical teams that are +1 or -1 STD relative to the mean in all 8 factors.

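| Rank | Team | P.D. | Wins | eFG% Own | eFG% Opp | eFG% Diff | FTR Own | FTR Opp | FTR Diff | TOR Own | TOR Opp | TOR Diff | ORR Own | ORR Opp | ORR Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | +1 | 12.7 | 73.2 | 51.8 | 47.6 | 4.2 | 34.7 | 28.1 | 6.6 | 12.9 | 14.9 | -2.0 | 29.0 | 23.6 | 5.4 |
| 1 | BOS | 12.4 | 72.5 | 54.3 | 46.7 | 7.6 | 31.4 | 32.7 | -1.4 | 14.5 | 15.5 | -1.0 | 21.3 | 23.1 | -1.8 |
| 2 | MIA | 11.5 | 70.1 | 51.7 | 46.1 | 5.6 | 38.0 | 31.3 | 6.7 | 12.8 | 13.5 | -0.7 | 24.8 | 24.5 | 0.3 |
| 3 | SAS | 8.9 | 63.5 | 52.3 | 49.4 | 2.9 | 31.9 | 24.5 | 7.4 | 12.9 | 14.8 | -1.8 | 26.4 | 25.7 | 0.7 |
| 4 | LAL | 7.4 | 59.6 | 50.7 | 47.5 | 3.2 | 29.8 | 25.2 | 4.7 | 12.6 | 13.3 | -0.8 | 30.0 | 29.8 | 0.2 |
| 5 | DAL | 7.2 | 59.1 | 52.1 | 47.4 | 4.7 | 30.5 | 27.7 | 2.7 | 13.9 | 13.6 | 0.3 | 23.6 | 25.3 | -1.7 |
| 26 | SAC | -6.7 | 23.9 | 46.8 | 51.0 | -4.1 | 29.9 | 35.1 | -5.2 | 13.8 | 13.5 | 0.4 | 29.9 | 26.5 | 3.4 |
| 27 | NJN | -7.0 | 23.1 | 46.8 | 49.2 | -2.4 | 31.4 | 34.3 | -2.9 | 13.7 | 11.6 | 2.1 | 24.6 | 25.0 | -0.4 |
| 28 | MIN | -8.0 | 20.5 | 47.7 | 51.2 | -3.5 | 28.5 | 36.0 | -7.6 | 15.3 | 13.0 | 2.2 | 30.9 | 24.9 | 6.0 |
| 29 | WAS | -9.5 | 16.9 | 47.9 | 52.1 | -4.2 | 29.4 | 33.1 | -3.8 | 14.9 | 14.5 | 0.4 | 28.6 | 32.6 | -4.0 |
| 30 | CLE | -10.0 | 15.5 | 46.5 | 53.0 | -6.6 | 30.1 | 28.2 | 1.9 | 12.7 | 12.7 | -0.0 | 22.0 | 23.7 | -1.7 |
| | -1 | -12.0 | 10.5 | 47.5 | 51.5 | -4 | 28.2 | 34.6 | -6.4 | 14.7 | 12.8 | 2.1 | 23.5 | 28.9 | -5.4 |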

I've highlighted in green (red) the values that are more than 1 STD above (below) the mean (in a direction that produces more or fewer wins, respectively). Lastly, since this is a Warriors-centric blog, let's take a (sad and unfortunate) look at my favorite team with respect to the four factors:

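| Rank | Team | P.D. | Wins | eFG% Own | eFG% Opp | eFG% Diff | FTR Own | FTR Opp | FTR Diff | TOR Own | TOR Opp | TOR Diff | ORR Own | ORR Opp | ORR Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | GSW | -4.7 | 29.0 | 49.8 | 51.4 | -1.7 | 24.3 | 36.9 | -12.6 | 14.4 | 15.1 | -0.7 | 29.7 | 30.5 | -0.8 |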

Interestingly, the Warriors are not terrible in shooting, although they are just below league average in offensive efficiency, and well below in defensive efficiency. The Warriors are absolutely terrible in FTR. In fact, they are about 2 STD below the league average in going to the line. Surprisingly, considering the off-season acquisition of Lee and the return of Biedrins, the defensive rebounding is really bad. However, the offensive rebounding is actually very good. So, that's a push. The Warriors are good at forcing turnovers, and this is keeping them from dropping to the very bottom of the league. Make no mistake about it, though, the Warriors will continue to be cellar dwellers until their offensive and defensive shooting efficiency improves.
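To put numbers on "how many STDs from average," here is a small sketch that converts the Warriors' row into z-scores, using the league averages and STDs from the first table. The `ROWS` layout and the sign convention (positive means the number helps the Warriors) are my own choices for this illustration.

```python
# factor: (league average, STD, GSW value, sign)
# sign = +1 when a higher value helps the team, -1 when it hurts.
ROWS = {
    "eFG(own)": (49.6, 2.3, 49.8, +1),
    "eFG(opp)": (49.6, 2.0, 51.4, -1),
    "FTR(own)": (31.4, 3.3, 24.3, +1),
    "FTR(opp)": (31.4, 3.3, 36.9, -1),
    "TOR(own)": (13.8, 0.9, 14.4, -1),
    "TOR(opp)": (13.8, 1.1, 15.1, +1),
    "ORR(own)": (26.2, 2.8, 29.7, +1),
    "ORR(opp)": (26.3, 2.7, 30.5, -1),
}

# Signed z-score: STDs away from league average, oriented so that
# positive is good for the Warriors.
z = {
    name: sign * (value - avg) / std
    for name, (avg, std, value, sign) in ROWS.items()
}

for name, score in z.items():
    print(f"{name:9s} {score:+.2f}")
```

The FTR(own) score comes out a bit beyond -2, consistent with the "about 2 STD below" remark, while TOR(opp) and ORR(own) are the two categories more than 1 STD on the good side.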

Summary

I have shown that offensive and defensive shooting efficiency are by far the most important of the four factors, together accounting for over 50% of the extra wins in the one-STD analysis above. In comparison, offensive and defensive rebounding account for about 15%. For reference, we can compare my results to Dean Oliver's estimates for the weight of each factor:

| Factor | DeanO | Regression |
|---|---|---|
| Shooting | 40% | 54% |
| Turnovers | 25% | 22% |
| Rebounding | 20% | 15% |
| Foul Rate | 15% | 10% |

I like that the results of the regression are consistent with what Oliver found, and it is especially comforting since I haven't been able to actually track down his studies that show how he derived these weights. I assume he performed a similar analysis, but there may, of course, be other ways to arrive at the same conclusion. Lastly, it should be clear that the challenge for player valuation models is attributing credit for each of the factors to individual play. It is important to think about this team level analysis when you are considering models like Wins Produced, PER, Win Shares, etc.

14 thoughts on “Regressing Point Differential on The "Four Factors" (Part 2)”

1. EvanZ says:

thanks, jj!

1. Phil says:

Interesting stuff. How do you think this jibes with box-score reliant player rating metrics, which don't take into account opponent efg%?

1. Phil says:

Checked out the analysis, interesting stuff. From what I can see, the only things that take into account opponent efg% are:

■ Each player loses 0.2 pts when opponent makes 2pt field goal
■ Same as above, except for 3pt made by opponent
■ Each player on defensive unit is credited 1/5 of 0.7 pts for an opponent missed field goal (note that all players on a floor unit will benefit from having a defensive stopper as a teammate)

For clarification, by "player" and "opponent" do you mean one particular player and the opposing player he is guarding makes a shot? Or does the entire team benefit whenever any opposing player misses a shot?

I can see potential problems with both approaches. Forcing misses is a result of both individual and team defense, so any black/white approach which credits players either based solely on their own forced misses, or all players equally, could be inherently faulty. The game just isn't played that way.

In other words, I do not believe players deserve either 100% or 20% credit for forcing an opponent miss. It's somewhere in between, and largely situational. For example, a player deserves a lot more credit for forcing an opponent miss in an iso than forcing a miss in a screen/roll.

1. EvanZ says:

Phil, thanks for the comment. Right now, a missed field goal is credited to all players on the team that "forced" the miss. Future versions will mix team and individual (counterpart) credits/debits for shooting (and other stats, if it makes sense). I think using PBP data is an improvement over box score, but obviously, there are still going to be limitations to assigning credit/debit perfectly.

My hope, however, is that at the very least, by explicitly putting it in the model, it will be as transparent as possible. According to my analysis on the four factors, opponent eFG% is worth over 1/4 of the game, so obviously, defense needs to be in the model.

2. Philip says:

Evan, thanks for the clarification. I agree that the PBP data is an improvement, and look forward to your continued exploration into how individual player defense accounts for opponent efg%.

Glancing at team rankings (and this is just an eyeball check, so I could be off base), it looks like there is a strong correlation between 2-pt FG% and defensive efficiency, and between the number of 3-pointers that opponents take and defensive efficiency. Interestingly, the correlation between 3-pt % and defensive efficiency seems to be a lot lower. So having a good defense is largely about forcing players to shoot 2s and to shoot them poorly; as long as they're not shooting many 3s, how well opponents shoot them is less significant.

Maybe players should be credited with forcing 2s rather than 3s. Food for thought.