There are four factors of an offense or defense that define its efficiency: shooting percentage, turnover rate, offensive rebounding percentage, and getting to the foul line. Striving to control those factors leads to a more successful team. (Dean Oliver, "Basketball on Paper")
How well do these four factors predict point differential (and thus, winning)? How important is each of the factors relative to the others? The first question was the subject of Part 1. Now would be a convenient time to read Part 1, if you haven't already done so (don't worry, Part 2 will still be here when you get back, thanks to the magic of the interwebs). Today we will address the second question.
In Part 1, we found the following model for predicting point differential (p.d.) as a function of the four factors (well, eight factors, counting offense and defense separately):
- effective FG% (eFG):
- foul rate (FTR):
- turnover rate (TOR):
- offensive rebounding rate (ORR):
Recall that positive coefficients (own eFG%, own FTR, opp TOR, own ORR) mean that terms add to point differential, while negative coefficients (opp eFG%, opp FTR, own TOR, opp ORR or own DRR) subtract from point differential.
Upon inspection of the model, one is, perhaps, initially tempted to conclude that the most important terms are the ones with the largest coefficients (in absolute value): eFG% and TOR. The problem with that logic is twofold: 1) The means of the stats differ widely (e.g. eFG% is typically around 50%, whereas TOR is around 13%). Therefore, even though the coefficients for eFG% and TOR are similar, the eFG% term (coefficient times value) is much larger, and would appear to dominate. 2) The amount of variation differs from stat to stat. In other words, even if a parameter appears to be a large contributor based on its coefficient and mean, if it varies little between teams in practice, it will not have a large effect on winning.
Fortunately, there is a straightforward way to deal with both issues and get at the truth. Specifically, we can use the model, itself, to calculate the variation in total wins due to a normalized change in each parameter. Here's how it works. First, I will temporarily take over David Stern's role as NBA Commish (thank you, thank you), and create a new franchise in Las Vegas (VEG — Vegas, baby!). Next, we will magically skip forward to next pre-season — yes, there was an expansion draft, no, VEG did not get LeBron James, although I will not rule out James having taken his talents to Vegas on several occasions. Before the season begins, we would like to predict how many wins VEG might have. How do we do this?
Oh, right, the model! Let's start out by assuming (optimistically) that our new franchise is average in all eight of the four factors (you know what I mean). How many wins would such a team produce (if you've already guessed around 41, eat a cookie or something)? Take a look at the table below. I've calculated the NBA average value and standard deviation (STD) for each category. Next, I varied each parameter by one standard deviation (in a direction that increases wins for that category), and used the model to predict point differential (P.D.) and wins (uh, Wins). Wins are related to point differential by the following formula (see here for explanation):
As expected, if VEG is totally average across-the-board (case VEG0), the model predicts 41.2 wins (no surprise, eh, that's just about 50%). And if you're complaining that the prediction is not exactly 41.0 wins, well, get a life. (And curl up with a good statistics book that can tell you about the nature of error and uncertainty in model predictions.)
Next, we change eFG%(own) by +1 STD from 49.6 to 51.9 (VEG1). The result is that VEG is now predicted to win 49.9 games. Wow! That's an increase of nearly 9 wins (49.9 vs. 41.2), just by varying eFG% by 1 STD. What happens when we do the same thing to the other categories? Ok, alright. You get it by now...just look at the table.
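For anyone who wants to replay the VEG experiment at home, here is a minimal sketch of the "nudge one factor by 1 STD" procedure. Big caveats: the coefficients below are hypothetical stand-ins (not the Part 1 regression values), every STD except the 2.3 for own eFG% (51.9 minus 49.6) is made up, and `placeholder_wins_from_pd` is a rough rule-of-thumb conversion that I am assuming for illustration, not the wins formula used in this post. Swap in the real numbers before trusting any output.

```python
from typing import Callable, Dict

# HYPOTHETICAL coefficients (points of differential per percentage point).
# These are NOT the Part 1 regression values; substitute the real ones.
COEFS: Dict[str, float] = {
    "eFG_own": 1.5, "eFG_opp": -1.5,
    "TOR_own": -1.4, "TOR_opp": 1.4,
}

# League standard deviations, in percentage points. Only eFG_own
# (51.9 - 49.6 = 2.3) comes from the text; the others are made up.
STDS: Dict[str, float] = {
    "eFG_own": 2.3, "eFG_opp": 2.0,
    "TOR_own": 1.0, "TOR_opp": 1.0,
}


def placeholder_wins_from_pd(point_diff: float) -> float:
    """Stand-in conversion, NOT the article's formula.

    Assumes an average team wins 41 games and each point of
    differential is worth roughly 2.7 wins over 82 games.
    """
    return 41.0 + 2.7 * point_diff


def wins_gained_per_std(
    coefs: Dict[str, float],
    stds: Dict[str, float],
    wins_from_pd: Callable[[float], float],
    baseline_pd: float = 0.0,
) -> Dict[str, float]:
    """Wins gained when each factor alone moves 1 STD in its helpful direction."""
    baseline_wins = wins_from_pd(baseline_pd)
    gains = {}
    for name, coef in coefs.items():
        # In a linear model, a 1-STD move shifts p.d. by coef * std;
        # abs() picks the direction that adds points for that factor.
        pd_shift = abs(coef) * stds[name]
        gains[name] = wins_from_pd(baseline_pd + pd_shift) - baseline_wins
    return gains


if __name__ == "__main__":
    gains = wins_gained_per_std(COEFS, STDS, placeholder_wins_from_pd)
    for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        print(f"{name}: +{gain:.1f} wins")
```

The wins conversion is passed in as a function on purpose, so the same loop works unchanged once you plug in whatever formula you prefer.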
Having varied each factor by +(-) 1 STD, we can now rank the factors in terms of wins produced over average. We see that the ranking goes:
The last column (%) takes the wins produced above average (Delta) for each case and divides that amount by the sum of the Deltas over all cases. This is what we were looking for to begin with: the relative weight of each factor. Note that shooting efficiency (producing it and defending against it) accounts for about 52% of the extra wins. Shooting efficiency is followed by turnover rate, rebounding, and foul rate.
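The % calculation is simple enough to sketch in a few lines. The Delta values below are invented round numbers purely for illustration (they are not the values from my table), and the `own`/`opp` key naming is just a convention I am assuming so that the offensive and defensive halves of each factor can be added together.

```python
from typing import Dict


def relative_weights(deltas: Dict[str, float]) -> Dict[str, float]:
    """Each case's share of the total wins above average."""
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("Deltas must sum to a positive number of wins")
    return {name: delta / total for name, delta in deltas.items()}


def combine_own_and_opp(weights: Dict[str, float]) -> Dict[str, float]:
    """Add the 'own' and 'opp' shares for each factor (keys like 'eFG_own')."""
    combined: Dict[str, float] = {}
    for name, share in weights.items():
        factor = name.split("_")[0]
        combined[factor] = combined.get(factor, 0.0) + share
    return combined


# ILLUSTRATIVE Deltas (wins above average per +1 STD), not the real table.
EXAMPLE_DELTAS = {
    "eFG_own": 8.0, "eFG_opp": 8.0,
    "TOR_own": 5.0, "TOR_opp": 5.0,
    "ORR_own": 3.0, "ORR_opp": 3.0,
    "FTR_own": 2.0, "FTR_opp": 2.0,
}

if __name__ == "__main__":
    by_factor = combine_own_and_opp(relative_weights(EXAMPLE_DELTAS))
    for factor, share in sorted(by_factor.items(), key=lambda kv: -kv[1]):
        print(f"{factor}: {share:.0%}")
```

Because every share is divided by the same total, the weights always sum to 100%, which is what lets us compare them to a set of published weights.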
To bring this back to reality a bit, now let's look at the current season and how teams at the top and bottom of the league are doing with respect to the four factors. The top and bottom rows represent hypothetical teams that are +1 or -1 STD relative to the mean in all 8 factors.
I've highlighted in green (red) the values that are more than 1 STD better (worse) than the mean, i.e. in the direction that produces more (fewer) wins. Lastly, since this is a Warriors-centric blog, let's take a (sad and unfortunate) look at my favorite team with respect to the four factors:
Interestingly, the Warriors are not terrible in shooting, although they are just below league average in offensive shooting efficiency, and well below in defensive shooting efficiency. The Warriors are absolutely terrible in FTR. In fact, they are about 2 STD below the league average in going to the line. Surprisingly, considering the off-season acquisition of Lee and the return of Biedrins, the defensive rebounding is really bad. However, the offensive rebounding is actually very good. So, that's a push. The Warriors are good at forcing turnovers, and this is keeping them from dropping to the very bottom of the league. Make no mistake about it, though, the Warriors will continue to be cellar dwellers until their offensive and defensive shooting efficiency improves.
I have shown that offensive and defensive shooting efficiency are by far the most important of the four factors, accounting on their own for over 50% of the wins above average. In comparison, offensive and defensive rebounding account for about 14%. For reference, we can compare my results to Dean Oliver's estimates for the weight of each factor:
I like that the results of the regression are consistent with what Oliver found, and it is especially comforting since I haven't been able to actually track down his studies that show how he derived these weights. I assume he performed a similar analysis, but there may, of course, be other ways to arrive at the same conclusion. Lastly, it should be clear that the challenge for player valuation models is attributing credit for each of the factors to individual play. It is important to keep this team-level analysis in mind when you are considering models like Wins Produced, PER, Win Shares, etc.