Regressing Point Differential on The "Four Factors" (Part 1)

There are four factors of an offense or defense that define its efficiency: shooting percentage, turnover rate, offensive rebounding percentage, and getting to the foul line. Striving to control those factors leads to a more successful team. (Dean Oliver, "Basketball on Paper")

How well do these four factors predict point differential (and thus, winning)? How important is each factor relative to the others? The first question is the subject of today's post; the second will be covered in Part 2.

To answer the first question, we construct a model. The inputs (independent variables) are the four factors (well, eight factors, since offense and defense should be considered separately), and the output (dependent variable) is point differential (p.d.). The linear model looks like this:

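$$\text{p.d.} = \beta_0 + \beta_1\,\mathrm{eFG}_{\text{own}} + \beta_2\,\mathrm{eFG}_{\text{opp}} + \beta_3\,\mathrm{FTR}_{\text{own}} + \beta_4\,\mathrm{FTR}_{\text{opp}} + \beta_5\,\mathrm{TOR}_{\text{own}} + \beta_6\,\mathrm{TOR}_{\text{opp}} + \beta_7\,\mathrm{ORR}_{\text{own}} + \beta_8\,\mathrm{ORR}_{\text{opp}} + \varepsilon$$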

Here, $\beta_0$ is a constant term and $\varepsilon$ is an error term. The coefficients $\beta_1, \ldots, \beta_8$ are determined by performing a multiple linear regression using the (free!) statistics software package R. The four factors are defined as follows:

  • effective FG% (eFG): eFG = (FG + 0.5 * 3PT) / FGA, where 3PT is three-pointers made
  • foul rate (FTR): FTR = FTA / FGA
  • turnover rate (TOR): TOR = TOV / (FGA + 0.44 * FTA + TOV)
  • offensive rebounding rate (ORR): ORR = ORB / (ORB + Opp DRB)

It is important to note that the opponent's ORR (also called ORB%) is simply the complement of the team's defensive rebounding rate (DRR or DREB%): your DRR plus the opponent's ORR adds up to 1 (100%), and vice versa.
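As a sketch of the definitions above, the four factors can be computed from box-score totals. The helper names and the input numbers are hypothetical, made up for illustration rather than taken from the post's data. The results are fractions, so multiply by 100 to get the percent values used in the table. The last lines check the ORR/DRR complement relation just described.

```python
def rebound_rate(rebounds, opponent_rebounds):
    """Share of available rebounds grabbed, e.g. ORB / (ORB + Opp DRB)."""
    return rebounds / (rebounds + opponent_rebounds)

def four_factors(fg, fg3, fga, fta, tov, orb, opp_drb):
    """Return (eFG, FTR, TOR, ORR) as fractions, following the definitions above.

    fg: field goals made, fg3: three-pointers made, fga: field goal attempts,
    fta: free throw attempts, tov: turnovers, orb: offensive rebounds,
    opp_drb: opponent defensive rebounds.
    """
    efg = (fg + 0.5 * fg3) / fga
    ftr = fta / fga
    tor = tov / (fga + 0.44 * fta + tov)
    orr = rebound_rate(orb, opp_drb)
    return efg, ftr, tor, orr

# Hypothetical single-game box score, for illustration only.
efg, ftr, tor, orr = four_factors(fg=40, fg3=10, fga=80, fta=25, tov=14, orb=10, opp_drb=30)

# Our DRR and the opponent's ORR share the same pool of rebounds
# (our DRB + opponent ORB), so they always sum to 1.
our_drr = rebound_rate(32, 8)   # our DRB vs. opponent ORB
opp_orr = rebound_rate(8, 32)   # opponent ORB vs. our DRB
print(efg, ftr, tor, orr, our_drr + opp_orr)
```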

The training set used to calculate the regression contains four factors data for each team over the past four seasons (2006-2009), available from hoopdata.com. The current season is then used to test the model. Here are the estimated model coefficients, with standard errors in parentheses:

$$\beta_0 = 10.41\ (3.69) \quad \beta_1 = 1.49\ (0.039) \quad \beta_2 = -1.63\ (0.049) \quad \beta_3 = 0.187\ (0.024) \quad \beta_4 = -0.213\ (0.021)$$

$$\beta_5 = -1.51\ (0.074) \quad \beta_6 = 1.37\ (0.072) \quad \beta_7 = 0.327\ (0.029) \quad \beta_8 = -0.365\ (0.041)$$

The residual standard error (the estimated spread of $\varepsilon$) is 0.664.
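To make the fitted equation concrete, here is a minimal Python sketch that plugs one team's factors into it. It assumes the factor values are in percent (54.27 rather than 0.5427), which is the scale that reproduces the predictions in the test-season table; `predict_pd` and `BETA` are illustrative names, not part of the original analysis.

```python
# Coefficients as reported above, in model order (beta_0 first).
BETA = [10.41, 1.49, -1.63, 0.187, -0.213, -1.51, 1.37, 0.327, -0.365]

def predict_pd(factors, beta=BETA):
    """Predicted point differential from the eight factors (in percent), ordered
    eFG_own, eFG_opp, FTR_own, FTR_opp, TOR_own, TOR_opp, ORR_own, ORR_opp."""
    if len(factors) != len(beta) - 1:
        raise ValueError("expected 8 factor values")
    return beta[0] + sum(b * x for b, x in zip(beta[1:], factors))

# Boston's own/opp values from the test-season table.
bos = [54.27, 46.66, 31.4, 32.7, 14.46, 15.5, 21.33, 23.09]
bos_pred = predict_pd(bos)
print(round(bos_pred, 2))
```

For Boston this gives roughly 12.07, close to the 12.4 listed in the table; the small gap is plausibly due to rounding in the published coefficients and factor values.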

The $R^2$ for the model on the training data is 0.985, and all coefficients were found to be statistically significant (meaning each factor has a detectable effect on the outcome; how large an effect, we will examine in Part 2).
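The post fit the coefficients with R's lm(). In an R formula the predictors are joined with +, for example lm(pd ~ eFG_own + eFG_opp + FTR_own + FTR_opp + TOR_own + TOR_opp + ORR_own + ORR_opp, data = teams) with hypothetical column and data-frame names, and the regression estimates each sign itself. As a language-neutral sketch of the same ordinary least squares fit, the NumPy version below uses synthetic, noise-free data generated from made-up coefficients, only to show that least squares recovers them; it is not the hoopdata.com training set.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended.

    X: (n_teams, 8) array of factors, columns ordered as in the model:
       eFG_own, eFG_opp, FTR_own, FTR_opp, TOR_own, TOR_opp, ORR_own, ORR_opp
    y: (n_teams,) array of point differentials
    Returns beta_0..beta_8 as a length-9 array.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Synthetic, noise-free data purely for illustration (NOT real team stats).
rng = np.random.default_rng(0)
X = rng.uniform(10.0, 55.0, size=(120, 8))
true_beta = np.array([10.0, 1.5, -1.6, 0.2, -0.2, -1.5, 1.4, 0.3, -0.4])
y = true_beta[0] + X @ true_beta[1:]

beta_hat = fit_ols(X, y)
print(np.round(beta_hat, 3))
```

With real data the fit would not be exact, and the residuals would give the standard errors and residual standard error reported above.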

Note that positive coefficients (own eFG%, own FTR, opp TOR, own ORR) mean that those terms add to point differential, while negative coefficients (opp eFG%, opp FTR, own TOR, opp ORR) subtract from it. A high opp ORR is the same thing as a low own DRR.
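Because the opponent's ORR and the team's own DRR sum to 100% (assuming rates are expressed in percent, as in the table), the opp ORR term can be rewritten in terms of own DRR:

$$\beta_8\,\mathrm{ORR}_{\text{opp}} = \beta_8\,(100 - \mathrm{DRR}_{\text{own}}) = 100\,\beta_8 - \beta_8\,\mathrm{DRR}_{\text{own}}$$

The constant $100\,\beta_8$ folds into the intercept, so the effective coefficient on own DRR is $-\beta_8 = +0.365$: all else equal, each extra percentage point of defensive rebounding adds about 0.365 to predicted point differential.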

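Now that we have determined the model coefficients, we can test the model using the four factors stats for the current season. In the table, OEFF and DEFF are offensive and defensive efficiency (points scored and allowed per 100 possessions), P.D. is OEFF minus DEFF, Prediction is the model's predicted P.D., and all four factors are given in percent. Here is a summary in table form: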

Team OEFF DEFF P.D. Prediction eFG%_Own eFG%_Opp eFG%_Diff FTR_Own FTR_Opp FTR_Diff TOR_Own TOR_Opp TOR_Diff ORR_Own ORR_Opp ORR_Diff
BOS 107.2 96.3 10.9 12.4 54.27 46.66 7.61 31.4 32.7 -1.39 14.46 15.5 -1.04 21.33 23.09 -1.76
MIA 109.1 97.2 11.9 11.5 51.67 46.07 5.6 38 31.3 6.67 12.79 13.5 -0.71 24.75 24.47 0.28
SAS 110.3 99.7 10.6 8.9 52.34 49.4 2.94 31.9 24.5 7.37 12.93 14.75 -1.82 26.35 25.7 0.65
LAL 109.4 101.8 7.6 7.4 50.71 47.51 3.2 29.8 25.2 4.69 12.55 13.31 -0.76 29.99 29.75 0.24
DAL 106.9 99.6 7.3 7.2 52.13 47.39 4.74 30.5 27.7 2.73 13.87 13.59 0.28 23.61 25.31 -1.7
ORL 104.5 98.7 5.8 6.7 52.3 48.22 4.08 33.8 29.4 4.38 15.06 13.94 1.12 25.28 22.14 3.14
CHI 102.5 98.2 4.3 6.2 49.55 47.11 2.44 30.9 29.9 1 14.37 14.86 -0.49 29.24 25.3 3.94
UTH 106.4 103.2 3.2 3.9 50.24 47.66 2.58 31.7 36.7 -5.07 12.79 14.46 -1.67 25.27 30.07 -4.8
PHI 102.6 101.8 0.8 2.8 49.3 47.03 2.27 29.7 35.2 -5.55 13.2 13.33 -0.13 24 24.83 -0.83
NOR 101.3 99.5 1.8 2.7 49.05 48.17 0.88 31.5 29.7 1.75 13.6 14.31 -0.71 21.69 22.97 -1.28
ATL 105.6 103.6 2 2.1 50.95 49 1.95 29 28.3 0.71 13.22 12.7 0.52 23.82 25.63 -1.81
NYK 109.5 106 3.5 1.5 52.38 50.92 1.46 33 31.7 1.31 13.73 13.67 0.06 25.17 27.39 -2.22
DEN 108 105.4 2.6 1.1 51.35 50.32 1.03 39 30.9 8.16 13.53 12.41 1.12 23.88 25.42 -1.54
IND 100.9 100.5 0.4 0.67 49.84 46.47 3.37 26.3 34 -7.66 14.62 12.97 1.65 22.66 25.8 -3.14
MIL 97.6 99.7 -2.1 -0.47 44.49 48.71 -4.22 34.3 32.1 2.14 12.95 15.24 -2.29 27.33 21.93 5.4
MEM 102.4 104.5 -2.1 -0.81 49.35 50.64 -1.29 28.3 30.2 -1.94 13.87 15.83 -1.96 27.22 30.46 -3.24
OKC 105.9 104.2 1.7 -0.81 48 50.05 -2.05 38.3 29.8 8.54 13.13 13.91 -0.78 25.69 27.47 -1.78
PHO 109.5 110 -0.5 -1.1 52.96 53.33 -0.37 32.6 27.5 5.03 13.5 14.01 -0.51 25.8 31.25 -5.45
HOU 106.6 106.9 -0.3 -1.2 50.53 50.29 0.24 33.4 30.5 2.85 13.6 12.38 1.22 25.85 27.17 -1.32
POR 102.4 103.5 -1.1 -2.3 46.58 50.51 -3.93 28 35.2 -7.15 13.2 15.91 -2.71 31.23 27.43 3.8
CHA 100.1 103 -2.9 -2.4 49.34 49.53 -0.19 32.7 29.7 3.03 16.34 13.83 2.51 26.9 24.21 2.69
TOR 104.4 108.2 -3.8 -3.6 49.64 52.58 -2.94 32.1 32.1 0 14.38 14.23 0.15 30.1 25.89 4.21
GSW 102.8 108.9 -6.1 -4.7 49.77 51.44 -1.67 24.3 36.9 -12.59 14.4 15.07 -0.67 29.66 30.47 -0.81
DET 101.5 108 -6.5 -5.7 48.18 51.53 -3.35 29.5 30 -0.48 13.14 13.2 -0.06 25.94 27.87 -1.93
LAC 99.7 107.2 -7.5 -6.3 48.44 51.01 -2.57 34 34.6 -0.59 15.54 13.18 2.36 28.75 25 3.75
SAC 99.6 107.3 -7.7 -6.7 46.84 50.98 -4.14 29.9 35.1 -5.24 13.82 13.47 0.35 29.93 26.54 3.39
NJN 99.4 105.2 -5.8 -7.0 46.79 49.22 -2.43 31.4 34.3 -2.9 13.7 11.6 2.1 24.62 25 -0.38
MIN 100.5 108.4 -7.9 -8.0 47.68 51.15 -3.47 28.5 36 -7.55 15.28 13.04 2.24 30.86 24.88 5.98
WAS 100.2 109.2 -9 -9.5 47.9 52.14 -4.24 29.4 33.1 -3.79 14.87 14.51 0.36 28.59 32.61 -4.02
CLE 97.7 108.5 -10.8 -10.0 46.49 53.04 -6.55 30.1 28.2 1.92 12.72 12.73 -0.01 21.95 23.65 -1.7
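As a check on out-of-sample fit, the observed and predicted point differentials can be compared directly. This sketch computes the coefficient of determination, 1 - SS_res / SS_tot, for the current season, using the P.D. and Prediction columns exactly as listed in the table; `r_squared` is an illustrative helper name.

```python
# Observed and predicted point differential for the current season,
# copied from the P.D. and Prediction columns of the table (BOS through CLE).
observed = [10.9, 11.9, 10.6, 7.6, 7.3, 5.8, 4.3, 3.2, 0.8, 1.8,
            2.0, 3.5, 2.6, 0.4, -2.1, -2.1, 1.7, -0.5, -0.3, -1.1,
            -2.9, -3.8, -6.1, -6.5, -7.5, -7.7, -5.8, -7.9, -9.0, -10.8]
predicted = [12.4, 11.5, 8.9, 7.4, 7.2, 6.7, 6.2, 3.9, 2.8, 2.7,
             2.1, 1.5, 1.1, 0.67, -0.47, -0.81, -0.81, -1.1, -1.2, -2.3,
             -2.4, -3.6, -4.7, -5.7, -6.3, -6.7, -7.0, -8.0, -9.5, -10.0]

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

test_r2 = r_squared(observed, predicted)
print(round(test_r2, 3))
```

This gives an R² of about 0.96 for the current season, consistent with the roughly 96% figure in the conclusion and a little below the 0.985 seen on the training data.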

Here is a plot showing the relationship between observed point differential and p.d. as predicted by the model:

[Figure: Prediction of point differential using four factors linear model.]

So, there you have it. For the current season, the four factors (eFG%, TOR, FTR, and ORR) explain about 96% of the variance in point differential. Next time, we'll explore the relative weights of each term in the model, which will help us understand how much each factor contributes to winning. Stay tuned...

7 thoughts on “Regressing Point Differential on The "Four Factors" (Part 1)”

  1. Did you just do a linear regression along the lines of "pd ~ eFG(own) - eFG(opp) + FTR(own) - FTR(opp) - TOR(own) + TOR(opp) + ORR(own) - ORR(opp)"? I'm trying to reproduce your results, but in R I keep getting what (likely) amount to multicollinearity errors. I've just been using the basic lm() function for my initial attempts, but no go. Did you run into this problem? Did you find the coefficients in a different manner?
