*I posted the following on Golden State of Mind yesterday.*

The idea of "potential assists" is interesting to me. One of the weaknesses of box score stats is that assists are recorded, but passes that would have been assists if the ball had gone in the basket are not. Because assists are only awarded when a basket is scored, they are inherently dependent on the shooter. I could be the best point guard in the world, but if I'm surrounded by bad shooters, I might look worse, because I would get fewer assists. Similarly, I might be a really good passer, but not get that many opportunities to set up my teammates, because I'm playing with Steve Nash or **Chris Paul**. My teammates score very efficiently when I get them the ball, but my assist rate still appears low. In theory, if "potential assists" were recorded, we would have some more information about passing, which is obviously an important part of the game.

There are no websites that I know of that record potential assists, so I decided to start tracking them on my own using Synergy. For this first scouting report, I'm focusing on spot-up jumpers, because these are the most straightforward plays on which to assess potential assists, and, well, because spot-up plays are very important to winning. Almost every jumpshot that is categorized as a spot-up play comes as a result of a potential assist, though not all of them do: sometimes a player catches the ball and then drives to the hoop, or dribbles a couple of times and then takes a shot. I didn't track these. I also didn't track plays that resulted in fouls or turnovers, because some preliminary observations showed that these usually don't occur on pure jumpshots (i.e. those that would be potentially assisted). For reasons of sample size, I also limited my tracking to the six Warriors with the greatest number of shots: Curry, Ellis, Wright, Lee, Williams, and Radmanovic. For each play, I recorded the game, quarter, a shot id, shooter and passer (by jersey number), type of shot (2 or 3), and whether the shot went in (obviously). Here are a few rows of data, so you get the idea:

| GameID | ShotID | Q | Shooter | Make | Type | Passer |
| --- | --- | --- | --- | --- | --- | --- |
| LALGSW040611 | 1 | 1 | 1 | 0 | 3 | 8 |
| LALGSW040611 | 2 | 1 | 8 | 0 | 3 | 1 |
| LALGSW040611 | 3 | 1 | 30 | 0 | 3 | 20 |
| LALGSW040611 | 4 | 1 | 30 | 0 | 3 | 8 |
| LALGSW040611 | 5 | 1 | 8 | 0 | 2 | 10 |
| LALGSW040611 | 6 | 1 | 1 | 0 | 3 | 10 |
| LALGSW040611 | 7 | 1 | 8 | 0 | 3 | 30 |
| LALGSW040611 | 8 | 2 | 8 | 1 | 3 | 23 |
| LALGSW040611 | 9 | 2 | 1 | 0 | 3 | 8 |
| LALGSW040611 | 10 | 3 | 1 | 0 | 3 | 10 |
| LALGSW040611 | 11 | 3 | 8 | 1 | 3 | 1 |
| LALGSW040611 | 12 | 3 | 8 | 0 | 3 | 30 |
| LALGSW040611 | 13 | 3 | 1 | 1 | 3 | 30 |

In total, I was able to track 1,093 shots. One-thousand and ninety-three shots. Yes, that's quite a number.

Now we get to the stats. First up, I want to show you the shooting efficiency of each of the six tracked players on potentially assisted spot-up plays:

| SHOOTER | SHOTS | POINTS | PPS |
| --- | --- | --- | --- |
| Curry | 169 | 234 | 1.38 |
| Williams | 163 | 215 | 1.32 |
| Ellis | 152 | 175 | 1.15 |
| Wright | 393 | 450 | 1.15 |
| Radmanovic | 98 | 112 | 1.14 |
| Lee | 118 | 98 | 0.83 |

Here, the number of shots includes 2-pt and 3-pt shots that were potentially assisted. PPS is simply the number of points scored divided by the number of shots. (I decided to use PPS as opposed to TS% or eFG%, because it allows easier comparison to Synergy stats, and makes some of the upcoming derived stats easier to calculate.) Not surprisingly, Curry and Williams were the most efficient (by quite a lot). Lee, because he takes so many two point shots and virtually no 3-pt shots, was the least efficient spot-up shooter. So far, so good. Now, let's look at something that you haven't seen before, which we'll call "passing efficiency":

| PASSER | SHOTS | POINTS | PPS |
| --- | --- | --- | --- |
| Wright | 127 | 174 | 1.37 |
| Ellis | 271 | 352 | 1.30 |
| Lee | 185 | 234 | 1.26 |
| Udoh | 26 | 30 | 1.15 |
| Biedrins | 35 | 39 | 1.11 |
| Law | 38 | 42 | 1.11 |
| Williams | 71 | 76 | 1.07 |
| Lin | 28 | 29 | 1.04 |
| Curry | 226 | 222 | 0.98 |
| Radmanovic | 43 | 39 | 0.91 |
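Both the shooting and the passing tables come from the same kind of tally over the raw rows: each shot is worth Make × Type points, and the shots are grouped either by shooter or by passer. Here is a minimal Python sketch of that tally using only the 13 sample rows shown earlier. The function name `pps_by` and the column indices are my own choices for illustration, and players stay as jersey numbers because that is how they were recorded.

```python
from collections import defaultdict

# The 13 sample rows from the post:
# (GameID, ShotID, Q, Shooter, Make, Type, Passer); players are jersey numbers.
ROWS = [
    ("LALGSW040611", 1, 1, 1, 0, 3, 8),
    ("LALGSW040611", 2, 1, 8, 0, 3, 1),
    ("LALGSW040611", 3, 1, 30, 0, 3, 20),
    ("LALGSW040611", 4, 1, 30, 0, 3, 8),
    ("LALGSW040611", 5, 1, 8, 0, 2, 10),
    ("LALGSW040611", 6, 1, 1, 0, 3, 10),
    ("LALGSW040611", 7, 1, 8, 0, 3, 30),
    ("LALGSW040611", 8, 2, 8, 1, 3, 23),
    ("LALGSW040611", 9, 2, 1, 0, 3, 8),
    ("LALGSW040611", 10, 3, 1, 0, 3, 10),
    ("LALGSW040611", 11, 3, 8, 1, 3, 1),
    ("LALGSW040611", 12, 3, 8, 0, 3, 30),
    ("LALGSW040611", 13, 3, 1, 1, 3, 30),
]

SHOOTER, MAKE, TYPE, PASSER = 3, 4, 5, 6  # column positions in each row


def pps_by(rows, key_index):
    """Return {player: (shots, points, pps)} grouped by the given column."""
    shots = defaultdict(int)
    points = defaultdict(int)
    for row in rows:
        player = row[key_index]
        shots[player] += 1
        # A miss scores 0; a make scores the shot type (2 or 3).
        points[player] += row[MAKE] * row[TYPE]
    return {p: (shots[p], points[p], points[p] / shots[p]) for p in shots}


shooting = pps_by(ROWS, SHOOTER)  # shooting efficiency
passing = pps_by(ROWS, PASSER)    # "passing efficiency"

for label, table in (("shooter", shooting), ("passer", passing)):
    for player, (n, pts, pps) in sorted(table.items()):
        print(f"{label} #{player}: {n} shots, {pts} points, PPS {pps:.2f}")
```

Thirteen shots are far too few to say anything about a player, of course; the point is only that the two tables are the same computation with a different grouping column.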

I know this is where the shin is going to hit the fat. **Dorell Wright** was the most efficient passer, as the PPS off his passes was 1.37. Ellis was just behind at 1.30, followed by Lee (1.26). The big, perhaps shocking, surprise here is that Curry comes in close to the bottom with a PPS of only 0.98. So, what's the deal? Is Curry really such a bad passer? Remember, folks. I'm a so-called Curry fanboy, so it's not like this is the outcome I was looking for or expecting.

Time for a little more parsing. Here's a table that gives the efficiency of each passer-shooter tandem:

| PASSER ↓ / SHOOTER → | Curry | RANK | Ellis | RANK | Lee | RANK | Radmanovic | RANK | Williams | RANK | Wright | RANK | RATIO |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wright | 1.69 | 1 | 1.00 | 8 | 0.57 | 6 | 1.10 | 7 | 1.89 | 2 | NA | | 1.12 |
| Ellis | 1.42 | 4 | NA | | 0.77 | 4 | 1.23 | 5 | 1.46 | 3 | 1.26 | 3 | 1.09 |
| Lee | 1.69 | 2 | 1.15 | 6 | NA | | 2.14 | 1 | 1.06 | 6 | 1.12 | 5 | 1.05 |
| Udoh | 1.50 | 3 | 2.25 | 1 | 0.00 | 7 | 1.20 | 6 | 2.00 | 1 | 0.86 | 8 | 1.03 |
| Biedrins | 0.60 | 6 | 1.80 | 2 | 0.00 | 8 | 1.50 | 2 | 1.00 | 7 | 1.40 | 2 | 0.97 |
| Law | 0.00 | 8 | 1.50 | 5 | 0.80 | 3 | 0.90 | 8 | 0.86 | 8 | 1.56 | 1 | 0.97 |
| Williams | 0.73 | 5 | 1.08 | 7 | 0.75 | 5 | 1.25 | 4 | NA | | 1.22 | 4 | 0.93 |
| Curry | NA | | 0.87 | 9 | 1.21 | 2 | 0.78 | 9 | 1.15 | 5 | 0.92 | 7 | 0.89 |
| Lin | NA | | 1.50 | 4 | 2.00 | 1 | 1.29 | 3 | 0.67 | 9 | 1.00 | 6 | 0.87 |
| Radmanovic | 0.38 | 7 | 1.80 | 3 | 0.00 | 9 | NA | | 1.31 | 4 | 0.60 | 9 | 0.74 |

To be clear about how to read the table: Curry's PPS when potentially assisted by Wright was 1.69 (the upper left corner of the table). Curry was most efficient when receiving passes from Wright, so Wright is ranked first in the RANK column to the right of Curry's. Here you can also see that Wright was more efficient when potentially assisted by Ellis (1.26 PPS) compared to Lee (1.12) or Curry (0.92).

The careful reader may have noticed the "RATIO" column at the right side of the table. To explain this new term, I need to show you the pass (or shot) distribution:

| PASSER ↓ / SHOOTER → | Curry | Ellis | Lee | Radmanovic | Williams | Wright | SHOTS | XPPS | PPS | RATIO |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wright | 42 | 34 | 14 | 10 | 27 | 0 | 127 | 1.23 | 1.37 | 1.12 |
| Ellis | 59 | 0 | 26 | 22 | 41 | 123 | 271 | 1.19 | 1.30 | 1.09 |
| Lee | 35 | 34 | 0 | 7 | 16 | 93 | 185 | 1.21 | 1.26 | 1.05 |
| Udoh | 2 | 4 | 5 | 5 | 3 | 7 | 26 | 1.12 | 1.15 | 1.03 |
| Biedrins | 5 | 5 | 5 | 2 | 3 | 15 | 35 | 1.15 | 1.11 | 0.97 |
| Law | 1 | 6 | 5 | 10 | 7 | 9 | 38 | 1.14 | 1.11 | 0.97 |
| Williams | 11 | 13 | 8 | 12 | 0 | 27 | 71 | 1.15 | 1.07 | 0.93 |
| Curry | 0 | 38 | 48 | 18 | 34 | 88 | 226 | 1.11 | 0.98 | 0.89 |
| Lin | 0 | 2 | 1 | 7 | 9 | 9 | 28 | 1.19 | 1.04 | 0.87 |
| Radmanovic | 8 | 5 | 4 | 0 | 16 | 10 | 43 | 1.23 | 0.91 | 0.74 |

Curry was potentially assisted by Wright 42 times, Ellis by Wright 34 times, and so on. If we take the PPS from the first table, we can then calculate an "expected PPS", which I call XPPS. Here's an example calculation using Wright:

1.23 = (42 × 1.38 (Curry) + 34 × 1.15 (Ellis) + 14 × 0.83 (Lee) + 10 × 1.14 (Radmanovic) + 27 × 1.32 (Williams)) / 127 (total shots off Wright's passes)

The actual PPS on shots potentially assisted by Wright was 1.37. The ratio of (actual) PPS to XPPS therefore represents a measure of "normalized" passing efficiency that takes into account the particular distribution of passes by each player. In theory, this is a fairer way to compare players. For example, Wright obviously benefits from being able to pass to Curry (who is very efficient), but Curry can't pass it to himself (oh, one wishes). You can see this by looking at the XPPS: 1.23 for Wright versus 1.11 for Curry. Even given that Curry has a low XPPS, his RATIO of 0.89 shows that, for whatever reason, the actual PPS off of Curry passes was lower than might be expected.
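To make the XPPS and RATIO arithmetic concrete, here is a minimal Python sketch that redoes Wright's row from the totals in the tables above. XPPS is just a weighted average of each shooter's overall PPS, weighted by how many of the passer's potential assists went to that shooter. The function names are mine, and the sketch uses unrounded PPS values (points divided by shots), so intermediate digits can differ slightly from the hand calculation with rounded numbers.

```python
# Shooter totals from the shooting-efficiency table: name -> (shots, points)
SHOOTERS = {
    "Curry": (169, 234),
    "Williams": (163, 215),
    "Ellis": (152, 175),
    "Wright": (393, 450),
    "Radmanovic": (98, 112),
    "Lee": (118, 98),
}

# Wright's row of the pass distribution table: shooter -> potential assists
WRIGHT_PASSES = {"Curry": 42, "Ellis": 34, "Lee": 14, "Radmanovic": 10, "Williams": 27}
WRIGHT_POINTS = 174  # points scored off Wright's passes (passing table)


def shooter_pps(name):
    """Overall points per shot for one shooter."""
    shots, points = SHOOTERS[name]
    return points / shots


def expected_pps(passes):
    """Pass-weighted average of the recipients' overall PPS (XPPS)."""
    total = sum(passes.values())
    return sum(n * shooter_pps(name) for name, n in passes.items()) / total


def passing_ratio(passes, points):
    """Actual PPS off a passer's passes divided by the expected PPS."""
    actual = points / sum(passes.values())
    return actual / expected_pps(passes)


xpps = expected_pps(WRIGHT_PASSES)
ratio = passing_ratio(WRIGHT_PASSES, WRIGHT_POINTS)
print(f"Wright: XPPS {xpps:.2f}, RATIO {ratio:.2f}")
```

Swapping in any other row of the pass distribution table, together with that passer's points, gives the corresponding XPPS and RATIO.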

Here's where we get a little more sophisticated. We want to know if any of these data are statistically significant. In other words, are these numbers real or could they result from chance alone? First, I ran a linear regression with **Points** as the dependent variable and **Shooter** as a single predictor (in other words, ignoring the passer):

```
Call:
lm(formula = Points ~ as.factor(Shooter), data = subset(GSW2011))

Residuals:
   Min     1Q Median     3Q    Max
-1.385 -1.145 -1.143  1.681  1.857

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Curry)                        1.3846     0.1091  12.692  < 2e-16 ***
as.factor(Shooter)Ellis       -0.2333     0.1585  -1.472  0.14142
as.factor(Shooter)Lee         -0.5541     0.1701  -3.257  0.00116 **
as.factor(Shooter)Radmanovic  -0.2418     0.1801  -1.343  0.17968
as.factor(Shooter)Williams    -0.0656     0.1557  -0.421  0.67360
as.factor(Shooter)Wright      -0.2396     0.1305  -1.836  0.06656 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.418 on 1087 degrees of freedom
Multiple R-squared:  0.01146,  Adjusted R-squared:  0.006915
F-statistic: 2.521 on 5 and 1087 DF,  p-value: 0.028
```

Here, Curry is treated as the baseline (1.3846 PPS), with the other players being the comparisons (or contrasts). It turns out that only Lee was found to be statistically different from Curry (p<0.01). Each coefficient is that player's difference from Curry, so Lee's negative coefficient means his PPS was 1.3846 - 0.5541 = 0.83 (Curry's baseline plus Lee's coefficient). Note that 0.83 is the PPS value given in the first table. Wright's p-value (0.067) was just above the 0.05 level usually considered statistically significant, although some people would call it a "trend". As Warriors fans, we probably can all agree that Curry is a better spot-up shooter than Ellis, but technically speaking, these data don't "prove" that is the case. Maybe 1.5 or 2 years of data would provide a big enough sample size to make stronger claims.
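With a single categorical predictor, a least-squares fit simply recovers each group's mean, so the estimates above can be rebuilt from the shooting table alone: the intercept is Curry's PPS and every other coefficient is a difference of means. Here is a minimal Python sketch of that, not a re-run of the regression. It takes the residual standard error (1.418) straight from the R output as a given, and uses the standard formula for the standard error of a difference between two group means; the function names are my own.

```python
import math

# Shooter totals from the shooting-efficiency table: name -> (shots, points)
SHOOTERS = {
    "Curry": (169, 234),
    "Ellis": (152, 175),
    "Lee": (118, 98),
    "Radmanovic": (98, 112),
    "Williams": (163, 215),
    "Wright": (393, 450),
}

RESIDUAL_SE = 1.418  # copied from the R output, not recomputed here
BASELINE = "Curry"


def mean_pps(name):
    """Group mean of Points, i.e. points per shot."""
    shots, points = SHOOTERS[name]
    return points / shots


def contrasts(baseline=BASELINE):
    """Return {name: (estimate, std_error, t_value)} for a one-factor fit."""
    n0 = SHOOTERS[baseline][0]
    base_mean = mean_pps(baseline)
    base_se = RESIDUAL_SE / math.sqrt(n0)
    out = {baseline: (base_mean, base_se, base_mean / base_se)}
    for name, (n, _points) in SHOOTERS.items():
        if name == baseline:
            continue
        estimate = mean_pps(name) - base_mean             # difference from baseline
        std_error = RESIDUAL_SE * math.sqrt(1 / n0 + 1 / n)
        out[name] = (estimate, std_error, estimate / std_error)
    return out


for name, (est, se, t) in contrasts().items():
    print(f"{name:<11} estimate {est:8.4f}  std.error {se:.4f}  t {t:7.3f}")
```

The standard errors show why sample size matters so much here: each one shrinks only with the square root of the number of shots, so a gap as large as Curry versus Ellis still sits inside the noise.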

Let's look at passing now. I'm doing the same regression, except this time using **Passer** as the predictor:

```
Call:
lm(formula = Points ~ as.factor(Passer), data = subset(GSW2011))

Residuals:
    Min      1Q  Median      3Q     Max
-1.3721 -1.2649 -0.9956  1.7232  2.0930

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Curry)                      0.99558    0.09447  10.538   <2e-16 ***
as.factor(Passer)Ellis       0.28118    0.12794   2.198   0.0282 *
as.factor(Passer)Lee         0.26929    0.14081   1.912   0.0561 .
as.factor(Passer)Other       0.11752    0.14468   0.812   0.4168
as.factor(Passer)Radmanovic -0.08860    0.23629  -0.375   0.7078
as.factor(Passer)Williams    0.07485    0.19322   0.387   0.6986
as.factor(Passer)Wright      0.37652    0.15672   2.402   0.0165 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.42 on 1086 degrees of freedom
Multiple R-squared:  0.009539,  Adjusted R-squared:  0.004066
F-statistic: 1.743 on 6 and 1086 DF,  p-value: 0.1078
```

Again, Curry is the baseline comparison. Notice the PPS is much lower this time (0.996). I should note here that I've lumped all other players not listed into a group called "Other". Ellis and Wright are the two players here who were found to be statistically different from Curry, each with a positive coefficient (which should be added to Curry's). Lee comes very close to significance, so we'll call that a trend.

Ok, one more regression. Now, we're going to include both **Shooter** and **Passer** as factors in the analysis:

```
Call:
lm(formula = Points ~ as.factor(Passer) + as.factor(Shooter), data = subset(GSW2011))

Residuals:
   Min     1Q Median     3Q    Max
-1.487 -1.221 -1.027  1.688  2.168

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Curry)                       1.203563   0.156291   7.701 3.04e-14 ***
as.factor(Passer)Ellis        0.210800   0.132681   1.589   0.1124
as.factor(Passer)Lee          0.187243   0.145679   1.285   0.1990
as.factor(Passer)Other        0.069063   0.146181   0.472   0.6367
as.factor(Passer)Radmanovic  -0.194948   0.239388  -0.814   0.4156
as.factor(Passer)Williams     0.046394   0.195319   0.238   0.8123
as.factor(Passer)Wright       0.278321   0.164856   1.688   0.0916 .
as.factor(Shooter)Ellis      -0.169417   0.163773  -1.034   0.3012
as.factor(Shooter)Lee        -0.459595   0.177621  -2.588   0.0098 **
as.factor(Shooter)Radmanovic -0.175921   0.185026  -0.951   0.3419
as.factor(Shooter)Williams    0.004799   0.159190   0.030   0.9760
as.factor(Shooter)Wright     -0.176175   0.136564  -1.290   0.1973
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.418 on 1081 degrees of freedom
Multiple R-squared:  0.01773,  Adjusted R-squared:  0.007735
F-statistic: 1.774 on 11 and 1081 DF,  p-value: 0.05399
```

The only statistically significant result here is that Lee has a negative effect as a shooter. Wright's positive effect as a passer appears to be a trend. With a p-value of 0.11, Ellis comes relatively close to being labeled a trend. In both these cases, a larger sample size might yield statistically significant results.

## **Conclusions**

When we take into account the recipient of potential assists, the (marginal) effects of the passer do not appear to be statistically significant. Now, that doesn't mean all passers are equivalent, or even that all Warriors players are equivalent. After watching a lot of video over the past week, I got the distinct impression that Ellis creates a lot of open shots with his ability to drive, draw defenders, and kick out. Conversely, while I believe Curry does have that ability, he doesn't use it quite as often. Also, it seems to me that Curry's passing can be improved, in terms of tightening up the accuracy. In particular, I think this is an issue between Curry and Wright. Wright appears to be most efficient when receiving the ball directly in front of his body, as opposed to his left or right. I think the connection from Curry to Wright would be more efficient, if Curry could more consistently center his passes on Dorell. This is also an issue when Dorell shoots after the ball is swung to him on the perimeter, and he has to turn his body to catch it and then turn back towards the basket to shoot. Dorell is not nearly as quick in getting his shot off as Reggie or Curry, so I believe his efficiency is more sensitive to the pass quality. Originally, I was going to post some scouting videos showing these things, but I don't want to be accused of cherry picking. Also, I didn't quantify any of this, so I could be wrong. Maybe in the future I'll undertake a more careful and quantitative analysis in this regard. I would suggest, however, that next season (whenever it plays out), you look for yourself.

Of course, there's an even bigger issue here to discuss. The fact of the matter is that our PG is our best shooter, yet most of us want the ball in his hands more so that he can set up his teammates. Should these data make us reconsider whether the Curry/Ellis backcourt should actually be Ellis/Curry? Think about it. These data suggest that the main reason for Curry's low passing efficiency is simply that he's passing to teammates who are worse shooters than he is. Especially Lee. I think that's something that has to be looked at. To be sure, Lee has to be part of the offense. It would be nice if he could develop a three-point shot, but if it hasn't happened by now, that's probably just wishful thinking on my part. Of course, there are other types of plays that are potentially assisted. Maybe Curry is much more efficient at setting up those plays? That's certainly something I'd like to investigate further. Part of me thinks it might really be worth experimenting with Monta at PG full time, with Mark Jackson as a mentor. Of course, another solution is simply to surround Curry with better shooters (Reggie, maybe Klay). At any rate, I'm glad I undertook this little project. It brought up some interesting issues and raised some questions for further research.

Excellent work!

Thanks, Daniel. Oddly, your comment went to my spam box. Glad I checked.

It shows a couple of text lines in the R-code box. Looks really weird

Thanks, Jerry. I fixed it.

Hey Evan - Neat results, but I don't think a linear regression is completely appropriate here. A couple of ideas - you could record the four different outcomes (missed two, made two, missed three, made three) as your dependent variable instead of points scored and run a multinomial regression using scorer and passer as predictors. R has it in the mlogit library. It would tell you if the probability of those four things depends on who the passer and/or shooter were. Or you could run a GLM where your observations are makes out of total shots for each shooter-passer pair; you would probably also include if the shot was a 2 or 3 as a predictor. In R it would look something like glm(makes/shots ~ shooter+passer+shotvalue, weight=shots, family=binomial). That would tell you the likelihood of a shot being made depending on the shooter, passer, and if it's a 2 or 3 (via the logit link).

Thanks, Alex. I actually started out doing logistic regression (using glm in R), but the sample size for 2's is really small, except for Lee's. I tried various combinations of predictors, including Type, but in the end, I think what I have here was the most interesting (meaning there were other less interesting results). Also, what we - or at least, I - care about is the overall efficiency, not simply the 2-pt or 3-pt efficiency. The Lee/Curry 2-pt efficiency may be solid, but that appears to hurt the overall efficiency, at least, for spot-up attempts.

Maybe I'll take another look, though. I'll update the post if I find something.

Oh, before I forget, isn't what I've done essentially equivalent to two-way ANOVA?

It is, but ANOVA probably isn't suited for points scored if it can only be 0, 2, or 3; that's very coarse whereas ANOVA assumes a normal distribution. If you had enough data that you could use PPS as your DV you could probably run the linear regression/ANOVA on that. I'm mostly concerned because you shouldn't get a non-significant regression (your last one has a p>.05) if there are significant effects within (Lee as a shooter is fairly low). The errors on your coefficients are likely wrong.

The multinomial or glm regressions should account for the small cells for the most part; you'll just have bigger errors on the two point shots (or there are fancier things you can do). Then if you're interested in efficiency you could just combine the model estimates to get an EV. Figure out the probability for making a two pointer given a certain shooter and passer, multiply by two, then add to the probability of making a three given a certain shooter and passer times three.

Thanks, that's a good point about the distribution. Looks like I need to read up on multinomial regression and how to do it in R.