Category Archives: Statistics

Stephen Curry is a great shooter, but he ain't *that* great

Don't worry, I've got mad love for ya, Steph.

I know you're already foaming at the mouth after reading the title, but that's the only way I thought I could lure you in for what follows. Here's a more appropriate title, though probably not one you would have bothered clicking on:

A method for predicting FG% in basketball based on a small number of shots

So here's a list of the top 20 players sorted by FG% on long twos (I'm defining a "long two" as 18-23 ft). The data was acquired using the PlayIndex+ tool over at basketball-reference. Here is the exact query, if you want to see the data for yourself.

Top 20 Players by Maximum-Likelihood Estimate (MLE)

MLE is simply a statistics shorthand for "taking the straight mean" (actually, it means a lot more than that, but for our purposes, I think that's good enough).
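For a player with FGM makes on FGA attempts, the binomial MLE is simply FGM/FGA. Here is a minimal sketch. The make counts below are back-calculated from the table's percentages and attempts (for example, 43 of 63 for Curry), so treat them as reconstructions rather than official counts.

```python
def mle_fg_pct(made: int, attempts: int) -> float:
    """Binomial maximum-likelihood estimate of FG%: makes divided by attempts."""
    if attempts <= 0:
        raise ValueError("need at least one attempt")
    return made / attempts


# Makes back-calculated from the table (FG% x FGA, rounded to whole shots).
for player, made, fga in [("Stephen Curry", 43, 63), ("Steve Novak", 22, 37), ("Kevin Garnett", 119, 237)]:
    print(f"{player}: {100 * mle_fg_pct(made, fga):.1f}% on {fga} FGA")
```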

PLAYER TEAM MLE FGA
Stephen Curry GSW 68.30% 63
Steve Novak NYK 59.50% 37
Kurt Thomas POR 53.10% 64
Quincy Pondexter MEM 52.90% 34
Joakim Noah CHI 52.00% 25
Daequan Cook OKC 51.70% 29
Boris Diaw TOT 51.20% 41
Brandon Rush GSW 50.90% 57
Jonas Jerebko DET 50.80% 61
Kevin Garnett BOS 50.20% 237
Kris Humphries NJN 50.00% 36
Yi Jianlian DAL 50.00% 32
Chris Paul LAC 49.60% 127
Michael Beasley MIN 49.30% 69
Tim Duncan SAS 49.10% 114
Kevin Durant OKC 48.80% 121
Brandon Bass BOS 48.70% 119
Anthony Parker CLE 48.60% 70
Dirk Nowitzki DAL 48.60% 175
Charles Jenkins GSW 48.00% 127

The problem here is that sample sizes vary a lot between players, and some are really small. Is Yi Jianlian really a 50% shooter from 18-23 ft? (If he is, somebody should probably sign him already.) What about Kris Humphries? Maybe not. Conversely, Dirk shot 48.6% on 175 FGA from this range, which is a much larger sample and about where we might expect him to be. Overall, the pool of 248 players with at least 25 FGA from 18-23 ft had a mean FG% of roughly 38%.
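To put a rough number on "really small," this sketch asks how noisy an observed FG% is if a player's true rate were the pool mean of about 38%. It assumes, as a simplification, that every shot is an independent trial at one fixed rate. The make counts (16 of 32 for Yi, 85 of 175 for Dirk) are back-calculated from the table.

```python
import math


def binomial_se(p: float, n: int) -> float:
    """Standard error of an observed FG% over n attempts when the true rate is p."""
    return math.sqrt(p * (1 - p) / n)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Exact binomial tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


league_avg = 0.38  # pool mean quoted in the post
# 16 of 32 matches Yi Jianlian's 50.0%; 85 of 175 matches Dirk's 48.6%.
for made, fga in [(16, 32), (85, 175)]:
    se = binomial_se(league_avg, fga)
    tail = prob_at_least(made, fga, league_avg)
    print(f"{fga} FGA: SE = {100 * se:.1f} pts, P(>= {made} makes | true 38%) = {tail:.3f}")
```

With only 32 attempts, the standard error is several percentage points, so an average shooter posting 50% by luck is not far-fetched. With 175 attempts the same result would be much harder to get by chance.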

What can help us "shrink" these estimates toward the mean is accounting for the variation between players in both FG% and FGA (i.e. sample size). One way to do this is to use a multi-level model. Another is a fully Bayesian approach, which Gelman says is roughly equivalent when there are a large number of "groups," as there are here. If you're interested in this type of model, I highly recommend Gelman and Hill's "ARM" book (Data Analysis Using Regression and Multilevel/Hierarchical Models).
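To show what "shrinking" means in plain arithmetic, here is a sketch of a simpler, swapped-in technique: a beta-binomial posterior mean, i.e. empirical-Bayes-style shrinkage. It is not the multi-level model used in this post. Each player's makes are blended with a fixed number of pseudo-attempts at the pool mean of roughly 38%. The prior weight of 100 pseudo-attempts is a hypothetical choice, so these numbers will not match the multi-level estimates in this post.

```python
def shrunk_fg_pct(made: int, attempts: int, prior_mean: float = 0.38, prior_shots: float = 100.0) -> float:
    """Beta-binomial posterior mean: observed makes plus prior_shots pseudo-attempts at prior_mean.

    prior_shots is a hypothetical prior strength; a multi-level model instead
    estimates how much to shrink from the between-player variance in the data.
    """
    alpha = prior_mean * prior_shots        # pseudo-makes
    beta = (1 - prior_mean) * prior_shots   # pseudo-misses
    return (made + alpha) / (attempts + alpha + beta)


for player, made, fga in [("Stephen Curry", 43, 63), ("Kevin Garnett", 119, 237), ("Yi Jianlian", 16, 32)]:
    raw = made / fga
    print(f"{player}: raw {100 * raw:.1f}% -> shrunk {100 * shrunk_fg_pct(made, fga):.1f}%")
```

The estimate is a weighted average of the raw FG% and the prior mean, with weight FGA / (FGA + prior_shots) on the raw number. Players with fewer attempts therefore get pulled further toward 38%.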

In R, creating the model basically takes one step:

library(lme4)
twos.mlm <- glmer(cbind(FGM, FGA - FGM) ~ (1 | PLAYER), family = binomial(), data = long_twos)  # one random intercept per player, logit scale

From that, I get a list of per-player coefficients (called random effects). Combined with the model's intercept and mapped back from the logit scale, they give our new (hopefully) more predictive FG%s.
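In a binomial-logit random-intercept model like this one, a player's estimated FG% is the inverse logit of the fixed intercept plus that player's random effect. In lme4 these come from fixef(twos.mlm) and ranef(twos.mlm)$PLAYER. Here is a minimal Python sketch of the conversion. The intercept and effect values are hypothetical and are not taken from the fitted model.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))


def player_fg_pct(intercept: float, random_effect: float) -> float:
    """FG% implied by a logit-scale intercept plus one player's random effect."""
    return inv_logit(intercept + random_effect)


intercept = logit(0.38)            # hypothetical: an intercept matching the ~38% pool mean
for effect in (-0.25, 0.0, 0.35):  # hypothetical player effects on the logit scale
    print(f"effect {effect:+.2f} -> {100 * player_fg_pct(intercept, effect):.1f}%")
```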

Before I show the new list of players and their estimates, take a look at how the spread of the histogram of FG%'s shrinks when going from the MLE to the multi-level model:


Now here's the top 50 according to their multi-level estimates:

Top 50 18-23 ft FG%

The column MULTI is the multi-level estimate.

PLAYER TEAM MLE MULTI FGA
Stephen Curry GSW 68.30% 46.9% 63
Kevin Garnett BOS 50.20% 45.5% 237
Dirk Nowitzki DAL 48.60% 43.7% 175
Chris Paul LAC 49.60% 43.3% 127
Kevin Durant OKC 48.80% 42.8% 121
Tim Duncan SAS 49.10% 42.8% 114
Brandon Bass BOS 48.70% 42.8% 119
Jose Calderon TOR 47.30% 42.6% 148
Charles Jenkins GSW 48.00% 42.6% 127
Kurt Thomas POR 53.10% 42.6% 64
Pau Gasol LAL 47.30% 42.3% 129
Steve Novak NYK 59.50% 42.3% 37
LaMarcus Aldridge POR 45.20% 42.2% 208
Drew Gooden MIL 45.60% 41.9% 158
Sebastian Telfair PHO 47.50% 41.9% 101
Jonas Jerebko DET 50.80% 41.8% 61
Anthony Morrow NJN 46.80% 41.7% 109
Steve Nash PHO 47.40% 41.7% 95
Michael Beasley MIN 49.30% 41.6% 69
Brandon Rush GSW 50.90% 41.6% 57
Jamal Crawford POR 45.10% 41.5% 144
Anthony Parker CLE 48.60% 41.4% 70
Ben Gordon DET 44.30% 41.3% 158
Nick Young TOT 44.20% 41.3% 163
Darren Collison IND 46.20% 41.1% 91
Chris Bosh MIA 44.10% 41.1% 152
Arron Afflalo DEN 47.20% 41.1% 72
Klay Thompson GSW 43.70% 41.0% 158
Boris Diaw TOT 51.20% 40.9% 41
David West IND 45.60% 40.9% 90
Quincy Pondexter MEM 52.90% 40.9% 34
Russell Westbrook OKC 43.50% 40.8% 147
D.J. White CHA 45.60% 40.7% 79
Marreese Speights MEM 44.70% 40.7% 94
Carlos Boozer CHI 45.70% 40.6% 70
David Lee GSW 44.40% 40.5% 90
Jarrett Jack NOH 44.40% 40.5% 90
Kris Humphries NJN 50.00% 40.4% 36
Steve Blake LAL 47.10% 40.4% 51
Daequan Cook OKC 51.70% 40.4% 29
Jared Dudley PHO 43.10% 40.2% 109
Yi Jianlian DAL 50.00% 40.2% 32
Jason Smith NOH 43.00% 40.2% 107
DeMarcus Cousins SAC 42.20% 40.2% 147
Joakim Noah CHI 52.00% 40.1% 25
Spencer Hawes PHI 46.00% 40.1% 50
Grant Hill PHO 43.60% 40.0% 78
Jason Terry DAL 42.70% 39.9% 96
Nate Robinson GSW 43.70% 39.9% 71
Ramon Sessions TOT 43.90% 39.9% 66

Now it's starting to make more sense. Here's the bottom 50:

Bottom 50

PLAYER TEAM MLE MULTI FGA
Glen Davis ORL 24.50% 33.3% 94
John Wall WAS 29.50% 33.7% 183
Corey Maggette CHA 26.50% 33.9% 98
Dorell Wright GSW 19.60% 34.2% 46
Andray Blatche WAS 24.20% 34.3% 66
Paul George IND 22.80% 34.3% 57
Markieff Morris PHO 22.20% 34.3% 54
Ivan Johnson ATL 20.80% 34.4% 48
Daniel Gibson CLE 21.30% 34.5% 47
Antawn Jamison CLE 31.30% 34.8% 166
DeMar DeRozan TOR 32.20% 34.9% 205
Carlos Delfino MIL 24.00% 35.0% 50
Paul Pierce BOS 28.90% 35.0% 90
Josh Howard UTA 28.80% 35.2% 80
John Lucas CHI 28.60% 35.2% 77
Byron Mullens CHA 32.80% 35.3% 204
Austin Daye DET 25.00% 35.3% 48
Leandro Barbosa TOT 30.80% 35.4% 104
Tracy McGrady ATL 29.00% 35.5% 69
C.J. Watson CHI 30.00% 35.6% 80
Metta World Peace LAL 22.90% 35.6% 35
Chauncey Billups LAC 20.70% 35.7% 29
Jeremy Pargo MEM 20.70% 35.7% 29
Marcus Camby TOT 25.60% 35.7% 43
Danilo Gallinari DEN 29.40% 35.7% 68
C.J. Miles UTA 29.20% 35.7% 65
Lamar Odom DAL 23.50% 35.8% 34
James Johnson TOR 30.90% 35.8% 81
Luc Mbah a Moute MIL 20.00% 35.9% 25
Andrew Goudelock LAL 24.20% 35.9% 33
Andre Iguodala PHI 32.80% 35.9% 122
J.J. Hickson TOT 28.30% 36.1% 46
Brandon Knight DET 32.30% 36.1% 93
Wesley Johnson MIN 32.30% 36.1% 93
Jodie Meeks PHI 25.00% 36.1% 32
Norris Cole MIA 32.20% 36.1% 90
Derrick Brown CHA 29.60% 36.1% 54
Monta Ellis TOT 34.40% 36.2% 195
Dominic McGuire GSW 27.50% 36.2% 40
Tyreke Evans SAC 33.30% 36.2% 120
Tyler Hansbrough IND 32.10% 36.3% 78
Travis Outlaw SAC 28.20% 36.4% 39
Courtney Lee HOU 32.50% 36.4% 83
Earl Clark ORL 27.80% 36.4% 36
Zach Randolph MEM 24.00% 36.4% 25
Ray Allen BOS 31.70% 36.5% 63
Danny Granger IND 33.70% 36.6% 95
Jordan Farmar NJN 30.20% 36.6% 43
Josh Smith ATL 35.80% 36.6% 316
Marvin Williams ATL 33.70% 36.6% 92

Conclusions

Now, do we have any evidence that Stephen Curry is closer to a 46.9% shooter from 18-23 ft than the 68.3% shooter suggested by the 63 shots he took last year? Sure we do! Just go back to 2010-11, when he shot 49.1% on 214 FGA (still great!). Or his rookie season, when he shot 47% on 232 FGA (also great!). Now that 46.9% makes a lot more sense, right? (By the way, the title of this post should make more sense right about now, too.) Stephen Curry is a great shooter, he just ain't *that* great.

Just as Stephen Curry probably isn't a near-70% shooter on long 2's, Glen Davis is probably better than a 25% shooter on those shots. Indeed, in 2011, Davis shot 35.8% on 226 FGA. And he shot 38.9% in 2010, but on only 36 FGA. You know, it's important to point out that just because a player takes a small number of attempts doesn't necessarily mean he'll be at the top or bottom of lists like this. Sometimes the player will "randomly" fall in the middle, too.

So, hopefully, this made some sense to you. Next time you see an analyst talking about how a player led the league with an astronomically high FG% on 12 shots of a certain type (say, in the 4th quarter of road games on Sundays), think about this post. Heck, maybe e-mail the guy a link to it.

Ranking the All-Time Great Scoring Seasons in the 3-PT Era by Their Distance from Greatness

In my last post, I introduced the idea of a "convex hull" for the usage vs. efficiency relationship. If we agree that the upper-right edge of that relationship represents "greatness" (in terms of scoring), then it is a simple matter to quantify the relative greatness of all other player-seasons by measuring the distance from each point to that edge (see the plot below).

The distance between a data point and the edge of the USG/EFF relationship is a measurement of relative offensive value.
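Here is a minimal sketch of the distance idea, under several assumptions that are not from the post. Usage is on the x-axis and efficiency on the y-axis. Both axes are already scaled to comparable units; the distances depend on that scaling. The "upper-right edge" is taken to be the part of the upper convex hull that runs from the most efficient season to the highest-usage season. The hull is built with Andrew's monotone chain, and the points are hypothetical and unitless.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]  # (usage, efficiency), assumed already on comparable scales


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive means o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_right_frontier(points: List[Point]) -> List[Point]:
    """Upper hull (Andrew's monotone chain), trimmed to start at the most efficient point."""
    pts = sorted(set(points))
    hull: List[Point] = []
    for p in reversed(pts):  # sweeping right to left builds the upper hull
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    hull.reverse()  # now ordered left to right
    top = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[top:]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_from_greatness(p: Point, frontier: List[Point]) -> float:
    """Shortest distance from a player-season to any edge of the frontier."""
    if len(frontier) == 1:
        return hypot(p[0] - frontier[0][0], p[1] - frontier[0][1])
    return min(point_segment_distance(p, a, b) for a, b in zip(frontier, frontier[1:]))


seasons = [(0, 0), (1, 3), (2, 2), (3, 0), (1, 1)]  # hypothetical, unitless
frontier = upper_right_frontier(seasons)
for s in seasons:
    print(s, round(distance_from_greatness(s, frontier), 3))
```

Seasons on the edge score 0, and everything else scores by how far it sits inside the hull.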


MAINS: Marginal Adjusted Inside Scoring (Or Why I Feel Pretty Good About Andrew "If He's Healthy" Bogut)

You may already be familiar with my marginal scoring metric, PSAMS (if not, see here). The basic idea with that metric is that I try to take into account the volume and type of shots (inside, mid-range, 3-pt, and free throws) for each player and calculate an "adjusted" scoring metric. For example, I give more credit to players who generate a high volume of inside shots and debit players who don't. I also account for the fact that some players take more than their fair share of mid-range shots (which tend to be lower efficiency), while others take fewer, placing the burden of taking those bad shots on their teammates. And so on...

Visualizing Team Units Using Hierarchical Clustering

UPDATE: Figured out how to do this using the corrplot package, which adds a lot more options. Check it out for GSW:


This post owes its genesis to Alex Konkel, who blogs at Sports Skeptic. He asked if I could calculate the variance inflation factor (VIF) for the adjusted +/- regressions I've been doing. That would let us examine the collinearity between variables (i.e. players). We're actually still working out some kinks in that analysis, but in the meantime, it gave me an idea: why not just calculate the correlation matrix for all players?
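Here is a small sketch of both quantities. Each column is a player's on-court indicator (1 if he was on the floor for that stint). The player-by-player correlation matrix is the thing you would then cluster or plot. Each player's VIF is the corresponding diagonal entry of the inverse of that correlation matrix. The stint data are synthetic and the players hypothetical. The clustering/plotting step itself is not shown.

```python
import numpy as np


def player_correlations(on_court: np.ndarray) -> np.ndarray:
    """Pairwise correlation between players' on-court indicator columns (rows = stints)."""
    return np.corrcoef(on_court, rowvar=False)


def vif(on_court: np.ndarray) -> np.ndarray:
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(player_correlations(on_court)))


# Synthetic demo: 1000 stints, 4 hypothetical players. Player 3 is on the floor
# almost exactly when player 0 is (2% of stints flipped), so the two are collinear.
rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(1000, 3)).astype(float)
flip = rng.random(1000) < 0.02
partner = np.where(flip, 1 - X[:, 0], X[:, 0])
X = np.column_stack([X, partner])
print(np.round(player_correlations(X), 2))
print(np.round(vif(X), 1))
```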

Should Warriors Fans Second Guess the 2010 NBA Draft?

It's always a hot topic in Warriors land to debate whether we should have drafted Ekpe Udoh over Greg Monroe. Monroe and Udoh couldn't be more different in terms of box score stats (Monroe gets them, Udoh doesn't) — as Kevin Pelton pointed out just today (and you, dear reader of my blog, know I've been on the case for a long time already).

Box score stats are nice, and sometimes they line up with team-level results, but the latter are what we really should care about most. How does a player impact team-level results? To answer that question, advanced stat guys like myself look at team-level metrics, such as adjusted +/- (APM/RAPM) and my new A4PM (adjusted four factor +/-). With that in mind, today I wanted to take a look at how Udoh stacks up against some of the alternative draft choices the Warriors might have made. I'm focusing solely on players who were talked about leading up to the draft as possibilities for GSW at #6 and who have had enough playing time by now to give us some confidence in their +/- data. These two criteria narrowed the list to 6 players: Greg Monroe (#7), Al-Farouq Aminu (#8), Paul George (#10), Cole Aldrich (#11), Ed Davis (#13), and Patrick Patterson (#14). I also threw in Gordon Hayward (#9), since he was in the middle of that pack (though he was never rumored to go to GSW, as far as I can recall).

Adjusted 3-Factor Point Guard Index

Nate Parham (@NateP_SBN) over at Golden State of Mind recently replied to my post on adjusted turnovers:

Curious: could you use these numbers along with Hollinger’s pure point rating to make an adjust pure point rating?

The reason I like PPR is that it effectively accomplishes what people miss when people talk about a point guard’s turnovers: how well he balances the harm of creating turnovers for the team with the benefit of creating a scoring opportunity for others.

Hollinger's PPR formula just uses assist and turnover rates (not adjusted for anything except pace).

New Player Metric: 2.5-Year Adjusted Four Factor +/- (A4PM)

Blake, meet Ekpe.

There are four factors of an offense or defense that define its efficiency: shooting percentage, turnover rate, offensive rebounding percentage, and getting to the foul line. Striving to control those factors leads to a more successful team. (Dean Oliver, “Basketball on Paper”)

A while back I did some work regressing point differential on the four factors (FF or 4F) at the *team* level (Part I and Part II). The result was the following equation:

 p.d. = 10.41 + 1.49 eFG(own) - 1.63 eFG(opp) + 0.187 FTR(own) - 0.213 FTR(opp) - 1.51 TOR(own) + 1.37 TOR(opp) + 0.327 ORR(own) - 0.365 ORR(opp)

where,

  • effective FG% (eFG): eFG = (FG + 0.5 * 3PT) / FGA
  • foul rate (FTR): FTR = FTA / FGA
  • turnover rate (TOR): TOR = TOV / (FGA + 0.44 * FTA + TOV)
  • offensive rebounding rate (ORR): ORR = ORB / (ORB + Opp DRB)
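Here is a sketch that computes the four factors from box-score totals and plugs them into the team-level equation. One assumption is not stated in the post: each factor is expressed as a percentage (50.0 rather than 0.500). That is the scale on which an intercept of 10.41 roughly cancels the other terms for an average team. The box-score lines are hypothetical.

```python
def efg(fg: float, fg3: float, fga: float) -> float:
    """Effective FG%: (FG + 0.5 * 3PT) / FGA, as a percentage."""
    return 100 * (fg + 0.5 * fg3) / fga


def ftr(fta: float, fga: float) -> float:
    """Foul rate: FTA / FGA, as a percentage."""
    return 100 * fta / fga


def tor(tov: float, fga: float, fta: float) -> float:
    """Turnover rate: TOV / (FGA + 0.44 * FTA + TOV), as a percentage."""
    return 100 * tov / (fga + 0.44 * fta + tov)


def orr(orb: float, opp_drb: float) -> float:
    """Offensive rebounding rate: ORB / (ORB + Opp DRB), as a percentage."""
    return 100 * orb / (orb + opp_drb)


def point_diff(own: dict, opp: dict) -> float:
    """Team-level regression equation; own/opp map factor names to percentages (assumed units)."""
    return (10.41
            + 1.49 * own["eFG"] - 1.63 * opp["eFG"]
            + 0.187 * own["FTR"] - 0.213 * opp["FTR"]
            - 1.51 * own["TOR"] + 1.37 * opp["TOR"]
            + 0.327 * own["ORR"] - 0.365 * opp["ORR"])


# Hypothetical box-score lines. The opponent's ORR uses its ORB and our DRB.
own = {"eFG": efg(40, 10, 85), "FTR": ftr(25, 85), "TOR": tor(14, 85, 25), "ORR": orr(10, 32)}
opp = {"eFG": efg(38, 7, 84), "FTR": ftr(22, 84), "TOR": tor(15, 84, 22), "ORR": orr(11, 33)}
print({k: round(v, 1) for k, v in own.items()})
print(f"predicted point differential: {point_diff(own, opp):+.1f}")
```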

Since I wrote that post, I've thought it would be useful to translate the team-level relationship between the four factors and point differential (and winning) down to the player level. The way to do this (or, at least, one way) is to calculate adjusted (i.e. APM-style) versions of each of the four factors, and then regress player-level APM or RAPM on those adjusted factors. (It should be noted that Joe Sill calculated adjusted FF a few years ago, but those data and articles have been taken down since he started working in the NBA.) Here, I'm using data from 2009-2010 through last Thursday's games to calculate my own version of RAPM and adjusted four factors for each player.
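Here is a sketch of the second step only: an ordinary least-squares regression of player RAPM on adjusted four factors. It assumes each player has separate offensive and defensive adjusted values for each factor (eight columns). The data are synthetic, generated from made-up "true" weights, so the demo just checks that the regression recovers them. None of the numbers are real results.

```python
import numpy as np

FACTORS = ["eFG_off", "FTR_off", "TOR_off", "ORR_off", "eFG_def", "FTR_def", "TOR_def", "ORR_def"]


def fit_rapm_on_factors(adj_factors: np.ndarray, rapm: np.ndarray) -> tuple:
    """OLS of player RAPM on adjusted four factors; returns (intercept, coefficients)."""
    X = np.column_stack([np.ones(len(rapm)), adj_factors])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, rapm, rcond=None)
    return beta[0], beta[1:]


# Synthetic demo: 300 hypothetical players with made-up "true" weights.
rng = np.random.default_rng(0)
true_w = np.array([1.2, 0.2, -1.0, 0.3, -1.3, -0.2, 1.0, -0.3])
adj = rng.normal(size=(300, len(FACTORS)))
rapm = 0.5 + adj @ true_w + rng.normal(scale=0.1, size=300)
b0, w = fit_rapm_on_factors(adj, rapm)
for name, coef in zip(FACTORS, w):
    print(f"{name}: {coef:+.2f}")
```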

2 1/2 Year Ridge Regressed Adjusted Assists

For my version of adjusted assists (not sure if anyone has done it before), I distinguish between assists to 2-pt and 3-pt field goals. To do this, I simply multiply the AST3 totals for a stint by 1.5. So, for example, if there is a 10-possession stint which has two assisted 2-pt field goals and one assisted 3-pt field goal, that would be 3.5 "equivalent" assists. The logic here is obviously that 3-pt buckets are worth more than 2-pt ones.
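The weighting is simple enough to check by hand, and here it is as code. The per-100-possessions scaling is a hypothetical normalization added for illustration; the post does not say how stint totals are scaled before the regression.

```python
def equivalent_assists(ast2: int, ast3: int, weight3: float = 1.5) -> float:
    """Assists to 2-pt FGs count 1; assists to 3-pt FGs count weight3 (1.5 in the post)."""
    return ast2 + weight3 * ast3


def per_100(value: float, possessions: int) -> float:
    """Scale a stint total to a per-100-possessions rate (a hypothetical normalization)."""
    return 100 * value / possessions


stint = {"possessions": 10, "ast2": 2, "ast3": 1}  # the example stint from the text
eq = equivalent_assists(stint["ast2"], stint["ast3"])
print(eq, per_100(eq, stint["possessions"]))
```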

When looking at these numbers, you may be surprised at first not to see the top of the list jam-packed with point guards (although Kidd, Nash, and DWill are all in the top 10). This is because *any* player on the floor who is a great spot-up shooter (especially a 3-pt shooter) is naturally going to raise the number of assists dished out. Also, this metric (in theory, anyway) should be able to pick up on players who generate a lot of the mythical "hockey" assists (the pass before the pass before the shot). By similar reasoning, point guards who have very high USG and look for their own shot (e.g. Rose and Westbrook) are going to appear lower in the rankings. Remember, adjusted assists tell us how many assists were dished out at the *team level* while a player was on the floor. I have an idea in mind that might be able to adjust adjusted assists to account for these issues, so stay tuned for a future update, if I can work out the details.

3-YR Adjusted Inside Shooting Efficiency (PPS) - Or YAFWMTL (Yet Another Former Warrior Makes The List)

In a previous post, I looked at how players could affect the team-level rate of inside shots and free throw attempts (you might remember former Warrior Jeremy Lin was high up on that list — whatever happened to that kid, anyway?). Here, I'll take a look at adjusted inside efficiency, using points per shot (PPS) as the metric. Since free throw efficiency is (presumably) not affected by the defense, there's no point in including it.

Briefly, for each stint during the last 3 seasons (2009-2011), I tracked the number of inside FGA and the total points scored on those shots. (Recall that a stint is a series of possessions with the same offensive and defensive units on the floor.) PPS is simply the number of points divided by the number of shots, which I then multiply by 100 to make it easier to do integer math (for geeky reasons, this makes things run faster in R). I then use the glmnet package to run ridge regression (alpha=0) with its default 10-fold cross-validation.
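The post does the fit with glmnet in R. As a rough illustration of the same kind of model, here is a closed-form weighted ridge regression in NumPy, not a reimplementation of glmnet. Several choices are assumptions for illustration:

- each stint row is coded +1 for offensive players and -1 for defensive players;
- each stint is weighted by its inside FGA;
- the penalty is fixed at 10.

glmnet standardizes predictors and picks its penalty by cross-validation, so its numbers would differ. The stints are synthetic.

```python
import numpy as np


def design_row(offense, defense, n_players):
    """One stint: +1 for each offensive player, -1 for each defensive player (assumed coding)."""
    row = np.zeros(n_players)
    row[list(offense)] = 1.0
    row[list(defense)] = -1.0
    return row


def weighted_ridge(X, y, lam, w=None):
    """Minimize sum_i w_i (y_i - b0 - x_i . b)^2 + lam * ||b||^2, leaving b0 unpenalized."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    w = np.ones(len(y)) if w is None else np.asarray(w, float)
    x_mean = w @ X / w.sum()          # weighted column means
    y_mean = w @ y / w.sum()
    Xc, yc = X - x_mean, y - y_mean   # centering keeps the intercept out of the penalty
    A = Xc.T @ (Xc * w[:, None]) + lam * np.eye(X.shape[1])
    b = np.linalg.solve(A, Xc.T @ (w * yc))
    return y_mean - x_mean @ b, b


# Synthetic demo: 6 hypothetical players, 2-on-2 stints, PPS x 100 as the response.
rng = np.random.default_rng(1)
n_players, true_b = 6, np.array([8.0, -4.0, 2.0, 0.0, -6.0, 0.0])
rows, fga = [], []
for _ in range(400):
    players = rng.permutation(n_players)
    rows.append(design_row(players[:2], players[2:4], n_players))
    fga.append(rng.integers(1, 8))  # inside FGA in the stint, used as the weight
X = np.array(rows)
y = 100 + X @ true_b + rng.normal(scale=2.0, size=len(rows))
b0, b = weighted_ridge(X, y, lam=10.0, w=np.array(fga))
print(np.round(b, 1))
```

With this coding, every row sums to zero, so ratings are only identified up to a common shift. The ridge penalty resolves that by centering the estimates at zero, which is why the synthetic true ratings above were chosen to sum to zero.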