NBA Combine Measurement Similarities

I'm sick, so I data.

With the annual NBA Draft Combine having completed the anthropometric and athletic testing portion, it's a good time to update the similarity study I did a few years ago here. To summarize, I take all the testing categories available from DraftExpress (from 2009 through 2014) and use a couple of R packages (ape and cluster) to spit out the similarities between players. The result is a circular dendrogram. The closer two players are on the dendrogram, the more similar they are in terms of the combine results.

NBA Draft Combine similarities 2009-2014.
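For anyone curious what "similarity" means mechanically, here is a rough sketch in JavaScript. It is not the R code behind the dendrogram (that used ape and cluster). It assumes a simple approach: z-score each measurement, take the Euclidean distance between players, and report each player's nearest neighbor. The players, fields, and numbers are made up for illustration.

```javascript
// Hypothetical measurements (made-up numbers, not DraftExpress data).
const players = [
  { name: "Player A", height: 76.5, wingspan: 80.0, vertical: 35.0 },
  { name: "Player B", height: 76.0, wingspan: 79.5, vertical: 34.5 },
  { name: "Player C", height: 82.0, wingspan: 88.0, vertical: 29.0 },
  { name: "Player D", height: 81.5, wingspan: 87.0, vertical: 30.0 },
];
const fields = ["height", "wingspan", "vertical"];

// Z-score each column so that no single unit dominates the distance.
function standardize(rows, keys) {
  const stats = keys.map((k) => {
    const vals = rows.map((r) => r[k]);
    const mean = vals.reduce((a, b) => a + b, 0) / vals.length;
    const variance = vals.reduce((a, b) => a + (b - mean) ** 2, 0) / vals.length;
    return { mean, sd: Math.sqrt(variance) || 1 }; // guard against zero spread
  });
  return rows.map((r) => keys.map((k, i) => (r[k] - stats[i].mean) / stats[i].sd));
}

// Straight-line distance between two standardized measurement vectors.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

// For each player, find the other player at the smallest distance.
function closestComps(rows, keys) {
  const z = standardize(rows, keys);
  return rows.map((r, i) => {
    let best = -1;
    let bestDist = Infinity;
    z.forEach((other, j) => {
      if (j === i) return;
      const d = euclidean(z[i], other);
      if (d < bestDist) {
        bestDist = d;
        best = j;
      }
    });
    return { player: r.name, comp: rows[best].name };
  });
}

const comps = closestComps(players, fields);
console.log(comps);
```

A dendrogram goes one step further by merging the closest pairs into clusters repeatedly, but the nearest-neighbor pairs are what show up as the "closest comps".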

A few examples of closest comps for fun:

  • Gary Harris and Austin Rivers
  • Thanasis Ante... and Wesley Johnson
  • James Young and Xavier Henry
  • Aaron Craft and Jimmer Fredette
  • Jahii Carson and Peyton Siva
  • Jordan McRae and Jeremy Lamb
  • Noah Vonleh and Derrick Favors

See if you can find some others. It's not perfect, of course. But it's fun. Should entertain you for at least several minutes. Enjoy! Pass it around on the interwebz if you like.



A History of Hating Harrison Barnes

I think Twitter is amazing. It is also somewhat, perhaps mostly, responsible for the diminution in frequency of my long-form blog posts here and at GSoM over the last couple years (also I got really freaking busy with the nbawowy stuff). It's just so easy on twitter to communicate your thoughts in real-time, that I often feel like I've already said everything I want to say, and it obviates the need for more than 140 characters at a time that the old-fashioned blog platform originally provided.

If you are reading this, there is a good chance you follow me on twitter, and if you follow me on twitter you probably have heard me make a disparaging remark or two about the play of a certain Golden State Warrior who arrived by way of North Carolina and Iowa. I'm referring, of course, to Harrison Bryce Jordan Barnes. They say don't hate the player, hate the game. Well, I've tried my best to hate the game, but I am continually accused of hating the player regardless.

As a fun exercise for myself, and to stir the passions of Barnes fanboys everywhere, I wanted to go through my history of tweeting about Barnes (I now have over 706 tweets with "Barnes" as a search term, although some of those could be about Matt Barnes!) to see how my "hate" for this player came to be. Think of this post as the origin myth for the most rampant and prolific Barnes "hater" on all of twitter (if you know of anyone who "hates" Barnes more than myself, let me know in the comments or on twitter!). So without any further ado...let's a do this.

I didn't think Barnes would be there at 7 leading up to the draft.


And cue the draft, Barnes falls to 7. I'm apparently fine with it.


Although in my heart and head I wanted us to take John Henson.

(since 2011)


(and Nicholson!)


(and I knew we would never have the balls to take him)


(oh, the wildcard!)


(one last Henson regret for good measure)


So Barnes, Ezeli, & Draymond it is. How do I feel about it at the time?


Uh, that's kind of spooky how accurate that fake quote turned out to be! (I'm apparently pretty good at fake interviewing people.)

I noted the hand measurements being small at the time of the draft. Anthony Davis doesn't seem to have been bothered by it (perhaps, because he was a point guard growing up), but I often think (and still do) it's a real issue for Barnes and at the core of his ball handling troubles on the perimeter:


Still, I was optimistic.


Oh, gosh. Really optimistic!


Starting to come down to reality.


Apparently I thought the bar needed to be lowered.


Foreshadowing here?


Hmm...jury still out on this one, perhaps?


This is still an insult apparently (but also still appropriate).


I think I shifted the proximity of my position on this one quite a bit in the interim.


This debate was a thing at the time.


It's really funny going back to that article to see what I had written as the "Case for Barnes":

The case I would make for Barnes actually has less to do with Barnes strengths than it does thinking about what will work best for the team. As stated above, one of my concerns with Barnes coming off the bench is that he'll feel that he has a responsibility to be "the scorer". That is the last thing I want in terms of his development as a player. Conversely, I feel that Barnes would have to learn how to play the "right way" as a member of the starting unit, because he would be surrounded by several players that are clearly a step or two or three above him right now in terms of offensive production. Of course, one could turn this right around and argue, well, if Barnes isn't in the starting unit because of his offense, and it isn't because of his defense, then maybe he shouldn't be starting, eh? And I can't really disagree with that argument. (I'm a terrible self-debater.)

Clearly, I am now of the same opinion as the second guy in that quote.

Back to the tweets! Here I start to notice Draymond.


That trend would continue and intensify.



Then I started to question the kool-aid.



I was at this game tweeting from Oracle! Perhaps, it could be like this forever.



He was decent for a while!


(with certain caveats)


Here is clear evidence of me hating Harrison Barnes:


Much more foreshadowing!


I was skeptical even against Denver.


At the time, some people were advocating for David Lee to be moved so that Barnes could replace him. Hmm. I wonder if those people ever said they were wrong about that.


I still wonder this, fwiw:


There's that Marvin Williams comp for the first time (from me):


At the time, a lot of folks said they wouldn't have (I wanted Kawhi on draft night, btw):


A continuing concern to this day. The number one concern in my estimation.


This. Still. Except not so much dunking.


And then we got Iguodala.


He is coming off the bench, and he is not shining. And they are discounting it because he doesn't have the benefit of always playing with better players. Sigh.


I believe this was something I heard Sam Mitchell say on NBA TV:


It's been pretty much all downhill from there:


Always this. But again this season with less dunks.


Still waiting.


You've surely heard me say this by now:


And probably this too:


Harrison Barnes' best skill:


This could get awkward:


And so it goes on and on:



Crazy talk!


Ok, I'm going to stop here. It just gets worse and worse.


Well, one last tweet for good measure.


Right idea, but the execution needs some work!

In his 2+ seasons as head coach of the Golden State Warriors, Mark Jackson has clearly made improving the defense one of his highest priorities. So much so, in fact, that in a live blog/hangout yesterday morning from the Warriors training facility, Stephen Curry pointed out how all the photos of the team hanging on the wall depict the team defending the ball, as opposed to "posterizing" players on offense (so evidently "Barnes over Pekovic" is nowhere to be seen).

Curry goes on to show viewers a chart that Mark Jackson had created for the players to show them where they should try to force opponents to take shots, based on efficiencies. This is a great idea, and it's one of the things you have almost come to expect as analytics has swept into front office and coaching mentalities across the league, with the Warriors, perhaps, being one of its top proponents.

There is a curious thing, however, in this chart. And it makes me wonder how much further analytics needs to go before its lessons are fully learned (or even appreciated).

Screen Shot 2013-10-26 at 1.41.28 PM

Did you spot the problem? (If not, I suggest you read my Advanced Stats Primer!) Notice how the chart shows FG% in each region? From what we can see, there is no label as such, but to all of us who have studied the numbers even a little, it's clear that the %'s given are field goal percentages. It's sort of odd, right? I mean, if I was a player, the message I'd receive looking at this chart is that I'd rather force opponents to take "above the break" 3-pt shots (34.2%) as opposed to 16-23 ft jump shots (38.1%). But we know that a better metric to use here is "equivalent" or "effective" FG% (eFG%), which weights made 3-pt shots by 1.5x, so that 34.2% becomes effectively 51% or so, a much better shot for the offense than the long 2-pt jumpers.
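The arithmetic is quick to check. The sketch below uses the standard eFG% definition, (FGM + 0.5 × 3PM) / FGA, which reduces to 1.5 × FG% for a zone where every attempt is a three. The function names are mine, and the two percentages are the ones quoted from the chart.

```javascript
// Standard definition: eFG% = (FGM + 0.5 * 3PM) / FGA.
function effectiveFgPct(fgm, threesMade, fga) {
  return (fgm + 0.5 * threesMade) / fga;
}

// For a zone where every attempt is a three, eFG% is simply 1.5 * FG%.
function zoneEfg(fgPct, isThreePointZone) {
  return isThreePointZone ? 1.5 * fgPct : fgPct;
}

const aboveBreak3 = zoneEfg(0.342, true); // "above the break" 3s from the chart
const longTwo = zoneEfg(0.381, false);    // 16-23 ft jumpers from the chart

console.log(aboveBreak3.toFixed(3)); // eFG% of the above-the-break three
console.log(longTwo.toFixed(3));     // eFG% of the long two
console.log(aboveBreak3 > longTwo ? "force the long two" : "force the three");
```

By eFG%, the long two is the shot a defense should prefer to give up, which is the opposite of what the raw FG% numbers suggest.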

And if you're thinking the numbers aren't important, that the players will only look at the colors (which to my eye are confusing, if anything), then why bother putting numbers at all? I see this as a window into the current state of affairs in the NBA. Analytics has definitely become the prominent way of thinking among the "NBA intelligentsia", and players are most likely aware of the "take-home messages", but there's still quite a ways to go until analytics becomes part of the everyday language of basketball (especially for players) in the same way that "pick and roll" or "coming off a screen" have implicit meaning.

Lists! The League's Best Scorers in 2013 according to Scoring Index

Long time, no write. I've been busy with things lately, as some of you may know. Hopefully, I can sprinkle in more posts now and again, though. So to ease back into this web logging habit, I've compiled a list of the best scorers this season from (heard of it?). The "Scoring Index" (SI) is based on work I did a while back (see here and here and here and here and here) looking at the tradeoff between usage (i.e. volume shooting) and efficiency (measured by TS%). At the very edge of the TS-USG relationship, there appears to be a "frontier" of all-time great scorers.

The "Usage-Efficiency" Frontier

The list I've compiled has a minimum threshold of 250 FGA taken. The one (significant) change I've made from the earlier metric is that SI is "signed", meaning if a player actually falls outside of the frontier (above and to the right of that line on the plot), they will have a SI > 1. IOW, they are scoring at a rate even better than the all-time greats. And wouldn't you know, we happen to have a couple players like that this season. You may have heard of them.

Here's the list in all its glory. And if you're wondering (which you surely are by now)'s Draymond Green.

Introducing NBA WOWY!

I'll make this short and sweet. As some of you know, I've spent the last few months moving my codebase over to a new database framework. After finishing that, I decided there was so much good stuff in there that I needed to make some of it public. NBA WOWY! (pronounced Wow-ee!) is the result.

The basic idea is that it lets you select any combination of players on or off the court and calculate the stats for all the other players. Right now, it's only got a few basic shooting stats, but much more is to come.

Update (Apr. 6): Ok, a few months later now, and here's a couple more recent screenshots:

Screen Shot 2013-04-06 at 5.13.13 PM

Screen Shot 2013-04-06 at 3.29.25 PM

Update (Jan. 6): The site now has a much fuller suite of stats, including turnovers, assists, and rebounding. More to come...

Updated screen shot.

Updated screen shot.

Think of this as the "beta" version, and you can be my very first beta testers.

Let me know what you think. My e-mail address is given at the bottom of the site. I'm interested to know what features you'd like to see added, in terms of both data and usability. Also, if you find bugs, please let me know.

Here's a quick tutorial. Let's say I want to know what the Warriors shoot when David Lee is on the court. I simply select Warriors from the team menu:

Screen Shot 2013-01-04 at 11.29.10 AM

Then I select David Lee from the "ON" menu in the green box and hit the "+" button to add him to the list (which is empty at first):

Screen Shot 2013-01-04 at 11.31.17 AM

Screen Shot 2013-01-04 at 11.32.34 AM

After adding Lee, I click on the "Submit" button to run the query:

Screen Shot 2013-01-04 at 11.34.24 AM

Then I just wait for the results (which should hopefully not take more than a few seconds to calculate):

Screen Shot 2013-01-04 at 11.35.27 AM

To re-run a new (different) query with a different filter, simply clear the list of players or add new ones or both. You can search literally any combination of players on or off the court. That's the whole point!
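If you're wondering what a query like that boils down to, here is a minimal sketch. It is not the site's actual code. It assumes each shot event carries the shooting team's lineup, and the field names (lineup, shooter, made, value) and the sample events are invented for illustration.

```javascript
// Hypothetical shot events; the field names are assumptions for illustration.
const events = [
  { lineup: ["Stephen Curry", "Klay Thompson", "David Lee"], shooter: "Stephen Curry", made: true, value: 3 },
  { lineup: ["Stephen Curry", "Klay Thompson", "David Lee"], shooter: "Klay Thompson", made: true, value: 2 },
  { lineup: ["Stephen Curry", "Klay Thompson", "Carl Landry"], shooter: "Stephen Curry", made: false, value: 3 },
  { lineup: ["Stephen Curry", "Klay Thompson", "Carl Landry"], shooter: "Klay Thompson", made: false, value: 2 },
];

// Keep only events where every "on" player is in the lineup and no "off"
// player is, then total the basic shooting stats per shooter.
function wowy(allEvents, { on = [], off = [] } = {}) {
  const kept = allEvents.filter(
    (e) =>
      on.every((p) => e.lineup.includes(p)) &&
      off.every((p) => !e.lineup.includes(p))
  );
  const stats = {};
  for (const e of kept) {
    const s = stats[e.shooter] || (stats[e.shooter] = { fga: 0, fgm: 0, pts: 0 });
    s.fga += 1;
    if (e.made) {
      s.fgm += 1;
      s.pts += e.value;
    }
  }
  return stats;
}

const withLee = wowy(events, { on: ["David Lee"] });
const withoutLee = wowy(events, { off: ["David Lee"] });
console.log(withLee, withoutLee);
```

Any combination works the same way: add more names to on, off, or both, and the filter narrows accordingly.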

Anyway, that's pretty much all there is to it. Have fun and keep watching the site as I will periodically roll out updates.

A Post-Christmas Post about the Knicks Offense

Let's take shooting efficiency from the field (points per shot) and see how it is affected by having Carmelo Anthony, Tyson Chandler, and Jason Kidd on or off the floor.

First, with all 3 on the floor, here are the PPS stats for every Knicks player with >= 30 FGA (each list is NAME/FGA/PPS):

With Melo, Tyson, & Kidd

  1. Tyson Chandler, 75, 1.36
  2. Jason Kidd, 74, 1.284
  3. Carmelo Anthony, 233, 1.189
  4. Ronnie Brewer, 63, 1.032
  5. J.R. Smith, 74, 0.973
  6. Raymond Felton, 175, 0.926

Now, we'll take each of them off, one at a time. The number in () is the ∆PPS from the above list with all 3 on the court.

Without Melo, With Tyson & Kidd

  1. Tyson Chandler, 33, 1.333 (-0.027)
  2. J.R. Smith, 39, 0.872 (-0.101)
  3. Jason Kidd, 39, 0.872 (-0.412)
  4. Raymond Felton, 79, 0.797 (-0.129)

Without Tyson, With Melo & Kidd

  1. Carmelo Anthony, 41, 1.0 (-0.189)

Without Kidd, With Melo & Tyson

  1. Tyson Chandler, 65, 1.446 (+0.086)
  2. Carmelo Anthony, 157, 0.968 (-0.221)
  3. Raymond Felton, 122, 0.844 (-0.082)
  4. Ronnie Brewer, 38, 0.816 (-0.216)
  5. J.R. Smith, 65, 0.769 (-0.204)


The stats pretty much speak for themselves, don't they? What they suggest is that the offense takes a significant hit when any of the three come off the floor. Also, Tyson Chandler appears to be the only Knicks player whose efficiency doesn't fluctuate too much regardless of who is on the court with him.
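For reference, the ∆PPS values are just each player's PPS in the split minus his PPS with all three on the floor. A small sketch using the "Without Kidd" numbers from the lists above (the function name is mine):

```javascript
// PPS with Melo, Tyson, and Kidd all on the floor (from the first list).
const allThree = {
  "Tyson Chandler": 1.36,
  "Jason Kidd": 1.284,
  "Carmelo Anthony": 1.189,
  "Ronnie Brewer": 1.032,
  "J.R. Smith": 0.973,
  "Raymond Felton": 0.926,
};

// PPS without Kidd, with Melo and Tyson (from the last list).
const withoutKidd = {
  "Tyson Chandler": 1.446,
  "Carmelo Anthony": 0.968,
  "Raymond Felton": 0.844,
  "Ronnie Brewer": 0.816,
  "J.R. Smith": 0.769,
};

// Delta = split PPS minus baseline PPS, rounded to three decimals.
function deltaPps(baseline, split) {
  const out = {};
  for (const [name, pps] of Object.entries(split)) {
    if (name in baseline) {
      out[name] = Number((pps - baseline[name]).toFixed(3));
    }
  }
  return out;
}

const deltas = deltaPps(allThree, withoutKidd);
console.log(deltas);
```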

ezPM ratings are back!

(If you want to get on with Christmas and stuff, you can read this later, and just check out the new ezPM link at the top of the page.)

It's taken me several months to re-code my play-by-play parser since is no longer being updated (i.e. since Aaron Barzilai was hired by the 76ers). The cool part is that now I can make updates faster. I also have more data available to put in the model. Every play (or event) in my database has a lot of information associated with it that can be queried. To illustrate, here's a typical field goal attempt (it should be pretty straightforward to follow each field):

	{
		"Lakers" : [
			"Steve Blake",
			"Kobe Bryant",
			"Antawn Jamison",
			"Pau Gasol",
			"Jordan Hill"
		],
		"Warriors" : [
			"Stephen Curry",
			"Klay Thompson",
			"Richard Jefferson",
			"Carl Landry",
			"David Lee"
		],
		"_id" : ObjectId("50d802605bca6d03c1008ad6"),
		"as" : 24,
		"away" : "Warriors",
		"block" : "Jordan Hill",
		"coords" : {
			"x" : 2,
			"y" : 10
		},
		"date" : "2012-11-09",
		"distance" : 4,
		"espn_id" : "400277800",
		"event" : "Jordan Hill blocks a Stephen Curry driving finger roll shot from 4 feet out.",
		"home" : "Lakers",
		"hs" : 27,
		"made" : false,
		"opponent" : "Lakers",
		"pid" : 142,
		"q" : 2,
		"release" : "driving finger roll shot",
		"season" : "2013",
		"shooter" : "Stephen Curry",
		"t" : "9:22",
		"team" : "Warriors",
		"type" : "fga",
		"url" : "",
		"value" : 2
	}

Here's an example of a turnover (you'll see the fields are somewhat different, because it's a different type of event):

	{
		"Suns" : [
			"Goran Dragic",
			"Jared Dudley",
			"P.J. Tucker",
			"Luis Scola",
			"Marcin Gortat"
		],
		"Warriors" : [
			"Stephen Curry",
			"Jarrett Jack",
			"Klay Thompson",
			"David Lee",
			"Andrew Bogut"
		],
		"_id" : ObjectId("50d801f85bca6d03c1001113"),
		"as" : 46,
		"away" : "Warriors",
		"date" : "2012-10-31",
		"espn_id" : "400277730",
		"event" : "Stephen Curry with a bad pass turnover: Bad Pass",
		"home" : "Suns",
		"hs" : 36,
		"opponent" : "Suns",
		"pid" : 195,
		"player" : "Stephen Curry",
		"q" : 2,
		"season" : "2013",
		"t" : "3:39",
		"team" : "Warriors",
		"tov_type" : "Bad Pass",
		"type" : "tov",
		"url" : ""
	}

Anyway, after doing all this, I can now get back to routinely calculating my various metrics, and hopefully, making them even more informative in the future. For example, here are a couple of things I'm working on for a future iteration of ezPM:

  • Change value of a rebound depending on the floor location and type of release. For example, if the offense tends to have a higher OREB% after a missed layup attempt, then the value of a defensive board in that situation should be higher.
  • Similarly, a player might be debited less for a missed layup attempt, since the offense has a better chance of securing the rebound.
  • Another change that I've been wanting to make for a while is to make the value of a possession dependent on the starting state. For example, possessions started after a steal, defensive rebound, or made basket, tend to have different expected values. This should be accounted for wherever the model uses PPP.
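To make the last idea concrete, here is a rough sketch of estimating points per possession (PPP) separately for each starting state. This is not ezPM code. The possession records and their field names are hypothetical, and the numbers are made up purely to show the grouping.

```javascript
// Hypothetical possessions (made-up values): how each one started and
// how many points it produced.
const possessions = [
  { start: "steal", points: 2 },
  { start: "steal", points: 0 },
  { start: "steal", points: 2 },
  { start: "steal", points: 2 },
  { start: "defensive rebound", points: 2 },
  { start: "defensive rebound", points: 0 },
  { start: "made basket", points: 0 },
  { start: "made basket", points: 3 },
  { start: "made basket", points: 0 },
  { start: "made basket", points: 0 },
];

// Average points per possession, grouped by starting state.
function pppByStart(rows) {
  const totals = new Map();
  for (const { start, points } of rows) {
    const t = totals.get(start) || { poss: 0, points: 0 };
    t.poss += 1;
    t.points += points;
    totals.set(start, t);
  }
  const out = {};
  for (const [start, t] of totals) {
    out[start] = t.points / t.poss;
  }
  return out;
}

const ppp = pppByStart(possessions);
console.log(ppp);
```

A model could then look up the expected value for the relevant starting state wherever it currently plugs in a single league-wide PPP.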

Simple Data Visualization using Node+Express+Jade

Update (2012-11-12): I created an app to go along with this post. Check it out at:

If you know about Node, you're probably one of the cool kids. And you'll no doubt grok this post. In a nutshell, Node.js enables one to create an entire web application stack from the server to the client using JavaScript. It's pretty cool and stuff.

Another cool JavaScript thingy these days is D3, which is a library for doing all kinds of awesome visualization (that's actually what the "d3" in my domain refers to, if you were ever wondering). What D3 does is it essentially lets you bind data to elements of the DOM (i.e. the underlying structure of a web page). So D3 is really great and it has a huge and ever-growing community of users.

The reason I'm writing this post is because I have found it's not that easy to inject D3 code into a web app built on the Node stack (which almost always includes the Express framework as well). I could only find one decent tutorial, and on top of Node and Express, that code wraps D3 in an AngularJS directive. While I was trying to figure out that code, I realized that for relatively simple use cases, it's possible to bind visual elements directly using nothing more than Node+Express+Jade. Jade is a popular HTML templating language.

To demonstrate how this works, we'll visualize shot location for the Warriors this season. First, we pull the data from some data store (in this case, I'm using MongoDB):

exports.shots = function(req, res){
    var team =;
    Db.connect(mongoUri, function(err, db) {
        console.log('show all shots!');
        db.collection('shots', function(err, coll) {
            // most recent shots first, then longest distance first
            coll.find({'for':team}).sort({'date':-1,'dist':-1}).toArray(function(err, docs) {
                // hand the array of shot documents to the shots.jade template
                res.render('shots',{shots: docs, team: team});
            });
        });
    });
};

The important line there is: res.render('shots',{shots: docs, team: team});. This basically hands off the shot data (which is now an array) to the Jade template (called "shots.jade"). The template looks like this:

extends layout

block content

        h1 #{shots[0].for}
                each shot in shots
                    if (shot.made)

What you see is that the iterator each shot in shots in the Jade template created an element for each shot in the array pulled in from the database. Here's a screen shot of the final result (it's only running locally right now, so I can't give a link to the application):

Screen Shot of Jade-generated data visualization.

So there you have it. It's possible to do some basic data visualization using just Node+Express+Jade. There isn't a lot out there on this particular topic, so I figured this might help someone or give some inspiration to go further with it.
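If the template idea is hard to picture, the same "one element per shot" loop can be sketched in plain JavaScript. This is not the Jade template itself. It assumes each shot becomes an SVG circle, colored by whether the shot was made, and the x/y coordinates, colors, and sizes are placeholders.

```javascript
// Hypothetical shots; x and y are assumed court coordinates.
const shots = [
  { x: 2, y: 10, made: false },
  { x: 25, y: 24, made: true },
];

// Build one <circle> per shot, mirroring "each shot in shots" with an
// "if (shot.made)" branch that picks the fill color.
function shotsToSvg(shotList, scale = 10) {
  const circles = shotList.map((shot) => {
    const fill = shot.made ? "green" : "red";
    return `<circle cx="${shot.x * scale}" cy="${shot.y * scale}" r="4" fill="${fill}"/>`;
  });
  return (
    '<svg xmlns="http://www.w3.org/2000/svg" width="500" height="470">' +
    circles.join("") +
    "</svg>"
  );
}

const svg = shotsToSvg(shots);
console.log(svg);
```

The template approach does the same thing on the server, so the browser receives finished markup and needs no client-side charting code.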

It's Early Yet, But There's Some Historically Productive Scoring in the League Right Now

You might remember I have done a bit of work on the usage-efficiency tradeoff in the past. The "payoff" was a chart that presented evidence of a usage-efficiency "frontier" (having stolen the idea of an efficient frontier from finance, of course):

All-time productive scoring seasons lie along the "frontier".

We're almost at the quarter-point of the 2012-13 season now, so I thought it would be interesting to look at the current leaders, and see where they stand with respect to the frontier. So far, pretty, pretty good. In particular, Kevin Durant, Kobe Bryant, Tyson Chandler (so good he looks to be close to setting a new point along the frontier), and Carmelo Anthony are on or very close to the frontier, itself. Have a look:

The players in green make up the historical reference for the "frontier". Note that Chandler would be very near the frontier if it was extrapolated out further.

Of course, we should expect some regression to the mean. How much is anyone's guess, so I'll update the results periodically throughout the season.

On Ceilings and Floors and Betting

Harrison Barnes has exceeded most Warriors fans' expectations through 9 games this season. He's looked especially good in the last couple of games. This has prompted some fans to revisit the classic sports discussion regarding a player's "ceiling" and "floor". While the topic is one of the oldest in the book, the criteria for selecting a ceiling and floor for a player are not very clear (to me, anyway).

I think that most people see it as equivalent to asking the following question:

Who is the best current or former player that Player X has *some* possibility of becoming better than?

The key word here is *some*. When a fan suggests a ceiling that is deemed too low, the response is always something like, "How can you say he doesn't have *some* chance to be better than that player!?" Well, my reply is, of course, there's *some* chance. I'm going to illustrate why this is a problematic foundation for the discussion.

I think it's fair to say that, ideally, we would like to have debates that have some objectivity to them. One way to constrain a debate to be more objective is simply to introduce a bet. A bet invariably has to be settled by some objective criteria, otherwise, neither party would agree. If we want to debate which team is better, we should bet on the outcome of a game or maybe a season. That might not truly settle the debate, but at least it's an objective approach. If I pick Team A and you pick Team B, we bet against each other, and the winner is easy to declare.

So let's think about how we might construct a bet on the ceiling for a player (the floor could be done in a similar way). Here's one way to do it. The player in question is Player X. I propose that Player A is his ceiling. You propose that Player B is his ceiling. First, we need some objective criterion, i.e. a "stat". For the sake of argument, I'll just choose a stat that most everyone reading this has heard of: Hollinger's PER. (This is not the time to debate the merit of PER. You can substitute any stat you would like, as it won't materially change the point at hand.) Ok, so with PER as the base metric, the winner of the bet is the one who picks the ceiling that ends up closest to Player X's PER.

Let me demonstrate with some numbers. Say that Player A's highest PER was 25 and Player B's highest PER was 30. Let's have one scenario where Player X ends up with a PER of 24. In this case, I win the bet because Player A meets two important criteria: 1) Player X did not achieve a PER higher than Player A (which would mean Player A was by definition too low a ceiling); and 2) In absolute terms, the difference between Player X and Player A is smaller than between Player X and Player B.

Now, say we have another scenario where Player X ends up with a PER of 26. In that case - and again, according to how I would set up the bet - you would win simply because Player X achieved a PER higher than the ceiling I set for him. The fact that my ceiling was closer (in absolute terms) doesn't make a difference.

Does that make sense? Let me re-iterate that this is just one way to construct the bet. Obviously, there are others. We could just take the absolute difference and not worry about whether Player X ends up higher or lower than our ceilings. I don't like that approach, because I'm used to thinking about the games from the Price is Right, where you had to guess the price *without going over*. It makes even more sense to have that rule here, because the whole point of choosing a ceiling is that we're saying that is the player's LIMIT.
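The rules of this bet are simple enough to write down. Here is a sketch of the "Price is Right" version described above, using peak PER as the stat. The function name is mine, and returning no winner when the player beats every proposed ceiling is my assumption, since the scenarios above don't cover that case.

```javascript
// Settle a ceiling bet: any ceiling the player exceeded is out, and among
// the ceilings that held, the one closest to the player's mark wins.
function settleCeilingBet(playerMark, ceilings) {
  const held = Object.entries(ceilings).filter(([, mark]) => mark >= playerMark);
  if (held.length === 0) return null; // assumption: everyone went too low
  held.sort((a, b) => (a[1] - playerMark) - (b[1] - playerMark));
  return held[0][0];
}

const ceilings = { "Player A": 25, "Player B": 30 };

console.log(settleCeilingBet(24, ceilings)); // Player A: the closer ceiling that held
console.log(settleCeilingBet(26, ceilings)); // Player B: Player A's ceiling was exceeded
console.log(settleCeilingBet(31, ceilings)); // null: both ceilings were exceeded
```

Swapping in a different stat only changes the numbers passed in, not the rule.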

The problem I have with the original (and seemingly more popular) approach to the ceiling/floor discussion is that there's really no way to evaluate it objectively. Let's use Harrison Barnes as the example. I'll say that his ceiling is Danny Granger. You say that his ceiling is LeBron James. Who would win that bet? If Barnes never becomes "better" than LeBron, do you win? If that's the case, what exactly is the incentive of choosing any player other than arguably the best SF of all time? The ceiling for every SF would then either be Bird or James, right?

Now, I think most people inherently understand that dilemma, so they pick someone not quite as good as that for Barnes. But the criteria for doing so are usually ad hoc. It's basically, "Well, I think he has some chance of being better than this player, but no chance of being better than this player."

My point is let's bet on it. Let's put some numbers on it. The challenge here is not to pick *some* player that is the absolute ceiling (which is easy and trivial). The challenge (for me, anyway) is to pick the *worst* player that you think will be *better* than the player in question (Player X). Because otherwise, as I said, there's no incentive to pick anyone other than the best player of all time. In math, they would say that's an "ill-posed" problem. In order to make it a well-posed problem, it seems to me the logical solution is to construct it as a bet. From there everything else follows.

I know, that was a lot of words. But next time you enter into the ceiling/floor debate or listen to it on tv, just remember the main point here: Pick the guy that you would be willing to bet on.

A Grown Man NBA Blog