This is going to be a long post, but you should view it primarily as a reference that you can come back to. I imagine many of you will, because these data are not presented so succinctly (i.e. in an organized fashion) anywhere else, as far as I can tell.
If you follow my blog or other NBA stat blogs, you have almost certainly come across Synergy stats. Briefly, the Synergy folks sit down and watch a ton of video in order to classify every play for every team over the course of a season. Over the past couple of weeks I have started digging into the team-level data to look for patterns that would indicate specific team strategies. It's much more complex than I had imagined, and in time, I think (er, hope) I will have much more to say about that. For this first post, I just want to provide a summary of what I have found so far and, perhaps, a few thoughts.
First, I need to explain what the data are. Synergy classifies plays into 11 different types (my abbreviations are in parentheses):
- Isolation (ISO)
- P&R Ball Handler (BALL)
- Post-Up (POST)
- P&R Man (ROLL)
- Spot-Up (SPOT)
- Off Screen (SCREEN)
- Hand off (HAND)
- Cut (CUT)
- Offensive Rebound (REB)
- Transition (TRANS)
- All other plays (OTHER)
For each of these play types, Synergy reports a rate (RATE) equal to the fraction of the time each play is used (the rates for each team sum to 1, by definition) and the points per play for that play type (PPPi). The overall PPP for a team is then as follows:
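In symbols, writing RATE_i for the rate of play type i and PPP_i for its points per play (the subscript notation is my reconstruction from the definitions above, not something Synergy publishes), the overall figure is the rate-weighted sum, together with the constraint on the rates:

```latex
\[
\mathrm{PPP} \;=\; \sum_{i=1}^{11} \mathrm{RATE}_i \cdot \mathrm{PPP}_i ,
\qquad \text{subject to} \qquad
\sum_{i=1}^{11} \mathrm{RATE}_i \;=\; 1 .
\]
```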
Obviously, the objective at the team level is to maximize PPP on offense (minimize on defense), and the above formula makes it clear that this can be done by varying the rate and efficiency at which each play is used (or allowed) — subject to the constraint shown above that the rates must sum to 1. That might sound obvious (it is), but this auxiliary condition is critical to deal with in (eventually) formulating this as an optimization problem. (It's also what makes it challenging.) In plain language, what the constraint tells us is that if you want to do more of one type of play, you must do less of another type of play. On defense, for example, if you deny POST plays, you will necessarily allow more of some other type of play, say, SPOT or BALL.
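To make the trade-off concrete, here is a minimal Python sketch of the weighted sum and the sum-to-1 constraint. The numbers are made up purely for illustration (they are not Synergy data), and the function names `overall_ppp` and `shift_rate` are hypothetical helpers of my own, not part of any Synergy tool.

```python
# Hypothetical rates and per-play efficiencies for one team.
# These values are invented for illustration; they are NOT Synergy data.
rates = {
    "ISO": 0.12, "BALL": 0.13, "POST": 0.10, "ROLL": 0.06, "SPOT": 0.20,
    "SCREEN": 0.05, "HAND": 0.03, "CUT": 0.07, "REB": 0.06, "TRANS": 0.13,
    "OTHER": 0.05,
}
ppp_i = {
    "ISO": 0.85, "BALL": 0.80, "POST": 0.88, "ROLL": 1.00, "SPOT": 0.98,
    "SCREEN": 0.90, "HAND": 0.85, "CUT": 1.20, "REB": 1.05, "TRANS": 1.12,
    "OTHER": 0.45,
}


def overall_ppp(rates, ppp_i):
    """Rate-weighted sum of per-play efficiencies; rates must sum to 1."""
    if abs(sum(rates.values()) - 1.0) > 1e-9:
        raise ValueError("rates must sum to 1")
    return sum(rates[play] * ppp_i[play] for play in rates)


def shift_rate(rates, src, dst, amount):
    """Move `amount` of usage from play `src` to play `dst`.

    Doing more of one play type means doing less of another,
    so the total stays at 1.
    """
    if amount > rates[src]:
        raise ValueError("cannot remove more usage than the play has")
    shifted = dict(rates)
    shifted[src] -= amount
    shifted[dst] += amount
    return shifted


base = overall_ppp(rates, ppp_i)
shifted = overall_ppp(shift_rate(rates, "ISO", "CUT", 0.02), ppp_i)
print(f"baseline PPP: {base:.4f}")
print(f"after moving 2% of plays from ISO to CUT: {shifted:.4f}")
```

The change in overall PPP is just the amount shifted times the efficiency gap between the two plays. Note that this holds the per-play efficiencies fixed, which is an assumption: in reality a defense would adjust, and efficiency would likely change with usage.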
Let's start looking at the data. First, I want to show "box and whisker" plots that summarize the RATE at which each play is used on offense. This type of plot shows the median (the black line in the middle of the box), the spread of the data (the box surrounding the median spans the 25th to 75th percentiles), and any outliers (circles outside the "whiskers", which span the middle 95% of the data).
What you see here is that SPOT plays are used much more often than all other play types. BALL, ISO, POST, and TRANS are all used at fairly similar rates (they're actually statistically different distributions according to pairwise t-tests, but let's not worry about that now). Other types of plays are used much less often. You can see that there is a huge amount of variation for certain play types (POST, BALL, and ISO), represented by the greater height of the boxes, while other play types have less variation, or smaller boxes (TRANS, HAND, CUT, REB, OTHER, ROLL). The way I would interpret this observation is that the plays with more variation have more to do with team strategy and talent level, whereas the plays with less variation are less "controllable". Next, we look at the efficiency with which each play is executed.
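For readers who want the numbers behind such a plot, here is a rough Python sketch of the summary it draws: median, quartiles, whiskers at the 2.5th and 97.5th percentiles (the "middle 95%" convention described above), and the points falling outside them. The sample values are invented for illustration, and linear interpolation between sorted points is my assumption; plotting packages may use a slightly different percentile rule.

```python
def percentile(values, q):
    """q-th percentile (0 to 100), linearly interpolating between sorted points."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def box_summary(values):
    """Numbers a box-and-whisker plot would display for one play type."""
    low = percentile(values, 2.5)
    high = percentile(values, 97.5)
    return {
        "median": percentile(values, 50),
        "q1": percentile(values, 25),
        "q3": percentile(values, 75),
        "whisker_low": low,
        "whisker_high": high,
        # Points outside the whiskers are drawn as circles.
        "outliers": [v for v in values if v < low or v > high],
    }


# Hypothetical SPOT usage rates for nine teams (NOT Synergy data).
spot_rates = [0.17, 0.18, 0.19, 0.20, 0.20, 0.21, 0.22, 0.23, 0.29]
summary = box_summary(spot_rates)
for name, value in summary.items():
    print(name, value)
```

A taller box (a larger gap between `q1` and `q3`) is what "more variation" means when comparing play types.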
Notice how different this plot looks. For example, the most efficient play is the CUT, but that was one of the least used play types in the previous plot. I'm not an X's and O's guy, but obviously this means that CUT plays are typically well-defended. Otherwise, you'd run them all the time. Similar logic probably applies to TRANS, although that is somewhat more complicated because it is highly linked to defense and playing style. For example, you can see that the Warriors were one of the most efficient teams in transition, and actually had a high rate (see previous plot), but this risky strategy undoubtedly came at the expense of giving up easy baskets with similar frequency. Surprisingly to me, POST plays are about average or even below average in terms of efficiency. But I remember just a few months ago, there was an article in the New York Times by Rob Mahoney that suggested "points in the paint" were overstated. It looks like the Synergy data are consistent with his hypothesis. Interesting, that. Ok, onward.
Putting it all together, we have the box and whisker plot for the total offensive PPP:
The take-home message from these plots is that there are a myriad of variables that contribute to overall PPP. So, the question you are probably asking now is how the individual teams compare to each other.
With 11 different play types, offense and defense, PPPi, RATE, and total PPP, the data are simply too overwhelming to present in a concise fashion. So, instead of presenting dozens of different tables with the data and the rankings, I just put the data in a couple of Google Spreadsheets that anyone with the following links can download. The spreadsheets are sorted by TEAM (ascending) and then by RATE (descending), thus quickly giving you an idea of which plays a team used most. The three columns on the right are standardized conversions of the RATE, PPPi, and PPP for each play type (i.e. a value greater than 1 means your team is more than 1 standard deviation above the league mean in that category, and a value below -1 means more than 1 standard deviation below it).
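If you want to reproduce the standardized columns yourself, the conversion is the usual z-score: subtract the league mean and divide by the league standard deviation. Here is a small Python sketch. The team codes and rates are invented for illustration, `standardize` is a hypothetical helper name, and using the population standard deviation (rather than the sample version) is my assumption, so your values may differ slightly from the spreadsheet.

```python
from statistics import mean, pstdev


def standardize(values_by_team):
    """Convert raw values to z-scores relative to the league mean."""
    values = list(values_by_team.values())
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption, see the note above
    if sigma == 0:
        raise ValueError("all teams have the same value; z-scores are undefined")
    return {team: (v - mu) / sigma for team, v in values_by_team.items()}


# Hypothetical POST rates for four made-up teams (NOT Synergy data).
post_rate = {"AAA": 0.10, "BBB": 0.12, "CCC": 0.14, "DDD": 0.20}
z = standardize(post_rate)
for team, score in z.items():
    print(f"{team}: {score:+.2f}")
```

In this toy example, team DDD ends up more than 1 standard deviation above the mean, which is the kind of value that flags a team as leaning unusually hard on a play type.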
[googleapps domain="spreadsheets" dir="pub" query="hl=en_US&hl=en_US&key=0Al6a2ecvJfTidHpQRmR6QUxJQnRWalRXZ2VuVVQ2UFE&output=html&widget=true" width="500" height="300" /]
[googleapps domain="spreadsheets" dir="pub" query="hl=en_US&hl=en_US&key=0Al6a2ecvJfTidFBvMGRjU01sYTNGbUhUZEswQ3BRLVE&output=html&widget=true" width="500" height="300" /]
And here's just one example of how you could use these data. Here are the standardized rates (RATE) on offense and defense for the 4 conference finalists (CHI, MIA, DAL, OKC).