Don't hate the list. Hate the model.

(In which Evan actually writes more than 140 characters about something all in one place!)

Got up early this morning (6:00 am) and walked over to the bus stop as my normal routine dictates, and I'm greeted with this tweet:

If you haven't read about my ezPM metric, you can do so here (when I introduced the model) and here (when I updated it to use play-by-play data) and here (when I added counterpart rebounding to the model) and here (when I added counterpart defense to the model). I developed it several years ago after a long bout of Wins Produced fatigue.

Now, Kristofers, who sent me that tweet, was clearly looking at the results of my latest ezPM rankings. I could be wrong, but I highly doubt he has read each of the posts I just linked to in the previous paragraph. Why do I doubt this? Because if he had, surely he wouldn't be asking me why Harden's defensive rating is much higher than Steph's or Klay's. It's in the model!

I mean, unless he thinks I'm doing the math wrong somewhere in the model, the points Harden has accumulated are dictated deterministically (it's not a stochastic algorithm).

This brings me to the larger point of the post *slash* rant. Human psychology being what it is, when we are presented with lists of any form, the first thing we tend to do is find *outliers*. I don't know why. Ask Kahneman. But that's what we do, right? We don't go down the list looking for things we agree with. We go fishing for things we disagree with so we can call out the list maker. And these days, preferably on Twitter.

I would argue that there are some lists where this is appropriate behavior. When the list is a purely subjective one, with no underlying model (at least, no rigorous one), then we generally have no choice but to argue with or question the brain of the list maker. Therein lies the only model we can point to.

But when a list is purely objective, such as my ezPM rankings, don't question me personally about it on twitter. Question my model. Please! I actually encourage you to question my model. Or any model at all, if you think the output is questionable. That's how models get improved.

When I thought there were issues with how WP ranked players, my first reaction, like most people's, was to criticize the list itself. But in time, I realized the only meaningful way to handle my criticism of the list was not to attack it directly. It was to question, and eventually attack, the Wins Produced model itself. And then to improve on it by creating my own.

So remember. Don't hate the list. Hate the model. Everyone, including you, will be better off for it.

On the Side: Using Apache Spark and Clojure for Basketball Reasons(?)

Flambo!

I spend a lot of time thinking about "the next big thing". In tech it seems you can almost never be too far ahead of the curve. Whatever toolset you're working with now, chances are some kids out there are spending their nights trying to obsolete disrupt it (Note: "disrupt" obsoleted the word "obsolete").

When I started building nbawowy.com towards the end of 2012, I decided to use a stack that wasn't all that common, but fast forward to today, and the "MEAN" stack (MEAN = MongoDB, Express, AngularJS, and Node.js), as it came to be called, seems to be everywhere (funny enough, I was actually referring to it as AMEN). AngularJS (a Google-backed project) now has over 27,000 stars, almost 10,000 more than Backbone, which was widely considered the "default" JavaScript front-end framework before 2013.

This past year I spent a lot of time in my day job learning more about "big data" and how to deal with it. In the past 5 or so years this mostly meant learning how to run MapReduce jobs on Hadoop, either by hand-coding them yourself in Java or using a higher-level scripting language, such as Pig or Hive. Not being a Java developer myself, I decided to learn Pig (a top-level Apache project) and it has made me much more productive.

Let me tell you how. At my work (a "social network" app called Skout) we generate a lot of data every day, not nearly as much as a Twitter or Facebook, of course, but enough to make it inconvenient to work with using traditional means (MySQL!). Last time I checked we were generating somewhere in the neighborhood of 100 million data messages per day (a "data message" is a little piece of JSON-formatted text sent over the network that tells us about an action taken by the user in the app). Like many companies, we store these messages on S3, an Amazon AWS service which is essentially an infinitely (for our purposes) scalable storage service in the cloud.

You can think of S3 as a really gigantic hard drive. What MapReduce (or Pig, in my case) allows one to do is query the data in an ad hoc fashion, but the catch is that up until now this has mostly been a batch process. So one of my queries (...count all the chat messages sent by women under the age of 25 in Asian countries on Android phones over the past week) might take anywhere from 10 minutes to upwards of an hour. It's better than nothing, and often the only way to get real answers, but it sort of takes the hoc out of ad hoc. What I'd really like to be able to do (and so would everyone else in tech) is interactively query the data on S3 (or some other Hadoop service). And by "interactive", I mean getting real-time or near real-time (seconds to a couple of minutes) results, as one would get by querying a MySQL database (at least, one designed for such a purpose). With such a system it becomes possible to iterate much faster. It also enables data scientists to implement iterative algorithms that were previously not feasible using the current MapReduce toolset.
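Conceptually, a query like the one above is just a filter followed by a count. Here is a toy JavaScript sketch of that query shape, run over an in-memory array rather than terabytes of files on S3 (the field names here are invented for illustration; they are not Skout's actual message schema):

```javascript
// Toy version of "count chat messages sent by women under 25 in Asian
// countries on Android". Field names are hypothetical; the real messages
// live as JSON lines on S3 and the filter runs as a MapReduce/Pig job.
var asianCountries = { JP: true, KR: true, CN: true, IN: true };

function countMatches(messages) {
  return messages.filter(function (m) {
    return m.event === "chat_message_sent" &&
           m.user.gender === "F" &&
           m.user.age < 25 &&
           asianCountries[m.user.country] &&
           m.user.platform === "android";
  }).length;
}

var sample = [
  { event: "chat_message_sent", user: { age: 23, gender: "F", country: "JP", platform: "android" } },
  { event: "chat_message_sent", user: { age: 30, gender: "F", country: "JP", platform: "android" } },
  { event: "login",             user: { age: 22, gender: "F", country: "KR", platform: "android" } }
];

console.log(countMatches(sample)); // 1
```

The batch vs. interactive distinction isn't about the query logic, which is trivial; it's about how long the scan over the raw data takes.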

Enter Apache Spark, a cluster computing project coming out of UC Berkeley that has burst onto the big data scene in the past year. Spark's selling point is the following:

Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.

The promise of Spark is to enable a whole new set of big data applications. Naturally, I became intrigued when I first learned about it, and thought it could be a great new tool for my day job. My second thought was...can I use this for basketball statistics? The obvious answer being: Sure, why the hell not? One thing that is useful about being a (self-proclaimed) NBA stats geek is that I always have a fun data sandbox at my command (I'm not sure there are two things, actually).

Spark comes out of the box with an API in three different programming languages: Java, Scala (the source code language), and Python. Unfortunately, I'm not using any of those languages, and the language I typically use for such things (Ruby) isn't supported (yet, although I'm sure there will eventually be such a project). There is a SparkR project, but I had another idea. In the past few months I have taken up the task of learning Clojure, which is basically a Lisp that runs on the JVM.  Scala, by the way, is in a similar vein in that it is a functional language hosted on the JVM. In researching the two languages, I simply decided that Clojure was eventually what "all the cool kids" would be doing, and that's always where I want to be. Also, Rich Hickey, the developer of Clojure, is brilliant and reminds me of the 70's version of Doctor Who.

Fortunately, there is a project called Flambo that is developing a Clojure API for Spark. I decided to give it a try. I'm in the very early phase of the learning curve, but I've already figured out enough to see that this is shaping up to be a very cool/powerful data stack, indeed.

First, here is a sample of the data set I'm using, which comes straight from my nbawowy database:

{
	"76ers" : [
		"Lorenzo Brown",
		"Elliot Williams",
		"Hollis Thompson",
		"Brandon Davies",
		"Daniel Orton"
	],
	"Timberwolves" : [
		"A.J. Price",
		"Alexey Shved",
		"Robbie Hummel",
		"Ronny Turiaf",
		"Gorgui Dieng"
	],
	"_id" : ObjectId("53531a345bca6d54dd0382b2"),
	"as" : 120,
	"assist" : null,
	"away" : "Timberwolves",
	"coords" : {
		"x" : 13,
		"y" : 15
	},
	"date" : "2014-01-06",
	"distance" : 16,
	"espn_id" : "400489378",
	"event" : "A.J. Price makes a pull up jump shot from 16 feet out.",
	"home" : "76ers",
	"hs" : 93,
	"last_state" : {
		"type" : "fga",
		"val" : 2,
		"rel" : "jump shot",
		"made" : true,
		"shooter" : "Daniel Orton",
		"dist" : 17
	},
	"made" : true,
	"opponent" : "76ers",
	"pd" : 27,
	"pid" : 424,
	"points" : 2,
	"q" : 4,
	"release" : "pull up jump shot",
	"season" : "2014",
	"shooter" : "A.J. Price",
	"t" : "2:22",
	"team" : "Timberwolves",
	"type" : "fga",
	"url" : "http://scores.nbcsports.msnbc.com/nba/pbp.asp?gamecode=2014010620",
	"value" : 2
}

This is a single play. Each season of nbawowy has roughly 550K plays just like this with metadata describing all kinds of things I pull out from the play-by-play data with my current parser (written in Ruby). The 2013-2014 season is a little under 500 MB of data like this. I "dumped" it to a text file that could then be processed with Flambo/Spark.

The following is a code sample that produces the number of made three-point field goals by the Warriors last season in descending order (comments are denoted by leading semi-colons):

;; create a namespace and require libraries
(ns flambo.clojure.spark.demo
  (:require [flambo.conf :as conf])
  (:require [flambo.api :as f])
  (:require [clojure.data.json :as json]))

;; configure Spark
(def c (-> (conf/spark-conf)
           (conf/master "local[*]")
           (conf/app-name "nba_dsl")))

;; create a SparkContext object
(def sc (f/spark-context c))

;; read in plays from nbawowy database
(def plays (f/text-file sc "/Users/evanzamir/Code/Clojure/flambo-nba/resources/plays.json")) ;; returns an unrealized lazy dataset

;; define a function that prints out field goals
(defn field-goals-made-by-player
  [team p]
  (let
      [fgm
       (-> p
           (f/map (f/fn [x] (json/read-str x :key-fn keyword)))
           (f/filter (f/fn [x] (and (= "fga" (:type x))
                                    (= 3 (:value x))
                                    (= true (:made x))
                                    (= team (:team x)))))
           (f/map (f/fn [x] [(.toUpperCase (:shooter x)) 1]))
           (f/reduce-by-key (f/fn [x y] (+ x y)))
           f/collect)]
    (clojure.pprint/pprint (sort-by last > fgm))))

(field-goals-made-by-player "Warriors" plays)

The results of this code (generated by the very last line) are a list of Warriors 3pt fgm last season:

(["STEPHEN CURRY" 261]
["KLAY THOMPSON" 223]
["HARRISON BARNES" 66]
["ANDRE IGUODALA" 62]
["DRAYMOND GREEN" 55]
["JORDAN CRAWFORD" 40]
["STEVE BLAKE" 27]
["TONEY DOUGLAS" 19]
["KENT BAZEMORE" 10]
["MARREESE SPEIGHTS" 8]
["NEMANJA NEDOVIC" 3])

I'm not going to explain the code, except to say it is basically a series of very common functional operations, including filter, map, and reduce. Every place in the code where you see "f/operation", the Flambo API is instructing Spark to perform some operation on a dataset (called an RDD in Spark terminology). There is another important point to be made about the code. You can see the .toUpperCase function being called in the second f/map step. This is interesting because it is actually a Java method being called from Clojure and passed to the Spark engine. One of the design principles of Clojure is to enable very transparent and powerful interoperability with Java, which lets one take advantage of the tremendous number of Java libraries available. It is a huge win (and also true for Scala, btw).
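For anyone who reads JavaScript more fluently than Clojure, here is the same filter/map/reduce shape over a plain in-memory array of play objects (the real pipeline, of course, operates on a distributed Spark RDD, not an array):

```javascript
// Mirror of the Flambo pipeline: keep made threes for one team,
// count per shooter (upper-cased), then sort descending by count.
function threesByPlayer(team, plays) {
  var counts = plays
    .filter(function (p) {
      return p.type === "fga" && p.value === 3 && p.made === true && p.team === team;
    })
    .reduce(function (acc, p) {
      var name = p.shooter.toUpperCase();
      acc[name] = (acc[name] || 0) + 1;
      return acc;
    }, {});
  return Object.keys(counts)
    .map(function (name) { return [name, counts[name]]; })
    .sort(function (a, b) { return b[1] - a[1]; });
}

var sample = [
  { type: "fga", value: 3, made: true,  team: "Warriors", shooter: "Stephen Curry" },
  { type: "fga", value: 3, made: true,  team: "Warriors", shooter: "Stephen Curry" },
  { type: "fga", value: 3, made: true,  team: "Warriors", shooter: "Klay Thompson" },
  { type: "fga", value: 3, made: false, team: "Warriors", shooter: "Klay Thompson" }
];

console.log(threesByPlayer("Warriors", sample));
// STEPHEN CURRY: 2, KLAY THOMPSON: 1
```

The difference with Spark is that each of these steps runs lazily and in parallel across a cluster, which is what makes the same shape of code workable on 500 MB (or 500 GB) of plays.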

I hope this post was useful. It really just scratches the surface of what is possible. This was all done locally on a MacBook Pro (automatically multi-threaded though!). The real fun begins when you take the code to a cluster (think EC2 and S3). It wouldn't surprise me at all if some NBA analytics departments working with SportVU data are already headed down this path even as you read this. I would encourage anyone interested in a future in analytics (NBA or otherwise) to check out these projects.

NBA Combine Measurement Similarities

I'm sick, so I data.

With the annual NBA Draft Combine having completed its anthropometric and athletic testing portion, it's a good time to update the similarity study I did a few years ago here. To summarize, I take all the testing categories available from DraftExpress (from 2009 through 2014) and use a couple of R packages (ape and cluster) to spit out the similarities between players. The result is a circular dendrogram. The closer two players are on the dendrogram, the more similar they are in terms of the combine results.

A few examples of closest comps for fun:

  • Gary Harris and Austin Rivers
  • Thanasis Ante... and Wesley Johnson
  • James Young and Xavier Henry
  • Aaron Craft and Jimmer Fredette
  • Jahii Carson and Peyton Siva
  • Jordan McRae and Jeremy Lamb
  • Noah Vonleh and Derrick Favors

See if you can find some others. It's not perfect, of course. But it's fun. Should entertain you for at least several minutes. Enjoy! Pass it around on the interwebz if you like.

 

NBA Draft Combine similarities 2009-2014.

A History of Hating Harrison Barnes

I think Twitter is amazing. It is also somewhat, perhaps mostly, responsible for the diminished frequency of my long-form blog posts here and at GSoM over the last couple of years (also, I got really freaking busy with the nbawowy stuff). It's just so easy on Twitter to communicate your thoughts in real time that I often feel like I've already said everything I want to say, which obviates the need for the more-than-140-characters-at-a-time format that the old-fashioned blog platform originally provided.

If you are reading this, there is a good chance you follow me on twitter, and if you follow me on twitter you probably have heard me make a disparaging remark or two about the play of a certain Golden State Warrior who arrived by way of North Carolina and Iowa. I'm referring, of course, to Harrison Bryce Jordan Barnes. They say don't hate the player, hate the game. Well, I've tried my best to hate the game, but I am continually accused of hating the player regardless.

As a fun exercise for myself, and to stir the passions of Barnes fanboys everywhere, I wanted to go through my history of tweeting about Barnes (I now have over 706 tweets with "Barnes" as a search term, although some of those could be about Matt Barnes!) to see how my "hate" for this player came to be. Think of this post as the origin myth for the most rampant and prolific Barnes "hater" on all of Twitter (if you know of anyone who "hates" Barnes more than I do, let me know in the comments or on Twitter!). So without any further ado...let's a do this.

I didn't think Barnes would be there at 7 leading up to the draft.

 

And cue the draft, Barnes falls to 7. I'm apparently fine with it.

 

Although in my heart and head I wanted us to take John Henson.

(since 2011)

 

(and Nicholson!)

 

(and I knew we would never have the balls to take him)

 

(oh, the wildcard!)

 

(one last Henson regret for good measure)

 

So Barnes, Ezeli, & Draymond it is. How do I feel about it at the time?

 

Uh, that's kind of spooky how accurate that fake quote turned out to be! (I'm apparently pretty good at fake interviewing people.)

I noted the hand measurements being small at the time of the draft. Anthony Davis doesn't seem to have been bothered by it (perhaps, because he was a point guard growing up), but I often think (and still do) it's a real issue for Barnes and at the core of his ball handling troubles on the perimeter:

 

Still, I was optimistic.

 

Oh, gosh. Really optimistic!

 

Starting to come down to reality.

 

Apparently I thought the bar needed to be lowered.

 

Foreshadowing here?

 

Hmm...jury still out on this one, perhaps?

 

This is still an insult apparently (but also still appropriate).

 

I think I shifted the proximity of my position on this one quite a bit in the interim.

 

This debate was a thing at the time.

 

It's really funny going back to that article to see what I had written as the "Case for Barnes":

The case I would make for Barnes actually has less to do with Barnes strengths than it does thinking about what will work best for the team. As stated above, one of my concerns with Barnes coming off the bench is that he'll feel that he has a responsibility to be "the scorer". That is the last thing I want in terms of his development as a player. Conversely, I feel that Barnes would have to learn how to play the "right way" as a member of the starting unit, because he would be surrounded by several players that are clearly a step or two or three above him right now in terms of offensive production. Of course, one could turn this right around and argue, well, if Barnes isn't in the starting unit because of his offense, and it isn't because of his defense, then maybe he shouldn't be starting, eh? And I can't really disagree with that argument. (I'm a terrible self-debater.)

Clearly, I am now of the same opinion as the second guy in that quote.

Back to the tweets! Here I start to notice Draymond.

 

That trend would continue and intensify.

 

 

Then I started to question the kool-aid.

 

 

I was at this game tweeting from Oracle! Perhaps, it could be like this forever.

 

 

He was decent for a while!

 

(with certain caveats)

 

Here is clear evidence of me hating Harrison Barnes:

 

Much more foreshadowing!

 

I was skeptical even against Denver.

 

At the time, some people were advocating for David Lee to be moved so that Barnes could replace him. Hmm. I wonder if those people ever said they were wrong about that.

 

I still wonder this, fwiw:

 

There's that Marvin Williams comp for the first time (from me):

 

At the time, a lot of folks said they wouldn't have (I wanted Kawhi on draft night, btw):

 

A continuing concern to this day. The number one concern in my estimation.

 

This. Still. Except not so much dunking.

 

And then we got Iguodala.

 

He is coming off the bench, and he is not shining. And they are discounting it because he doesn't have the benefit of always playing with better players. Sigh.

 

I believe this was something I heard Sam Mitchell say on NBA TV:

 

It's been pretty much all downhill from there:

 

Always this. But again this season with less dunks.

 

Still waiting.

 

You've surely heard me say this by now:

 

And probably this too:

 

Harrison Barnes' best skill:

 

This could get awkward:

 

And so it goes on and on:

 

 

Crazy talk!

 

Ok, I'm going to stop here. It just gets worse and worse.

 

Well, one last tweet for good measure.

 

Right idea, but the execution needs some work!

In his 2+ seasons as head coach of the Golden State Warriors, Mark Jackson has clearly made improving the defense one of his highest priorities. So much so, in fact, that in a live blog/hangout yesterday morning from the Warriors training facility, Stephen Curry pointed out how all the photos of the team hanging on the wall depict the team defending the ball, as opposed to "posterizing" players on offense (so evidently "Barnes over Pekovic" is nowhere to be seen).

Curry goes on to show viewers a chart that Mark Jackson had created for the players to show them where they should try to force defenses to take shots, based on efficiencies. This is a great idea, and it's one of the things you have almost come to expect as analytics has swept into front office and coaching mentalities across the league, with the Warriors, perhaps, being one of its top proponents.

There is a curious thing, however, in this chart. And it makes me wonder how much further analytics needs to go before its lessons are fully learned (or even appreciated).

Screen Shot 2013-10-26 at 1.41.28 PM

Did you spot the problem? (If not, I suggest you read my Advanced Stats Primer!) Notice how the chart shows FG% in each region? From what we can see, there is no label as such, but to anyone who has studied the numbers even a little, it's clear that the percentages given are field goal percentages. It's sort of odd, right? I mean, if I were a player, the message I'd take from this chart is that I'd rather force opponents to take "above the break" 3-pt shots (34.2%) than 16-23 ft jump shots (38.1%). But we know that a better metric to use here is "equivalent" or "effective" FG% (eFG%), which multiplies 3-pt shots by 1.5x, so that 34.2% becomes effectively 51% or so, much better than the long 2-pt jumpers.
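The adjustment is just the standard eFG% formula, which gives made threes an extra 50% of credit. A quick sketch of the arithmetic:

```javascript
// eFG% = (FGM + 0.5 * 3PM) / FGA. For a zone where every attempt is a
// three, this reduces to 1.5 * FG%.
function effectiveFgPct(fgm, threePm, fga) {
  return (fgm + 0.5 * threePm) / fga;
}

// The 34.2% above-the-break three from the chart (per 1000 attempts):
console.log((effectiveFgPct(342, 342, 1000) * 100).toFixed(1) + "%"); // "51.3%"

// versus the 38.1% long two, which gets no bonus:
console.log((effectiveFgPct(381, 0, 1000) * 100).toFixed(1) + "%"); // "38.1%"
```

Which is the point: once you apply the 1.5x credit, the "worst" zone on the chart is actually the better shot to concede last.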

And if you're thinking the numbers aren't important, that the players will only look at the colors (which to my eye are confusing, if anything), then why bother putting numbers at all? I see this as a window into the current state of affairs in the NBA. Analytics has definitely become the prominent way of thinking among the "NBA intelligentsia", and players are most likely aware of the "take-home messages", but there's still quite a ways to go until analytics becomes part of the everyday language of basketball (especially for players) in the same way that "pick and roll" or "coming off a screen" have implicit meaning.

Lists! The League's Best Scorers in 2013 according to Scoring Index

Long time, no write. I've been busy with things lately, as some of you may know. Hopefully, I can sprinkle in more posts now and again, though. So to ease back into this web logging habit, I've compiled a list of the best scorers this season from nbawowy.com (heard of it?). The "Scoring Index" (SI) is based on work I did a while back (see here and here and here and here and here) looking at the tradeoff between usage (i.e. volume shooting) and efficiency (measured by TS%). At the very edge of the TS-USG relationship, there appears to be a "frontier" of all-time great scorers.

The "Usage-Efficiency" Frontier

The list I've compiled has a minimum threshold of 250 FGA. The one (significant) change I've made from the earlier metric is that SI is now "signed", meaning that if a player actually falls outside the frontier (above and to the right of that line on the plot), they will have an SI > 1. IOW, they are scoring at a rate even better than the all-time greats. And wouldn't you know it, we happen to have a couple of players like that this season. You may have heard of them.

Here's the list in all its glory. And if you're wondering (which you surely are by now)...it's Draymond Green.

Simple Data Visualization using Node+Express+Jade

Update (2012-11-12): I created an app to go along with this post. Check it out at: http://ezamir.mongotest.jit.su.

If you know about Node, you're probably one of the cool kids. And you'll no doubt grok this post. In a nutshell, Node.js enables one to create an entire web application stack from the server to the client using JavaScript. It's pretty cool and stuff.

Another cool JavaScript thingy these days is D3, a library for doing all kinds of awesome visualization (that's actually what the "d3" in my domain refers to, if you were ever wondering). D3 essentially lets you bind data to elements of the DOM (i.e., the underlying structure of a web page). So D3 is really great, and it has a huge and ever-growing community of users.

The reason I'm writing this post is because I have found it's not that easy to inject D3 code into a web app built on the Node stack (which almost always includes the Express framework as well). I could only find one decent tutorial, and on top of Node and Express, that code wraps D3 in an AngularJS directive. While I was trying to figure out that code, I realized that for relatively simple use cases, it's possible to bind visual elements directly using nothing more than Node+Express+Jade. Jade is a popular HTML templating language.

To demonstrate how this works, we'll visualize shot location for the Warriors this season. First, we pull the data from some data store (in this case, I'm using MongoDB):

// assumes the native MongoDB driver's Db object (var Db = require('mongodb').Db)
// and a mongoUri connection string defined elsewhere in this module
exports.shots = function(req, res){
    var team = req.route.params.team;
    console.log(team);
    Db.connect(mongoUri, function(err, db) {
        console.log('show all shots!');
        db.collection('shots', function(err, coll) {
            // newest games first, longest shots first
            coll.find({'for': team}).sort({'date': -1, 'dist': -1}).toArray(function(err, docs) {
                db.close();
                res.render('shots', {shots: docs, team: team});
            });
        });
    });
};

The important line there is: res.render('shots',{shots: docs, team: team});. This basically hands off the shot data (which is now an array) to the Jade template (called "shots.jade"). The template looks like this:

extends layout

block content

    div.hero-unit
        h1 #{shots[0].for}
    div.row
        div.span2.offset1
            svg(width=600,height=600)
                each shot in shots
                    if (shot.made)
                        circle(cx="#{(shot.coords.x+25)*10}",cy="#{shot.coords.y*10}",r="3",fill="green",stroke="black")
                    else
                        circle(cx="#{(shot.coords.x+25)*10}",cy="#{shot.coords.y*10}",r="3",fill="red",stroke="black")

What you see is that the iterator each shot in shots in the Jade template creates a circle element for each shot in the array pulled in from the database. Here's a screen shot of the final result (it's only running locally right now, so I can't give a link to the application):

Screen Shot of Jade-generated data visualization.

So there you have it. It's possible to do some basic data visualization using just Node+Express+Jade. There isn't a lot out there on this particular topic, so I figured this might help someone or give some inspiration to go further with it.

It's Early Yet, But There's Some Historically Productive Scoring in the League Right Now

You might remember I have done a bit of work on the usage-efficiency tradeoff in the past. The "payoff" was a chart that presented evidence of a usage-efficiency "frontier" (having stolen the idea of an efficient frontier from finance, of course):

All-time productive scoring seasons lie along the "frontier".

We're almost at the quarter-point of the 2012-13 season now, so I thought it would be interesting to look at the current leaders, and see where they stand with respect to the frontier. So far, pretty, pretty good. In particular, Kevin Durant, Kobe Bryant, Tyson Chandler (so good he looks to be close to setting a new point along the frontier), and Carmelo Anthony are on or very close to the frontier, itself. Have a look:

The players in green make up the historical reference for the "frontier". Note that Chandler would be very near the frontier if it was extrapolated out further.


Of course, we should expect some regression to the mean. How much is anyone's guess, so I'll update the results periodically throughout the season.

On Ceilings and Floors and Betting

Harrison Barnes has exceeded most Warriors fans' expectations through 9 games this season. He's looked especially good in the last couple of games. This has prompted some fans to revisit the classic sports discussion regarding a player's "ceiling" and "floor". While the topic is one of the oldest in the book, the criteria for selecting a ceiling and floor for a player are not very clear (to me, anyway).

I think that most people see it as equivalent to asking the following question:

Who is the best current or former player that Player X has *some* possibility of becoming better than?

The key word here is *some*. When a fan suggests a ceiling that is deemed too low, the response is always something like, "How can you say he doesn't have *some* chance to be better than that player!?" Well, my reply is, of course, there's *some* chance. I'm going to illustrate why this is a problematic foundation for the discussion.

I think it's fair to say that, ideally, we would like to have debates that have some objectivity to them. One way to constrain a debate to be more objective is simply to introduce a bet. A bet invariably has to be settled by some objective criteria, otherwise, neither party would agree. If we want to debate which team is better, we should bet on the outcome of a game or maybe a season. That might not truly settle the debate, but at least it's an objective approach. If I pick Team A and you pick Team B, we bet against each other, and the winner is easy to declare.

So let's think about how we might construct a bet on the ceiling for a player (the floor could be done in a similar way). Here's one way to do it. The player in question is Player X. I propose that Player A is his ceiling. You propose that Player B is his ceiling. First, we need some objective criterion, i.e. a "stat". For the sake of argument, I'll just choose a stat that most everyone reading this has heard of: Hollinger's PER. (This is not the time to debate the merits of PER. You can substitute any stat you would like, as it won't materially change the point at hand.) Ok, so with PER as the base metric, the winner of the bet is the one who picks the ceiling that ends up closest to Player X.

Let me demonstrate with some numbers. Say that Player A's highest PER was 25 and Player B's highest PER was 30. Let's have one scenario where Player X ends up with a PER of 24. In this case, I win the bet because Player A meets two important criteria: 1) Player X did not achieve a PER higher than Player A (which would mean Player A was by definition too low a ceiling); and 2) In absolute terms, the difference between Player X and Player A is smaller than between Player X and Player B.

Now, say we have another scenario where Player X ends up with a PER of 26. In that case - and again, according to how I would set up the bet - you would win simply because Player X achieved a PER higher than the ceiling I set for him. The fact that my ceiling was closer (in absolute terms) doesn't make a difference.
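The settlement rule in those two scenarios is mechanical enough to write down. Here's a sketch (the function and variable names are mine, and PER stands in for whatever stat the bettors agree on):

```javascript
// Settle a ceiling bet, Price-is-Right style: a pick is disqualified if
// Player X's actual PER exceeds it (that "ceiling" was too low); among
// surviving picks, the closest one wins.
function settleCeilingBet(ceilingA, ceilingB, actual) {
  var aValid = actual <= ceilingA;
  var bValid = actual <= ceilingB;
  if (aValid && bValid) {
    return Math.abs(actual - ceilingA) <= Math.abs(actual - ceilingB) ? "A" : "B";
  }
  if (aValid) return "A";
  if (bValid) return "B";
  return "push"; // both ceilings were too low
}

console.log(settleCeilingBet(25, 30, 24)); // "A": neither busted, A is closer
console.log(settleCeilingBet(25, 30, 26)); // "B": A's ceiling was exceeded
```

Note how the without-going-over rule does the real work: it's what punishes the trivially safe strategy of always naming the best player ever as the ceiling.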

Does that make sense? Let me re-iterate that this is just one way to construct the bet. Obviously, there are others. We could just take the absolute difference and not worry about whether Player X ends up higher or lower than our ceilings. I don't like that approach, because I'm used to thinking about the games from the Price is Right, where you had to guess the price *without going over*. It makes even more sense to have that rule here, because the whole point of choosing a ceiling is that we're saying that is the player's LIMIT.

The problem I have with the original (and seemingly more popular) approach to the ceiling/floor discussion is that there's really no way to evaluate it objectively. Let's use Harrison Barnes as the example. I'll say that his ceiling is Danny Granger. You say that his ceiling is LeBron James. Who would win that bet? If Barnes never becomes "better" than LeBron, do you win? If that's the case, what exactly is the incentive to choose any player other than arguably the best SF of all time? The ceiling for every SF would then be either Bird or James, right?

Now, I think most people inherently understand that dilemma, so they pick someone not quite that good for Barnes. But the criteria for doing so are usually ad hoc. It's basically, "Well, I think he has some chance of being better than this player, but no chance of being better than that player."

My point is let's bet on it. Let's put some numbers on it. The challenge here is not to pick *some* player that is the absolute ceiling (which is easy and trivial). The challenge (for me, anyway) is to pick the *worst* player that you think will be *better* than the player in question (Player X). Because otherwise, as I said, there's no incentive to pick anyone other than the best player of all time. In math, they would say that's an "ill-posed" problem. In order to make it a well-posed problem, it seems to me the logical solution is to construct it as a bet. From there everything else follows.

I know, that was a lot of words. But next time you enter into the ceiling/floor debate or listen to it on tv, just remember the main point here: Pick the guy that you would be willing to bet on.

Bayesian True Power Ratings for the NFL

In a recent post, I laid out the framework for developing a Bayesian power ratings model for the NFL using the BUGS/JAGS simulation software. That was a really simple model that essentially amounted to little more than a standard linear regression (or ridge regression). At the end of the article I suggested that one area of improvement would be to take into account turnovers. So, this is my first attempt to do that (at least, the first one that I'm writing about).
