Projecting the Draft Through Numbers
Editor’s Note: This project is a collaboration between Jacob Frankel (@jacob_frankel) and Hickory-High’s own Cole Patty (@Cole_Patty). They also received a data-grab assist from Jameson Draper (@JamDraper).
The NBA Draft is one of the most important components of building a team, but how prospects are judged can be subjective, ambiguous, and quite often erroneous. Unquantifiable terms like “motor”, “upside”, and “athleticism” reign supreme. Very rarely do you hear advanced statistics used to judge a player. Analysts rave over a player’s wingspan while leaving Steal Rate by the wayside, despite the former being a slight negative indicator and the latter a clear positive one. Attaching labels like “bad character” to guys we’ve seen on TV for 30 hours, based on how hard they dive after loose balls, isn’t always accurate.
In basketball, using statistics for everything is not the “right” thing to do, though old schoolers often believe that is a statistician’s goal. Data and the eye test should be used hand in hand, but when it comes to the draft there isn’t much data to complement the eye test. With that in mind, Cole Patty and I looked at data from past drafts and used it to build a predictive model for this year’s draft prospects.
In short, we built the model using a regression on data from drafts of yore. The basics of a regression: we put in a bunch of independent variables and one dependent variable. The regression told us how much of the variance in that dependent variable was explained by all the independent variables and gave us an equation to estimate the dependent variable when we have all the independent ones. You may see where this is going.
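The setup just described can be sketched in a few lines of code. Everything below is hypothetical for illustration: the feature columns and numbers are stand-ins, not the actual dataset used for the model.

```python
import numpy as np

# Each row is one past draftee. The columns are placeholder features
# standing in for the real inputs (college advanced stats, strength of
# schedule, combine measurements) -- hypothetical values throughout.
X = np.array([
    [2.8, 0.62, 8.5],
    [1.5, 0.55, 7.0],
    [2.2, 0.70, 9.1],
    [1.0, 0.48, 6.2],
    [3.1, 0.66, 7.8],
    [0.8, 0.51, 6.9],
])
y = np.array([2.1, -0.5, 1.4, -1.8, 2.6, -1.1])  # 4th-year RAPM

# Ordinary least squares with an intercept column
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# R-squared: share of the variance in 4th-year RAPM the inputs explain
pred = X1 @ coef
r_squared = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# The fitted equation then projects a new draftee's 4th-year RAPM
prospect = np.array([1.0, 2.5, 0.64, 8.2])  # leading 1 = intercept term
projection = prospect @ coef
```

The last two lines are the whole point of the exercise: once the equation is fit on past draftees, plugging in a current prospect's numbers yields his projected 4th-year RAPM.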
The independent variables we collected were each player’s advanced statistics (via KenPom.com), his team’s strength of schedule, and his combine measurements. The dependent variable was the player’s regularized adjusted plus/minus for his 4th season in the league (read this for a thorough review of RAPM). We then used the formula provided by the regression to predict the 4th-year RAPM of this year’s draftees.*
*There were separate regressions for bigs and smalls, with bigs performing better.
Another thing we wanted to incorporate was the expected value of each draft slot. What’s better for a franchise: finding a +2.0 player with its second pick or a +1.0 player with its 30th? I went back and looked at RAPM data for players in drafts 1987-2003. Here’s what I found:
We used DraftExpress.com’s mock draft to get an idea of where each player in this draft would be picked and then found his marginal projected value: projected value minus the expected value of his expected pick. This helps make it clearer which players are steals. For example, Erik Murphy projects as a -3.5 player in his fourth season, which is pretty bad. Compared to what a team would expect to get out of a player in the area of the draft where he will probably be picked, though, this is pretty good. Thus, he has a high marginal value.
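The marginal-value arithmetic is simple enough to sketch. The expected-value-by-pick numbers below are made up to illustrate the Murphy example; the actual figures came from the authors' 1987-2003 RAPM study.

```python
# Hypothetical expected 4th-year RAPM by draft slot (illustrative only)
expected_value_by_pick = {1: 1.5, 2: 1.2, 30: -1.0, 49: -3.9}

def marginal_value(projected_rapm, pick):
    """Marginal value = projected value - expected value of the pick."""
    return projected_rapm - expected_value_by_pick[pick]

# A -3.5 projection looks bad in absolute terms, but measured against a
# late second-round slot's expectation the margin comes out positive.
mv = marginal_value(-3.5, 49)
```

With these made-up slot values, a -3.5 player taken 49th clears the slot's expectation by 0.4, which is exactly how a bad absolute projection can still be a good marginal one.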
Before we look at this draft’s results, let’s look at how the model fared in previous years. Let’s go back to the first number the regression provided us, the R-squared, which estimates how much of the variance in actual RAPM is explained by the independent variables.
As you can see it does an acceptable job, explaining 57% of the variance. The correlation coefficient is 0.76. This isn’t perfect, but it isn’t a dud either. I also grouped data into bins, to see how good it was in general terms projecting how good (or bad) players would be, but not exact RAPM numbers.
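The binning check just described can be sketched as follows. The projected and actual RAPM numbers here are invented for illustration; the point is only the mechanics of grouping players into broad tiers and comparing the average actual outcome per tier.

```python
import numpy as np

# Hypothetical projected vs. actual 4th-year RAPM for a handful of players
projected = np.array([-3.0, -2.1, -0.8, 0.2, 1.1, 2.5])
actual = np.array([-2.4, -3.0, -0.2, -0.5, 1.8, 2.0])

# Broad tiers rather than exact RAPM numbers
bins = {
    "below -1": projected < -1,
    "-1 to +1": (projected >= -1) & (projected <= 1),
    "above +1": projected > 1,
}

# If the model separates good from bad, the bin averages should be ordered
bin_means = {label: actual[mask].mean() for label, mask in bins.items()}
```

This is a coarser but more honest test than exact-number accuracy: a model can miss individual RAPM values while still reliably sorting players into good, average, and bad.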
As you can see, it performs very well at projecting, in general terms, who will be good and who won’t. Now, some actual player names from the 2004-2008 drafts.
- Top six guys: Chris Paul, Kevin Love, Mike Conley, Andrew Bogut, Michael Beasley, Kevin Durant
- Most underrated: Ronny Turiaf, David Lee, DeAndre Jordan, Deron Williams, Brandon Rush, Alex Acker
- Most overrated: Michael Beasley, Javaris Crittenton, Sean May, Patrick O’Bryant, Rashad McCants, Luke Jackson
So we ran our initial regression results on this year’s data and the results seemed a little off:
First of all, in the other four drafts we tried the method on, only three guys had ratings above +3.0 (Paul, Love, and Conley). For there to be two +3.0 guys (not even including Nerlens Noel) in what is supposedly the weakest draft in years is a little strange. So I dove into the data and found one root cause of the incredibly high ratings: Chris Paul. That’s right, the point god was so damn good in his fourth season that, all on his own, he created a general tendency to overrate.
In Paul’s fourth year he was a +9.0, by far the highest in our dataset (Kevin Love was second with a +6.2) and a massive outlier. So what happens when I just delete the row labeled “Chris Paul”, run the regression again, and plug this class’s numbers into the equation?* At this point we also included non-combine players in our results, via a separate regression that took into account just college advanced statistics and basic measurements. The results of this regression weren’t quite as good, but they still work: the correlation between actual and projected RAPM was a solid 0.68. Interestingly enough, the smalls’ regression performed much better than the bigs’ (the opposite of what happened with combine data). This implies that athleticism may be a more significant factor in projecting a big man’s productivity than a small’s.
*During the process of re-doing the results two other things happened. First, we made Otto Porter a small (instead of having him a big because of height). Second, an error was found in Mike Muscala’s data, changing his ranking drastically.
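The outlier fix itself is a one-liner once the data is in a table. A minimal sketch, with made-up feature and RAPM values standing in for the real dataset:

```python
import numpy as np
import pandas as pd

# Illustrative dataset: one lone hypothetical feature per player plus
# his 4th-year RAPM. Paul's +9.0 dwarfs everything else in the column.
df = pd.DataFrame({
    "player": ["Chris Paul", "Kevin Love", "Player C", "Player D", "Player E"],
    "steal_rate": [4.9, 1.5, 2.2, 1.1, 2.8],   # hypothetical feature
    "rapm_yr4": [9.0, 6.2, 1.0, -1.5, 0.4],
})

# Drop the massive outlier before re-running the regression
trimmed = df[df["player"] != "Chris Paul"]

# Refit ordinary least squares (with intercept) on the trimmed data
X = np.column_stack([np.ones(len(trimmed)), trimmed["steal_rate"].to_numpy()])
coef, *_ = np.linalg.lstsq(X, trimmed["rapm_yr4"].to_numpy(), rcond=None)
```

Because least squares minimizes squared error, a single +9.0 season pulls the fitted coefficients disproportionately hard, which is why removing one row can change every projection downstream.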
These results look much more reasonable. The marginal value in these charts is for each player’s projected draft slot, from Draftexpress, but things can change. Ian Levy was able to put each player’s RAPM projection into a visualization that shows the range of picks in which each player’s marginal value would make them an attractive selection:
The first thing of note was the high evaluation of a number of point guards. Even the projections of small-school studs at the position such as Ray McCallum, C.J. McCollum, and Nate Wolters compared very favorably against competitors at other positions. That result is partially due to the recent success point guards have achieved in the NBA. It can also be attributed to the fact that the data currently isn’t set up to separate point guards from swing players. Nevertheless, players like McCallum or Wolters could turn out to be shrewd steals for teams if they fall into the second round (projected to go 41st and 39th, respectively).
That said, a lot of familiar faces from the top of draft boards are sticking around at the top of the results in projected 4th-year RAPM. Expected number one overall pick Nerlens Noel tested out with the highest projection, followed by fellow expected top-3 pick Otto Porter. Every other non-point guard projected to have a positive 4th-season RAPM, save Kentavious Caldwell-Pope, is tagged as a lottery pick in the DraftExpress mock draft. So, for the most part, the model agrees with common knowledge. The largest “bust” as defined by marginal value (other than the -6.2 duo of Tony Snell and Mike Muscala) was Kansas’ Ben McLemore, who is expected by many to be the second pick this year. McLemore isn’t projected to be a terrible NBA player (his -1.3 RAPM projection is above the NBA average), but that kind of production would definitely make him a “disappointment” as a number two pick. That’s a primary reason this class is being labeled as weak. Save Noel, all the expected top picks are projected to produce at a role-player level in their 4th seasons.
Given the relative success of this method in the past, these results should definitely be taken into account when considering the draft. They aren’t perfect, but they are entirely data-driven and provide a nice contrast to the majority of draft content.
Check back later in the week for more content on the draft based on this approach.