The Fifth Factor
Over the past few weeks I’ve shared my work on shot selection and its end result, my new metric Expected Points Per Shot (XPPS). In all this analysis, however, we have yet to quantify the real value of this metric. When I built XPPS I was primarily looking for a way to assess the general quality of a player’s or team’s shot selection, but as this research has gone on it has become clear that XPPS can be used to measure, directly and quantitatively, what good shot selection is actually worth to a team. Establishing that relationship means answering these three questions:
- How does XPPS relate to actual shooting percentages?
- How does XPPS relate to overall team offensive efficiency?
- How does the effect of XPPS compare to the effect of other variables on team offensive efficiency?
To answer these questions I’ve used regression analysis, a technique for estimating the connection between variables. When there are just two variables, it measures how changes in one ‘independent variable’ affect the other, ‘dependent variable.’ When there are several independent variables (multiple regression), the model estimates all of their effects at once: each variable’s coefficient describes how the dependent variable changes when that one variable changes while the others are held constant. This lets us see the effect the independent variables have on the dependent variable in the aggregate, and it also helps us identify how much of that effect can be attributed to each variable.
An example of regression analysis in action: home prices. A regression analysis could look at the effect square footage, proximity to a school, property tax rates and acreage have on the price of a home. The regression results would let you know how well those variables together explain variations in home price and how much each variable contributed to those variations. In this case home price would be the dependent variable, while square footage, proximity to a school, property tax rates and acreage would all be independent variables.
The measure of fit in a regression is labeled R^2, and it works on a scale of 0.0 to 1.0. An R^2 of 0.0 would mean that the independent variables explain none of the variation in the dependent variable; an R^2 of 1.0 would mean that the independent variables explain 100% of it. Regression analysis also returns an equation with which the dependent variable can be estimated by multiplying the value of each independent variable by a coefficient and then adding the results. For the home price example above, this might look something like this: Home Price = ((square footage x 8.2)+(school proximity x 548.6)+(property tax rate x 761.6)+(acreage x 540.6)).
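To make the ideas above concrete, here is a minimal sketch of an ordinary least squares fit in plain Python (no third-party libraries). It solves the normal equations and computes R^2 as 1 minus the ratio of the residual sum of squares to the total sum of squares. `fit_ols` is a hypothetical helper written only for this illustration; unlike the made-up equation above, it includes a constant (intercept) term, as most regressions do. The home data are invented, with square footage and price in thousands.

```python
def fit_ols(rows, y):
    """Ordinary least squares with an intercept.

    rows: list of observations, each a list of independent-variable values.
    y: list of dependent-variable values.
    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    k = len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coefs = [A[i][k] / A[i][i] for i in range(k)]
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    preds = [sum(c * x for c, x in zip(coefs, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot


# Invented data: square footage (thousands), acreage, price (thousands of dollars)
sqft = [1.2, 1.5, 1.8, 2.4, 3.0]
acres = [0.2, 0.5, 0.3, 1.0, 0.8]
exact = [50 + 120 * s + 20 * a for s, a in zip(sqft, acres)]
noisy = [p + e for p, e in zip(exact, [3, -2, 1, -4, 2])]  # prices that don't fit perfectly
rows = [[s, a] for s, a in zip(sqft, acres)]

coefs, r2 = fit_ols(rows, noisy)
print("intercept, sqft, acres:", coefs)
print("R^2:", r2)
```

On the perfectly linear `exact` prices the fit recovers the generating coefficients and an R^2 of 1.0; the `noisy` prices give an R^2 just below 1.0.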
The data set I used for all of these regression analyses was team offensive statistics, including XPPS, going back to the 2000-2001 season.
How does XPPS relate to actual shooting percentages?
The first thing to measure is the relationship between XPPS and actual shooting percentages. Before we dig in, I should admit that I was careless in naming this new metric. Expected Points Per Shot includes free throw attempts, so Expected Points Per Scoring Opportunity might have been a more accurate moniker. Because I’ve included free throws, XPPS mimics the formula for True Shooting Percentage (TS%). In that formula ‘true shot attempts’ are calculated by adding field goal attempts to (0.44 x free throw attempts). Most trips to the line are two free throws that together stand in for a single shot, which would suggest a factor of 0.5; the slightly lower 0.44 adjusts for free throws from technical fouls, And-1s and the extra attempt on fouled three-pointers, which would otherwise inflate the number of equivalent shot attempts. The formula for XPPS divides expected points by that same formula for ‘true shot attempts’.
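The shared denominator can be sketched in a few lines of Python. This is a minimal illustration, not the code behind XPPS: the function names are hypothetical, the team line is made up, and `expected_points` is simply taken as an input, assumed to have been computed elsewhere from league-average values by shot location.

```python
def true_shot_attempts(fga, fta):
    # Field goal attempts plus 0.44 x free throw attempts
    return fga + 0.44 * fta


def true_shooting_pct(points, fga, fta):
    # TS% = points / (2 x true shot attempts)
    return points / (2 * true_shot_attempts(fga, fta))


def xpps(expected_points, fga, fta):
    # Expected points per true shot attempt; expected_points is an input here,
    # assumed to come from league-average values by shot location
    return expected_points / true_shot_attempts(fga, fta)


# Hypothetical per-game team line (made-up numbers)
tsa = true_shot_attempts(fga=82, fta=22)
ts = true_shooting_pct(points=100, fga=82, fta=22)
x = xpps(expected_points=98.0, fga=82, fta=22)
print(f"TSA={tsa:.2f}  TS%={ts:.3f}  XPPS={x:.3f}")
```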
Because the formula for XPPS so closely mirrors TS%, I will start there in measuring the connection between shot selection and accuracy. Using team totals going back to the 2000-2001 season, I regressed TS% on XPPS and got an R^2 value of 0.241. That means about 24.1% of the variation in a team’s TS% can be explained by its shot selection, as expressed by XPPS. That’s nearly a quarter of the variation in shooting accuracy explained purely by the location of shots.
The framework built around XPPS also gives us the means to explain the other 75.9% of the variation in TS%.
Because XPPS is built on league-average values for different shot locations, I’ve been comparing it to Actual Points Per Shot (PPS) throughout my analysis. By their nature, averages are constantly being over- and underperformed, so shot selection is most meaningful when we consider it side by side with accuracy. As I mentioned above, both XPPS and PPS rely on the same underlying formula as TS%: when I calculate PPS, I’m also calculating TS%, just without dividing the result in half.
My analysis has also included something I’ve referred to simply as ‘difference’. This is Actual PPS minus XPPS, and it represents how much teams or players have under- or overperformed compared to league averages. This difference could also be called ‘Shot-Making Difference’. Added to XPPS, it equals PPS, which is exactly twice TS%. Shot-Making Difference is driven by variables that are much more difficult to measure, like a player’s own abilities, game situations, defensive schemes, etc. So while XPPS explains 24.1% of the variation in shooting accuracy (as expressed by TS%), Shot-Making Difference explains the other 75.9%.
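The identity in the paragraph above (XPPS plus Shot-Making Difference equals PPS, and PPS is twice TS%) can be checked with a short sketch. `shooting_breakdown` is a hypothetical helper, the team line is made up, and `expected_points` is again assumed to be computed elsewhere from league-average values by shot location.

```python
def shooting_breakdown(points, expected_points, fga, fta):
    """Split actual points per shot into shot selection and shot making."""
    tsa = fga + 0.44 * fta           # true shot attempts
    pps = points / tsa               # Actual Points Per Shot
    xpps = expected_points / tsa     # Expected Points Per Shot
    return {
        "PPS": pps,
        "XPPS": xpps,
        "Shot-Making Difference": pps - xpps,  # Actual PPS minus XPPS
        "TS%": pps / 2,              # TS% is PPS halved
    }


# Hypothetical team line (made-up numbers)
line = shooting_breakdown(points=105, expected_points=101.5, fga=83, fta=24)
for name, value in line.items():
    print(f"{name}: {value:.3f}")
```

Because the difference is defined as PPS minus XPPS, the split is exact by construction: whatever shot selection does not account for in PPS (and therefore TS%) lands in Shot-Making Difference.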
How does XPPS relate to overall offensive efficiency?
Now that we’ve established the connection between XPPS and TS%, and know that the rest of TS% can be explained by Shot-Making Difference, we can look at how this connects to the big picture: overall offensive efficiency. Usually when we talk about offensive efficiency we use the metric Offensive Rating (ORTG), which is points scored per 100 possessions. The discussion moves from there to the Four Factors that Dean Oliver identified in his book Basketball on Paper: Effective Field Goal Percentage (eFG%), Turnover Percentage (TO%), Offensive Rebound Percentage (ORB%) and Free Throw Rate (FTA/FGA). Together they explain over 99% of the variation in a team’s offensive efficiency. However, those four don’t suit my purposes here, because they split scoring from the field and scoring at the free throw line into two separate categories. As I mentioned above, XPPS incorporates both into one metric. For that reason I used TS% in place of eFG% and FTA/FGA. To be sure that using these three variables instead of the traditional Four Factors still captured the essence of offensive efficiency, I regressed ORTG on TS%, TO% and ORB% together, which returned an R^2 value of 0.996. Those three factors explain offensive efficiency just as well as the Four Factors do.
Since TS% is completely determined by XPPS and Shot-Making Difference (their sum is PPS, which is twice TS%), we can replace TS% in the regression above with those two variables and get the same R^2 value, 0.996. Over 99% of the variation in a team’s ORTG can be explained by TO%, ORB%, Shot-Making Difference and XPPS. From here we just need a strategy to identify what portion of that 99% can be attributed to XPPS.
The easiest approach would be to look at the size of the coefficients and assume they represent the weight of each variable. Unfortunately, that doesn’t work. Take my completely made-up Home Price example from earlier:
Home Price = ((square footage x 8.2)+(school proximity x 548.6)+(property tax rate x 761.6)+(acreage x 540.6))
There are two reasons that just looking at the size of the coefficients doesn’t work. The first is that each variable works on a different scale. Although property tax rate has the largest coefficient, property tax rates themselves are usually single-digit numbers, so that piece of the equation will likely produce a considerably smaller product than the square footage portion. The second is that one variable may have much more variance (a wider spread of values) than another, which gives it more potential to move the output.
Returning to our basketball work, we find the same problems. The value of the ORB% variable in our preliminary regression falls somewhere between 0.22 and 0.30, whereas XPPS falls somewhere between 1.000 and 1.500. However, there is a way to level the playing field so that we can compare the coefficients in a meaningful way: running the regression on standardized variables instead of the raw variables. In standardization, we replace each XPPS value with the number of standard deviations it sits away from its overall average. We do the same for ORB%, TO% and Shot-Making Difference, along with the dependent variable (Offensive Rating). Every standardized variable then has a mean of 0 and a standard deviation of 1, which removes the differences in scale and spread and lets us compare the resulting ‘standardized coefficients’ directly.
We have one other problem to deal with here. The independent variables in this regression (TO%, ORB%, Shot-Making Difference and XPPS) are not truly independent of one another. For example, Shot-Making Difference varies along with ORB%, which makes apportioning responsibility for ORTG difficult. If the variables were completely uncorrelated, squaring the standardized coefficients returned by the regression would tell us the share of the variation in ORTG each independent variable is responsible for, and those squares would add up to the regression’s R^2. However, when we square the coefficients for this analysis and add them, we get 108.1%, evidence of some (though not a lot of) overlap. To compensate, we can scale each squared coefficient so that the shares add up to the total R^2 of the regression, using this equation: (XPPS coefficient^2 x R^2) / 108.1%. For XPPS this returns 19.2%, the share of the variation in ORTG we estimate is explained by XPPS.
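The adjustment described above can be written as a small function: square each standardized coefficient, then scale the squares so that they sum to R^2. This is a sketch under the assumption that the standardized coefficients are already in hand; `adjusted_shares` is a hypothetical name, and the coefficients in the example are invented because the article does not list the real ones.

```python
def adjusted_shares(std_coefs, r_squared):
    """Scale squared standardized coefficients so they add up to R^2."""
    # Squaring drops the sign, so a negative coefficient (e.g. for TO%) still counts
    squares = {name: b ** 2 for name, b in std_coefs.items()}
    total = sum(squares.values())  # 1.081 (108.1%) in the regression described above
    return {name: sq * r_squared / total for name, sq in squares.items()}


# Hypothetical standardized coefficients (not the article's actual values)
shares = adjusted_shares({"A": 0.6, "B": 0.5, "C": -0.3}, r_squared=0.9)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

By construction the adjusted shares sum to R^2, which is why the four percentages listed below add up to 99.6% rather than 100%.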
How does the effect of XPPS compare to the effect of other variables on team offensive efficiency?
We’ve identified that XPPS explains about 19.2% of the variation in a team’s ORTG, but how does that compare to the influence of the other variables? Using the same equation detailed above, these are the percentages of influence we can attribute to each variable:
- Shot-Making Difference: 55.4%
- XPPS: 19.2%
- TO%: 14.7%
- ORB%: 10.3%
If you prefer your data in graphic form:
By these numbers we can estimate that shot selection explains about 19.2% of the variation in a team’s offensive rating, more than TO% or ORB%. This is also numerical evidence that XPPS is a meaningful tool for understanding offensive efficiency, and that I didn’t just create a statistic for the sole purpose of discrediting Jordan Crawford as a useful basketball player.
There’s also one last important point here. What I’ve done is essentially a reworking of Dean Oliver’s Four Factors, using a variation on a method Evan Zamir used a few years ago to identify the weights of the Four Factors on point differential. However, by using eFG% and FTA/FGA, Oliver’s original work mostly lumps shot selection and shooting accuracy together; its separate treatment of free throws is the only real crumb thrown to shot selection. The alternate view of the Four Factors I’ve created presents shooting accuracy in a different format than we’re used to, splitting it between XPPS and Shot-Making Difference, but it has the benefit of highlighting shot selection, a factor too often obscured or ignored in offensive analysis. Slicing the same pie in a different way gives you a different view of the inside.
* I’d like to give a huge thank you to Daniel Myers for his endless, patient help in putting this analysis together.