In my last post, I explored the impact of age on running back performance. But there are likely many more factors that cause running backs to improve or decline. Today, we’ll dive into some of those additional factors.
(Warning: This post contains a fair amount of math and statistics. If that isn’t your cup of tea, or you’re not interested in how I arrived at my conclusions, feel free to skip ahead to the Implications section for the fantasy-focused recommendations.)
Methodology
As before, I gathered some data from Pro-Football-Reference. My data set included all running backs from the 1980 season through the 2013 season. Once again, I cut this down to only include running backs who averaged more than 10 touches (defined as carries + receptions) per game. The rationale here is that we only want to look at the performance of impact fantasy players.
Next, I determined each player’s production in relation to that player’s career-best production. Production here is defined as average rushing yards per game plus average receiving yards per game. I’ll clarify this with an example. Adrian Peterson’s best yards from scrimmage per game was 144.6, in the 2012 season, so that year he got a percent-of-max score of 1. Last year, his yards from scrimmage per game came to 102.6, about 70% of his 2012 maximum, so for the 2013 season AP got a percent-of-max score of 0.7092.
Taking this approach (rather than analyzing each player’s absolute yards per game right away) helps minimize selection bias, since the values for all players can be compared on an even playing field. After computing the percent-of-max scores, I grouped the data by various factors and averaged the scores within each group.
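A minimal Python sketch of the percent-of-max calculation, assuming the per-game production figures are already in a dictionary keyed by (player, season). That layout and the function name `percent_of_max` are illustrative, not how the original spreadsheet was built. The only data used are the two Adrian Peterson figures quoted above.

```python
def percent_of_max(production: dict[tuple[str, int], float]) -> dict[tuple[str, int], float]:
    """Divide each season's production by that player's career-best season.

    `production` maps (player, season) to yards from scrimmage per game,
    i.e. rushing yards per game + receiving yards per game.
    """
    # First pass: find each player's best season.
    best: dict[str, float] = {}
    for (player, _season), ypg in production.items():
        best[player] = max(best.get(player, ypg), ypg)
    # Second pass: express every season as a fraction of that best.
    return {key: ypg / best[key[0]] for key, ypg in production.items()}


production = {
    ("Adrian Peterson", 2012): 144.6,
    ("Adrian Peterson", 2013): 102.6,
}
scores = percent_of_max(production)
print(round(scores[("Adrian Peterson", 2012)], 2))  # 1.0
print(round(scores[("Adrian Peterson", 2013)], 2))  # 0.71
```

Because every season is divided by the same player’s best season, each player’s peak scores exactly 1, regardless of how many raw yards that peak represents.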
Data and Analysis
The first factor I analyzed was number of career carries. Here’s the graph:
It’s important to note that carries were grouped into bins of 200. So the point at 200 carries includes running backs with 1 to 200 career carries, the point at 400 carries includes running backs with 201 to 400 career carries, and so on. As with age, running backs peaked early (within their first 200 carries), then slowly declined for the rest of their careers.
Next, I looked into career yards. The graph looked very similar to the previous one, and to the age graph from my last post.
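The 200-carry binning could be sketched like this in Python, assuming each player-season has already been reduced to a (career carries, percent-of-max score) pair. The helper names and the sample rows are hypothetical, chosen only to show where the bin edges fall.

```python
import math
from collections import defaultdict


def carry_bin(career_carries: int, width: int = 200) -> int:
    """Upper edge of the carry bin: 1-200 -> 200, 201-400 -> 400, and so on."""
    return math.ceil(career_carries / width) * width


def average_by_bin(rows, width: int = 200) -> dict[int, float]:
    """Average percent-of-max score per bin; rows are (career_carries, score) pairs."""
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for carries, score in rows:
        b = carry_bin(carries, width)
        totals[b] += score
        counts[b] += 1
    return {b: totals[b] / counts[b] for b in sorted(totals)}


print(carry_bin(1), carry_bin(200), carry_bin(201))  # 200 200 400
print(average_by_bin([(150, 1.0), (200, 0.5), (201, 0.9), (650, 0.7)]))
# {200: 0.75, 400: 0.9, 800: 0.7}
```

The same grouping works for career yards or games played by swapping in a different bin width.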
Finally, I looked at career games played:
All three (four if you include age) of these graphs are very similar, and obviously the underlying factors are closely linked. Running backs who play more games tend to receive more carries, running backs who get more carries tend to pile up more yards, and all of these totals typically increase with age. So I realized that the analysis above, on its own, wasn’t all that useful. What I really needed was a way to isolate the effect of each variable and understand its impact on performance. To do this, I used a multivariable regression. I’m in the process of acquiring and learning better software, but for now, I did the analysis in Excel. I started with the following factors:
- Age
- Age^2
- Career Yards
- Career Yards^2
- Career Games Played
- Career Games Played^2
- Career Carries
- Career Carries^2
What’s up with those square terms? Well, most of the above analysis wasn’t particularly useful, but it did tell us one thing: factors such as age, career yards, etc. are most likely best fit by a second-order polynomial (i.e., a parabola). Adding the square terms to the regression checks whether this is actually the case.
Starting from the full list above, I performed a (manual) backward elimination. Several of the factors were not significant at the 0.05 level, so I removed them from the regression one by one. I ended up with this:
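Here is one way to build the regression inputs with the square terms and fit them by ordinary least squares in Python with NumPy, as an alternative to Excel’s regression tool. It’s a sketch: the column names and the synthetic, noise-free data are assumptions, used only to check that the fit reproduces a known parabola.

```python
import numpy as np


def design_matrix(factors: dict[str, np.ndarray]) -> tuple[np.ndarray, list[str]]:
    """Intercept column, then each factor followed by its square."""
    n = len(next(iter(factors.values())))
    cols, names = [np.ones(n)], ["intercept"]
    for name, x in factors.items():
        x = np.asarray(x, dtype=float)
        cols += [x, x**2]
        names += [name, f"{name}^2"]
    return np.column_stack(cols), names


def fit_ols(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:
    """Least-squares coefficients and R^2 = 1 - SS_res / SS_tot."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return beta, r2


# Synthetic check: an exact parabola in age (peaking at 25) plus a linear carries effect.
rng = np.random.default_rng(0)
age = rng.integers(21, 35, size=200).astype(float)
carries = rng.integers(0, 2500, size=200).astype(float)
y = -1.0 + 0.15 * age - 0.003 * age**2 - 0.0001 * carries

X, names = design_matrix({"age": age, "carries": carries})
beta, r2 = fit_ols(X, y)
print({name: round(float(b), 6) for name, b in zip(names, beta)}, round(r2, 6))
```

The printed coefficients should land very close to the values used to generate `y`, with the `carries^2` term near zero; with real player data the square terms would be tested for significance rather than known in advance.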
Now, the R^2 here is about 0.2, which means that the model explains roughly 20% of the variation in running back production. This may seem low, but we’ve only looked at a handful of factors, so I think it’s nothing to sniff at. Many other factors affect a player’s performance each year, including coaching, injuries, player role, offensive line, quarterback play, weather, strength of schedule, etc.
Even so, the regression gave us some valuable knowledge. First, career yards and career games played were not significant in explaining running back performance; both dropped out during the backward elimination. Of the variables analyzed, age and career carries were the main driving factors. I worried that even these two might be intertwined, compromising my results through multicollinearity, but I calculated the Variance Inflation Factor to be .196, which is well within the bounds of safe results.
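The manual backward elimination (drop the least significant term, refit, repeat until everything left is significant at 0.05) can also be scripted. This is one common form of it, sketched under a few assumptions: p-values use a normal approximation to the t distribution (close to Excel’s t-based values when there are many player-seasons), the intercept is always kept, and the tiny deterministic data set is invented for the demonstration.

```python
import math

import numpy as np


def ols_pvalues(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """OLS coefficients and two-sided p-values (normal approximation to the t test)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - k)  # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t = beta / se
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])
    return beta, p


def backward_eliminate(X, y, names, alpha=0.05, keep=("intercept",)):
    """Repeatedly drop the least significant term until all remaining p <= alpha."""
    cols = list(range(X.shape[1]))
    while True:
        beta, p = ols_pvalues(X[:, cols], y)
        # Only terms outside `keep` are eligible for removal.
        candidates = [(p[i], i) for i, c in enumerate(cols) if names[c] not in keep]
        if not candidates:
            break
        worst_p, worst_i = max(candidates)
        if worst_p <= alpha:
            break
        del cols[worst_i]  # remove one term, then refit on the next pass
    return [names[c] for c in cols], beta, p


# Tiny deterministic example: y depends on x1; x2 is orthogonal to everything else.
x1 = np.arange(1.0, 9.0)
e = np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)
x2 = np.array([1, 1, -1, -1, -1, -1, 1, 1], dtype=float)
y = 2.0 + 3.0 * x1 + 0.5 * e
X = np.column_stack([np.ones(8), x1, x2])

kept, beta, p = backward_eliminate(X, y, ["intercept", "x1", "x2"])
print(kept)  # ['intercept', 'x1']
```

Refitting after each removal matters: dropping one term changes the standard errors, and therefore the p-values, of everything that remains.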
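For reference, the Variance Inflation Factor for predictor j is VIF = 1 / (1 − R_j²), where R_j² comes from regressing that predictor on all the other predictors. Since R_j² lies between 0 and 1, a VIF is never smaller than 1, and common rules of thumb flag values above roughly 5 or 10. So a figure of .196 can’t be a VIF itself; if it is the R² between age and career carries, the corresponding VIF would be 1 / (1 − 0.196) ≈ 1.24, which would indeed be comfortably safe. A minimal NumPy sketch (the function name and test columns are illustrative):

```python
import numpy as np


def vif(X: np.ndarray) -> list[float]:
    """VIF for each column of X (predictor columns only, no intercept).

    Each column is regressed on the remaining columns plus an intercept;
    VIF_j = 1 / (1 - R_j^2).
    """
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return out


# Two uncorrelated (orthogonal, zero-mean) columns: both VIFs are 1.
a = np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)
b = np.array([1, 1, -1, -1, -1, -1, 1, 1], dtype=float)
print(vif(np.column_stack([a, b])))
# Strongly correlated columns (R^2 = 1/1.01) inflate the VIF to about 101.
print(vif(np.column_stack([a, a + 0.1 * b])))
```

With only two predictors, both VIFs are equal, because each one’s R_j² is just the squared correlation between the pair.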
Implications
So we know that age and career carries were significant variables, and we can quantify their impact. According to the regression, we have the following equation:
Keeping in mind that this formula does not come close to covering every variable that explains performance (the R^2 was only 0.2), we can use it to calculate a rough “Safety Score” for each running back based on his age and career carries.
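To make the Safety Score step concrete, here is a sketch of evaluating a fitted quadratic regression at one player’s age and career carries. The fitted coefficients are not reproduced in this text, so the numbers below are placeholders, not this post’s model, and the term names are hypothetical; the real coefficients for whichever terms survived the elimination would go in their place.

```python
def safety_score(age: float, carries: float, coef: dict[str, float]) -> float:
    """Evaluate a fitted quadratic model at one player's age and career carries.

    `coef` maps term names to fitted coefficients; terms absent from `coef`
    (e.g. ones dropped during backward elimination) contribute nothing.
    """
    terms = {
        "intercept": 1.0,
        "age": age,
        "age^2": age**2,
        "carries": carries,
        "carries^2": carries**2,
    }
    return sum(coef.get(name, 0.0) * value for name, value in terms.items())


# PLACEHOLDER coefficients for illustration only, NOT the fitted values from this post.
placeholder = {"intercept": 1.0, "carries": -0.0001}
print(round(safety_score(25, 1000, placeholder), 3))  # 0.9 with these placeholders
```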
Let’s look at ESPN’s top 40 running backs again. Age here is defined as the age the player will be by the end of 2014:
ESPN Rank | Name | Age | Career Carries | Safety Score |
1 | Adrian Peterson | 29 | 2033 | 0.659 |
2 | LeSean McCoy | 26 | 1149 | 0.743 |
3 | Jamaal Charles | 28 | 1043 | 0.748 |
4 | Matt Forte | 29 | 1551 | 0.690 |
5 | Marshawn Lynch | 28 | 1753 | 0.685 |
6 | Eddie Lacy | 23 | 284 | 0.834 |
7 | Doug Martin | 25 | 446 | 0.830 |
8 | Arian Foster | 28 | 1131 | 0.739 |
9 | Zac Stacy | 23 | 250 | 0.840 |
10 | DeMarco Murray | 26 | 542 | 0.820 |
11 | Le’Veon Bell | 22 | 244 | 0.823 |
12 | Alfred Morris | 26 | 611 | 0.810 |
13 | Montee Ball | 24 | 120 | 0.875 |
14 | Giovani Bernard | 23 | 170 | 0.853 |
15 | Reggie Bush | 29 | 1190 | 0.723 |
16 | Ben Tate | 26 | 421 | 0.838 |
17 | Ryan Mathews | 27 | 849 | 0.777 |
18 | C.J. Spiller | 27 | 590 | 0.812 |
19 | Frank Gore | 31 | 2187 | 0.619 |
20 | Andre Ellington | 25 | 118 | 0.884 |
21 | Trent Richardson | 23 | 455 | 0.807 |
22 | Chris Johnson | 29 | 1742 | 0.676 |
23 | Ray Rice | 27 | 1430 | 0.715 |
24 | Steven Jackson | 31 | 2552 | 0.610 |
25 | Rashad Jennings | 29 | 387 | 0.827 |
26 | Shane Vereen | 25 | 121 | 0.883 |
27 | Joique Bell | 28 | 248 | 0.859 |
28 | Stevan Ridley | 25 | 555 | 0.814 |
29 | Bishop Sankey | 22 | 0 | 0.866 |
30 | Pierre Thomas | 30 | 773 | 0.757 |
31 | Knowshon Moreno | 27 | 846 | 0.777 |
32 | Toby Gerhart | 27 | 276 | 0.860 |
33 | Maurice Jones-Drew | 29 | 1804 | 0.672 |
34 | Chris Ivory | 26 | 438 | 0.835 |
35 | Fred Jackson | 33 | 1138 | 0.642 |
36 | Danny Woodhead | 29 | 371 | 0.829 |
37 | Darren Sproles | 31 | 437 | 0.785 |
38 | DeAngelo Williams | 31 | 1370 | 0.671 |
39 | David Wilson | 23 | 115 | 0.863 |
40 | Bernard Pierce | 24 | 260 | 0.851 |
So, based on ONLY age and number of career carries, we can see a few interesting things from this analysis:
- Adrian Peterson looks to be a risky pick. He’s getting rather old, and he’s racked up a ton of carries (2,033) since entering the league. Still, it’s hard to bet against AP, and even a down year for him would likely be elite. Last year he played only 14 games and managed only about 70% of his best yards-from-scrimmage rate, yet still finished 7th among running backs on ESPN.
- McCoy and Charles are neck and neck in Safety Score: although McCoy is nearly two years younger, he has carried the ball about 100 more times than Charles. Still, I think the Chip Kelly offense, combined with Charles’s loss of some of his offensive linemen, makes McCoy my favorite target at the #1 overall draft slot.
- Lynch is quite risky already based on workload and age, and that risk is likely exacerbated by his continued holdout, which as of this writing has still not ended. I would stay away. It’s interesting to note, though, that Forte is just as risky based on these factors.
- DeMarco Murray seems reasonably safe here, but remember that we’re not looking at injury history just yet. Ditto for Arian Foster, Ben Tate, Ryan Mathews, and other running backs who have been labeled injury-prone.
- Ball should be primed for a good year, as he’s got fresh legs and is at the optimal age for running backs to produce.
- I’m staying far, far away from the older backs, particularly Gore and Jackson. Betting on them to continue performing at a high level is betting against the data.
- Gerhart is a “young” 27. Joique Bell is a “young” 28. Rashad Jennings and Danny Woodhead are “young” 29-year-olds. Don’t be afraid of these guys declining for age-related reasons, as they haven’t taken much of a pounding so far in their careers.
Conclusion
These factors only accounted for 20% of the variation in player performance. My goal is to find other factors that push that number higher, ideally above 60-70%. I strongly believe quantitative analysis like this can provide a huge edge in fantasy football over playing based on gut instinct. Coming soon, I’ll do a similar analysis on quarterbacks and receivers.
I’ll end with a caveat again. I don’t have any high-level training in statistics, so if any of you do, and you find any flaws in my methodology, I’d love to hear from you so we can fix them. Likewise, if you have any comments, criticisms, suggestions, requests, corrections, insights, or limericks, please let me know.