Additional Factors Impacting Running Back Performance

In my last post, I explored the impact of age on running back performance. But there are likely many more factors that cause running backs to improve or decline. Today, we’ll dive into some of those additional factors.

(Warning: This post contains a fair amount of math and statistics. If that isn’t your cup of tea or you’re not interested in how I arrived at my conclusions, feel free to skip ahead to the Implications section to dive straight into the fantasy-focused recommendations.)

Methodology

As before, I gathered some data from Pro-Football-Reference. My data set included all running backs from the 1980 season through the 2013 season. Once again, I cut this down to only include running backs who averaged more than 10 touches (defined as carries + receptions) per game. The rationale here is that we only want to look at the performance of impact fantasy players.
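
For anyone who wants to reproduce the filtering step, here is a minimal Python sketch. The `Season` record, its field names, and the sample rows are hypothetical stand-ins rather than the actual Pro-Football-Reference export, and it assumes the 10-touch cutoff is applied to each season's per-game average.

```python
from dataclasses import dataclass


@dataclass
class Season:
    player: str
    year: int
    games: int
    carries: int
    receptions: int


def touches_per_game(s: Season) -> float:
    """Touches are carries + receptions, averaged over games played."""
    return (s.carries + s.receptions) / s.games


def impact_seasons(seasons, min_touches=10.0):
    """Keep seasons averaging more than `min_touches` touches per game."""
    return [s for s in seasons if s.games > 0 and touches_per_game(s) > min_touches]


# Made-up rows for illustration only, not real statistics.
sample = [
    Season("Back A", 2013, 16, 250, 40),  # 18.125 touches per game
    Season("Back B", 2013, 16, 90, 30),   # 7.5 touches per game
    Season("Back C", 2013, 0, 0, 0),      # no games played, so it is skipped
]
kept = impact_seasons(sample)
print([s.player for s in kept])
```

Only "Back A" clears the cutoff here; the zero-game guard simply avoids a division by zero.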

Next, I determined each player’s production in relation to that player’s career-best production. Production here is defined as average rushing yards per game plus average receiving yards per game. I’ll clarify this with an example. Adrian Peterson’s maximum yardage from scrimmage was 144.6 yards per game in the 2012 season. That year he got a percent of max score of 1. Last year, Adrian Peterson’s total yards from scrimmage per game was 102.6, which is about 70% of his maximum in 2012. So, for the 2013 season, AP got a percent of max score of 0.7092.

Taking this approach (rather than simply analyzing their absolute yards per game right away) helps minimize selection bias, as the values for all players can be compared on an even playing field. After getting the percent of max performance scores, I then grouped the data by various factors and averaged each group’s performance.
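
The percent of max calculation and the group averaging can be sketched in a few lines of Python. The function names are my own for illustration; the only real numbers are the two Adrian Peterson per-game figures quoted above, which are already rounded.

```python
from collections import defaultdict


def percent_of_max(yards_per_game_by_season):
    """Divide each season's yards per game by the career-best value."""
    best = max(yards_per_game_by_season.values())
    return {season: ypg / best for season, ypg in yards_per_game_by_season.items()}


def average_by_group(rows):
    """Average percent of max scores within each group.

    `rows` is an iterable of (group_key, score) pairs, where the key could be
    an age, a carries bin, and so on.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for key, score in rows:
        totals[key][0] += score
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# The two Adrian Peterson seasons quoted above (scrimmage yards per game).
peterson = {2012: 144.6, 2013: 102.6}
scores = percent_of_max(peterson)
print(scores[2012])            # 1.0
print(round(scores[2013], 2))  # 0.71
```

Because every player's best season is scored as 1, a star and a role player can sit in the same group without the star's raw yardage dominating the average.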

Data and Analysis

The first factor I analyzed was number of career carries. Here’s the graph:

[Graph: average percent of max production by career carries]

It’s important to note here that carries were grouped into bins of 200 carries. So the point at 200 carries includes running backs with 1 to 200 career carries, the point at 400 carries includes running backs with 201 to 400 career carries, and so on. As with age, running backs peaked early (within their first 200 carries), then slowly declined for the rest of their careers.
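
The binning rule described above can be written as a short Python helper. The function name and the default width are just for illustration; it labels each bin by its upper edge, matching how the points on the graph are described.

```python
def carry_bin(career_carries, width=200):
    """Return the upper edge of the bin containing `career_carries`.

    With width 200, carries 1-200 map to 200, 201-400 map to 400, and so on.
    """
    if career_carries < 1:
        raise ValueError("bins start at 1 career carry")
    # Integer ceiling division avoids any floating point rounding.
    return ((career_carries + width - 1) // width) * width


for carries in (1, 200, 201, 400, 2033):
    print(carries, carry_bin(carries))
```

The same helper works for career yards or games played by changing `width`.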

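Next, I looked into career yards. The graph looked very similar to the previous one, as well as the age graph from the last post.

[Graph: average percent of max production by career yards]

Finally, I looked at career games played:

[Graph: average percent of max production by career games played]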

All three (four if you include age) of these graphs are very similar, and obviously, they are all closely linked. Running backs who play more games tend to receive more carries. Running backs who get more carries are able to push through for more yards. And all of these factors typically increase with age. So I realized that the analysis above wasn’t actually all that useful. What I really needed was a way to isolate the effects of each variable and understand each one’s impact on performance. To do this, I used a multivariable regression. I’m in the process of obtaining and learning better software, but for now, I did the analysis in Excel. I started with the following factors:

  • Age
  • Age^2
  • Career Yards
  • Career Yards ^ 2
  • Career Games Played
  • Career Games Played ^ 2
  • Career Carries
  • Career Carries ^ 2

What’s up with those square terms? Well, most of the above analysis wasn’t particularly useful, but it did tell us one thing: factors such as age, career yards, etc. are most likely best fit by a second-order polynomial function (i.e. a parabola). Adding the square terms to the regression checks whether this is actually the case.
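
To show what a square term does in a regression, here is a small self-contained Python sketch that fits y = b0 + b1*x + b2*x^2 by least squares. It is not the Excel analysis from this post: the data points are made up, and x is centered (think of it as age minus some midpoint) only to keep the arithmetic well behaved.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]  # design matrix with the square term
    n = 3
    # Augmented system (X^T X | X^T y).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for c in range(col, n + 1):
                a[k][c] -= factor * a[col][c]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (a[i][n] - tail) / a[i][i]
    return coef


# Made-up points that lie exactly on a downward-opening parabola.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0.8 - 0.01 * x - 0.005 * x * x for x in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
print(b0, b1, b2)
```

A negative coefficient on the square term means the curve bends downward, which is the early-peak-then-decline shape seen in the graphs. If the square term's coefficient is not significant, a straight line describes that factor just as well.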

From the above list, I performed a (manual) backward elimination. Several of the factors were not significant at the 0.05 level, so I removed them from the regression one by one. I ended up with this:

[Image: final regression output]
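
The remove-one-at-a-time loop can be sketched in Python as follows. This is only an outline of the procedure: `p_values_for` is a hypothetical callback standing in for "refit the regression and read off the p-values", and the p-values in the example are made up and held fixed, whereas in a real run they change after every refit.

```python
def backward_eliminate(features, p_values_for, alpha=0.05):
    """Drop the least significant feature until all remaining p-values <= alpha.

    `p_values_for(remaining)` must return a dict mapping each remaining
    feature name to its p-value in a regression on just those features.
    """
    remaining = list(features)
    while remaining:
        p = p_values_for(remaining)
        worst = max(remaining, key=lambda f: p[f])
        if p[worst] <= alpha:
            break  # everything left is significant
        remaining.remove(worst)
    return remaining


# Made-up, fixed p-values purely to exercise the loop.
FAKE_P = {"age": 0.01, "career_yards": 0.40, "career_games": 0.20, "career_carries": 0.03}


def fake_p_values(remaining):
    return {f: FAKE_P[f] for f in remaining}


print(backward_eliminate(list(FAKE_P), fake_p_values))
```

Removing one variable at a time matters because dropping a correlated variable can make another one look more (or less) significant on the next fit.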

Now, the R^2 here is about 0.2, which means that the model explains roughly 20% of the variation in running back production. This may seem low, but we’ve only looked at a handful of factors, so I do think it’s nothing to sniff at. Many other factors shape a player’s performance each year, including coaching, injuries, player role, offensive line, quarterback play, weather, and strength of schedule.
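
For readers unfamiliar with R^2, here is a minimal Python sketch of how it is computed from actual and predicted values. The numbers are made up to keep the arithmetic easy to follow.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_total = sum((y - mean) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


actual = [1.0, 2.0, 3.0, 4.0]  # made-up observations

print(r_squared(actual, actual))                # perfect predictions
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # always guessing the mean
print(r_squared(actual, [1.5, 2.5, 2.5, 3.5]))  # somewhere in between
```

Guessing the mean every time scores 0 and perfect predictions score 1, so an R^2 of 0.2 means the model removes about a fifth of the squared error that guessing the average would leave.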

However, the model did give us some valuable knowledge. First, we learned that career yardage and career games played were not significant in explaining running back performance, as those factors dropped out during the regression. Age and career carries were the main driving factors among the variables analyzed. I worried that even these two factors might be intertwined, compromising my results through multicollinearity, but I calculated the Variance Inflation Factor to be .196, which is well within the bounds of safe results.
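
As a reference point for the multicollinearity check, here is a minimal Python sketch of the textbook Variance Inflation Factor for a model with two predictors, where VIF = 1 / (1 - r^2) and r is the correlation between them. The age and carries values are made up, and this is not necessarily how the figure above was calculated. Because r^2 lies between 0 and 1, a VIF computed this way is never below 1; its reciprocal, often called tolerance, is the quantity that falls between 0 and 1, which is worth keeping in mind when reading the .196 figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """VIF = 1 / (1 - R^2); with two predictors R^2 is the squared correlation."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


# Made-up ages and career carries for five hypothetical backs.
ages = [22, 24, 26, 28, 30]
carries = [100, 700, 600, 1500, 1400]

vif = vif_two_predictors(ages, carries)
print(round(vif, 2), round(1.0 / vif, 3))  # VIF and its reciprocal (tolerance)
```

Common rules of thumb treat a VIF above roughly 5 to 10 as a sign of troublesome collinearity.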

Implications

So we know that age and career carries were significant variables, and we can quantify their impact. According to the regression, we have the following equation:

[Image: regression equation]

Keeping in mind that this does not come close to covering all the variables that explain performance (the R^2 was only 0.2), we can use this formula to calculate a rough “Safety Score” for each running back based on his age and career carries.

Let’s look at ESPN’s top 40 running backs again. Age here is defined as the age the player will be by the end of 2014:

ESPN Rank  Name                 Age  Total Carries  Safety Score
1          Adrian Peterson      29   2033           0.659
2          LeSean McCoy         26   1149           0.743
3          Jamaal Charles       28   1043           0.748
4          Matt Forte           29   1551           0.690
5          Marshawn Lynch       28   1753           0.685
6          Eddie Lacy           23   284            0.834
7          Doug Martin          25   446            0.830
8          Arian Foster         28   1131           0.739
9          Zac Stacy            23   250            0.840
10         DeMarco Murray       26   542            0.820
11         Le’Veon Bell         22   244            0.823
12         Alfred Morris        26   611            0.810
13         Montee Ball          24   120            0.875
14         Giovani Bernard      23   170            0.853
15         Reggie Bush          29   1190           0.723
16         Ben Tate             26   421            0.838
17         Ryan Mathews         27   849            0.777
18         C.J. Spiller         27   590            0.812
19         Frank Gore           31   2187           0.619
20         Andre Ellington      25   118            0.884
21         Trent Richardson     23   455            0.807
22         Chris Johnson        29   1742           0.676
23         Ray Rice             27   1430           0.715
24         Steven Jackson       31   2552           0.610
25         Rashad Jennings      29   387            0.827
26         Shane Vereen         25   121            0.883
27         Joique Bell          28   248            0.859
28         Stevan Ridley        25   555            0.814
29         Bishop Sankey        22   0              0.866
30         Pierre Thomas        30   773            0.757
31         Knowshon Moreno      27   846            0.777
32         Toby Gerhart         27   276            0.860
33         Maurice Jones-Drew   29   1804           0.672
34         Chris Ivory          26   438            0.835
35         Fred Jackson         33   1138           0.642
36         Danny Woodhead       29   371            0.829
37         Darren Sproles       31   437            0.785
38         DeAngelo Williams    31   1370           0.671
39         David Wilson         23   115            0.863
40         Bernard Pierce       24   260            0.851

So, based on ONLY age and number of career carries, we can see a few interesting things from this analysis:

  • Adrian Peterson looks to be a risky pick. He’s getting rather old, and he’s racked up a huge number of carries since entering the league. Still, it’s hard to bet against AP, and a down year for him would still likely be elite. Last year, he played only 14 games, managed only 70% of his maximum scrimmage yards, and still finished 7th overall among running backs on ESPN.
  • McCoy and Charles are neck and neck in Safety Score: although McCoy is nearly two years younger, he’s carried the ball about 100 more times than Charles. Still, I think the Chip Kelly offense, combined with the loss of some of Charles’s offensive linemen, makes McCoy my favorite target at the #1 overall draft slot.
  • Lynch is quite risky already based on workload and age, and that risk is likely exacerbated by his continued holdout, which as of this writing has still not ended. I would stay away. It’s interesting to note, though, that Forte is just as risky based on these factors.
  • DeMarco Murray seems reasonably safe here, but remember we’re not looking at injury history just yet. Ditto for Arian Foster, Ben Tate, Ryan Mathews, and other running backs who have been labeled injury prone.
  • Ball should be primed for a good year, as he’s got fresh legs and is at the optimal age for running backs to produce.
  • I’m staying far, far away from the older backs, particularly Gore and Jackson. Betting on them to continue performing at a high level is betting against the data.
  • Gerhart is a “young” 27. Joique Bell is a “young” 28. Rashad Jennings and Danny Woodhead are “young” 29-year-olds. Don’t be afraid of these guys declining for age-related reasons, as they haven’t taken much pounding so far in their careers.
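
To make the "young 29" point concrete, here is a short Python snippet that lines up the seven 29-year-olds from the table above by career carries. The numbers are copied from the table; the layout and variable names are just for illustration.

```python
# (name, career carries, Safety Score) for the age-29 backs in the table above.
age_29 = [
    ("Adrian Peterson", 2033, 0.659),
    ("Matt Forte", 1551, 0.690),
    ("Reggie Bush", 1190, 0.723),
    ("Chris Johnson", 1742, 0.676),
    ("Rashad Jennings", 387, 0.827),
    ("Maurice Jones-Drew", 1804, 0.672),
    ("Danny Woodhead", 371, 0.829),
]

# Sort from lightest to heaviest career workload.
by_workload = sorted(age_29, key=lambda row: row[1])
for name, carries, score in by_workload:
    print(f"{name:<20}{carries:>6}{score:>8.3f}")

# Gap between the lightest-worked and heaviest-worked 29-year-old.
spread = by_workload[0][2] - by_workload[-1][2]
print(f"spread: {spread:.3f}")
```

Since all seven are the same age, the 0.170 gap between Woodhead and Peterson comes entirely from the career carries term in the formula.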

Conclusion

These factors only accounted for 20% of the variation in player performance. My goal is to find other factors to get that number higher, hopefully to above 60-70%. I strongly believe quantitative analysis like this can provide a huge edge in fantasy football over playing based on gut instinct. Coming soon, I’ll do a similar analysis on quarterbacks and receivers.

I’ll end with a caveat again. I don’t have any high-level training in statistics, so if any of you do, and you find any flaws with my methodology, I’d love to hear from you so we can fix them. Likewise, if you have any comments, criticisms, suggestions, requests, corrections, insights, or limericks, please let me know.
