Additional Factors Impacting Running Back Performance

In my last post, I explored the impact of age on running back performance. But there are likely many more factors that cause running backs to improve or decline. Today, we’ll dive into some of those additional factors.

(Warning: This post contains a fair amount of math and statistics. If that isn’t your cup of tea or you’re not interested in how I arrived at my conclusions, feel free to skip ahead to the Implications section to dive straight into the fantasy-focused recommendations.)

Methodology

As before, I gathered some data from Pro-Football-Reference. My data set included all running backs from the 1980 season through the 2013 season. Once again, I cut this down to only include running backs who averaged more than 10 touches (defined as carries + receptions) per game. The rationale here is that we only want to look at the performance of impact fantasy players.

Next, I determined each player’s production in relation to that player’s career best production. Production here is defined as average rushing yards per game plus average receiving yards per game. I’ll clarify this with an example. Adrian Peterson’s maximum yardage from scrimmage was 144.6 yards per game in the 2012 season. That year he got a percent of max score of 1. Last year, Adrian Peterson’s total yards from scrimmage per game was 102.6, which is about 71% of his maximum in 2012. So, for the 2013 season, AP got a percent of max score of 0.7092.

Taking this approach (rather than simply analyzing their absolute yards per game right away) helps minimize selection bias, as the values for all players can be compared on an even playing field. After getting the percent of max performance scores, I then grouped the data by various factors and averaged each group’s performance.
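
As a minimal sketch of the percent of max calculation in Python: the function name and the dictionary layout are illustrative assumptions, not necessarily how the original spreadsheet was built. With the rounded per-game figures quoted above, the 2013 score comes out at roughly 0.71.

```python
def percent_of_max(yards_per_game_by_season):
    """Divide each season's yards per game by the player's career-best season."""
    best = max(yards_per_game_by_season.values())
    return {season: ypg / best for season, ypg in yards_per_game_by_season.items()}


# Adrian Peterson's two seasons quoted in the text (scrimmage yards per game).
peterson = {2012: 144.6, 2013: 102.6}
scores = percent_of_max(peterson)
print(round(scores[2012], 2), round(scores[2013], 2))  # 1.0 0.71
```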

Data and Analysis

The first factor I analyzed was number of career carries. Here’s the graph:

[Graph: percent of max production, by career carries]

It’s important to note here that carries were grouped into bins of 200 carries. So the point at 200 carries includes running backs with 1 to 200 career carries, the point at 400 carries includes running backs with 201 to 400 career carries, and so on. As with age, running backs peaked early (within their first 200 carries), then slowly declined for the rest of their careers.
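
The binning rule can be sketched in a few lines of Python. The function name is hypothetical, and treating anything below 1 carry as an error is an assumption, since the text only describes bins starting at 1 carry.

```python
BIN_WIDTH = 200  # bin size stated in the text


def carry_bin(career_carries, width=BIN_WIDTH):
    """Return the upper edge of the bin: 1-200 -> 200, 201-400 -> 400, ..."""
    if career_carries < 1:
        raise ValueError("bins start at 1 career carry")
    # Integer arithmetic avoids any floating-point rounding at the bin edges.
    return ((career_carries - 1) // width + 1) * width


for carries in (1, 200, 201, 400, 2033):
    print(carries, "->", carry_bin(carries))
```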

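Next, I looked into career yards. The graph looked very similar to the previous one, as well as the age graph from the last post.

[Graph: percent of max production, by career yards]

Finally, I looked at career games played:

[Graph: percent of max production, by career games played]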

All three (four if you include age) of these graphs are very similar, and obviously, they are all closely linked. Running backs who play more games tend to receive more carries. Running backs who get more carries are able to push through for more yards. And all of these factors typically increase with age. So I realized that the analysis above wasn’t actually all that useful. What I really needed was a way to isolate the effects of each variable and understand each one’s impact on performance. To do this, I used a multivariable regression. I’m in the process of obtaining and learning better software, but for now, I did the analysis in Excel. I started with the following factors:

  • Age
  • Age^2
  • Career Yards
  • Career Yards ^ 2
  • Career Games Played
  • Career Games Played ^ 2
  • Career Carries
  • Career Carries ^ 2

What’s up with those square terms? Well, most of the above analysis wasn’t particularly useful, but it did tell us one thing: factors such as age, career yards, etc. are most likely best fitted with a second-order polynomial function (i.e. a parabola). Adding the square terms into the regression checks whether this is actually the case.
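
To illustrate what a square term does, here is a small Python sketch that fits y = b0 + b1*x + b2*x^2 by ordinary least squares. It covers a single predictor only, not the full multivariable regression described here, and the data are made up: x stands for "years past age 24" (centering is an assumption to keep the arithmetic well behaved), and y is generated from a known parabola so the fit can be checked.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]
    # Augmented matrix [X^T X | X^T y], 3 rows by 4 columns.
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(3)]
        + [sum(row[i] * y for row, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


# Made-up data: an exact parabola peaking near x = -1 (about age 23).
xs = list(range(-3, 8))
ys = [0.9 - 0.01 * x - 0.005 * x * x for x in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

A clearly negative b2 is what a peak-then-decline shape looks like in the coefficients; if b2 were indistinguishable from zero, a straight line would describe the data just as well.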

From the above list, I performed a (manual) backwards regression. Several of the factors were not significant at the 0.05 level, so I removed them from the regression one by one. I ended up with this:

[Image: regression output]

Now, the R^2 here is about 0.2, which means that the model explains roughly 20% of the variation in running back production. This may seem low, but we’ve only looked at a handful of factors, so I do think it’s nothing to sniff at. So many factors are responsible for a player’s performance each year, including coaching, injuries, player role, offensive line, quarterback play, weather, strength of schedule, etc.
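
For reference, R^2 is one minus the ratio of the model’s squared errors to the squared errors of simply predicting the average. A short Python sketch with made-up numbers (the function name and the sample values are illustrative, not taken from the data set):

```python
def r_squared(actual, predicted):
    """1 - SS_res / SS_tot: the share of variation explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up percent-of-max scores and two sets of predictions.
actual = [0.6, 0.8, 1.0]
perfect = r_squared(actual, actual)               # predictions match exactly
baseline = r_squared(actual, [0.8, 0.8, 0.8])     # always predict the average
print(perfect, round(baseline, 6))
```

An R^2 of 0.2 sits much closer to the "always predict the average" end of that range than to the perfect end.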

However, it did give us some valuable knowledge. First, we learned that career yardage and career games played were not found to be significant in explaining running back performance, as those factors dropped out during the regression. Age and career carries were found to be the main driving factors out of the variables analyzed. I thought that even these factors may have been intertwined, compromising my results due to multicollinearity, but I calculated the Variance Inflation Factor (VIF) to be .196, which is well within the bounds of safe results.
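
As a rough illustration of a multicollinearity check: assuming only two predictors remain, the VIF reduces to 1 / (1 - r^2), where r is the correlation between them. The Python sketch below uses made-up ages and carry totals (not the actual data set), and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(xs, ys):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


# Made-up example: older backs tend to have more career carries.
ages = [22, 24, 26, 28, 30]
carries = [300, 250, 1200, 900, 2000]
print(round(vif_two_predictors(ages, carries), 2))
```

Because r^2 lies between 0 and 1, a VIF computed this way is never below 1; values above roughly 5 to 10 are the commonly cited warning signs.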

Implications

So we know that age and career carries were significant variables, and we can quantify their impact. According to the regression, we have the following equation:

[Image: regression equation for the Safety Score]

Keeping in mind that this does not at all cover all variables that explain performance (as the R^2 was only 0.2), we can use this formula to calculate a rough “Safety Score” for each running back based on his age and career carries.

Let’s look at ESPN’s top 40 running backs again. Age here is defined as the age the player will be by the end of 2014:

ESPN Rank Name Age Total Carries Safety Score
1 Adrian Peterson 29 2033 0.659
2 LeSean McCoy 26 1149 0.743
3 Jamaal Charles 28 1043 0.748
4 Matt Forte 29 1551 0.690
5 Marshawn Lynch 28 1753 0.685
6 Eddie Lacy 23 284 0.834
7 Doug Martin 25 446 0.830
8 Arian Foster 28 1131 0.739
9 Zac Stacy 23 250 0.840
10 DeMarco Murray 26 542 0.820
11 Le’Veon Bell 22 244 0.823
12 Alfred Morris 26 611 0.810
13 Montee Ball 24 120 0.875
14 Giovani Bernard 23 170 0.853
15 Reggie Bush 29 1190 0.723
16 Ben Tate 26 421 0.838
17 Ryan Mathews 27 849 0.777
18 C.J. Spiller 27 590 0.812
19 Frank Gore 31 2187 0.619
20 Andre Ellington 25 118 0.884
21 Trent Richardson 23 455 0.807
22 Chris Johnson 29 1742 0.676
23 Ray Rice 27 1430 0.715
24 Steven Jackson 31 2552 0.610
25 Rashad Jennings 29 387 0.827
26 Shane Vereen 25 121 0.883
27 Joique Bell 28 248 0.859
28 Stevan Ridley 25 555 0.814
29 Bishop Sankey 22 0 0.866
30 Pierre Thomas 30 773 0.757
31 Knowshon Moreno 27 846 0.777
32 Toby Gerhart 27 276 0.860
33 Maurice Jones-Drew 29 1804 0.672
34 Chris Ivory 26 438 0.835
35 Fred Jackson 33 1138 0.642
36 Danny Woodhead 29 371 0.829
37 Darren Sproles 31 437 0.785
38 DeAngelo Williams 31 1370 0.671
39 David Wilson 23 115 0.863
40 Bernard Pierce 24 260 0.851

So, based on ONLY age and number of career carries, we can see a few interesting things from this analysis:

  • Adrian Peterson looks to be a risky pick. He’s getting rather old, and he’s racked up a ton of carries since entering the league. Still, it’s hard to bet against AP, and a down year for him would still likely be elite. Last year, he played only 14 games, managed only 70% of his maximum scrimmage yards, and still finished 7th overall among running backs on ESPN.
  • McCoy and Charles are neck and neck in Safety Score: although McCoy is nearly two years younger, he’s carried the ball about 100 more times than Charles. Still, I think the Chip Kelly offense combined with the loss of some of Charles’s offensive linemen makes McCoy my favorite target at the #1 overall draft slot.
  • Lynch is quite risky already based on workload and age, and that risk is likely exacerbated by his continued holdout, which as of this writing has still not ended. I would stay away. It’s interesting to note, though, that Forte is just as risky based on these factors.
  • DeMarco Murray seems reasonably safe here, but remember we’re not looking at injury history just yet. Ditto for Arian Foster, Ben Tate, Ryan Mathews, and other running backs who have been labeled injury prone.
  • Ball should be primed for a good year, as he’s got fresh legs and is at the optimal age for running backs to produce.
  • I’m staying far, far away from the older backs, particularly Gore and Jackson. Betting on them to continue performing at a high level is betting against the data.
  • Gerhart is a “young” 27. Joique Bell is a “young” 28. Rashad Jennings and Danny Woodhead are “young” 29-year-olds. Don’t be afraid of these guys declining for age-related reasons, as they haven’t taken much pounding so far in their careers.

Conclusion

These factors only accounted for 20% of the variation in player performance. My goal is to find other factors to get that number higher, hopefully to above 60-70%. I strongly believe quantitative analysis like this can provide a huge edge in fantasy football over playing based on gut instinct. Coming soon, I’ll do a similar analysis on quarterbacks and receivers.

I’ll end with a caveat again. I don’t have any high-level training in statistics, so if any of you do, and you find any flaws with my methodology, I’d love to hear from you so we can fix them. Likewise, if you have any comments, criticisms, suggestions, requests, corrections, insights, or limericks, please let me know.

The Effect of Age on Running Back Performance

Welcome to the inaugural post of The Bar Stool GM, a blog that will cover sports from a data-driven and statistical perspective. As it’s fantasy football season, I wanted to start off with some analysis to help you get an upper hand on your league mates. This first series of posts will cover age and age-related factors, and their impacts on player performance. While it’s meant to offer an insight into the fantasy universe, I believe much of it can be applied to the real world as well.

Methodology

NFL players have very short shelf lives, and this has very real implications for your fantasy team. To understand exactly what these are, I gathered some data from Pro-Football-Reference and analyzed it to see how player production varied with age. Today, we’ll walk through what’s commonly thought to be fantasy football’s most important position: running backs.

My data set included all running backs from the 1970 season through the 2013 season. I cut this down to only include running backs who averaged more than 10 touches (defined as carries + receptions) per game. Admittedly, this is an arbitrary cutoff, but I wanted to ensure that we are only looking at the fantasy relevant players; we aren’t really interested in the career progression of the Roy Helus of the world, after all.

Next, I determined each player’s production in relation to that player’s career best production. Production is defined here as average rushing yards per game plus average receiving yards per game. Then I grouped the players by age and averaged their productivity as a percentage of their career best productivity. Taking this relative approach (rather than simply looking at absolute production) helps minimize selection bias, as the values are normalized, which ensures that players are compared on an even playing field (no pun intended).
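
The group-and-average step might look like the following Python sketch. The records are made up for illustration, and the function name and the (age, score) tuple layout are assumptions rather than the actual spreadsheet structure.

```python
from collections import defaultdict


def average_by_age(records):
    """records: iterable of (age, percent_of_max) pairs -> {age: mean score}."""
    totals = defaultdict(lambda: [0.0, 0])  # age -> [running sum, count]
    for age, score in records:
        totals[age][0] += score
        totals[age][1] += 1
    return {age: total / count for age, (total, count) in sorted(totals.items())}


# Made-up player-seasons: (age, percent of career-best production).
records = [(24, 1.0), (24, 0.8), (27, 0.7), (27, 0.9), (30, 0.5)]
by_age = average_by_age(records)
print({age: round(avg, 2) for age, avg in by_age.items()})
```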

Data and Analysis

Here’s the first graph of my findings:

[Graph: percent of career-best scrimmage yards per game, by age]

The first takeaway from this graph is that running backs seem to peak at age 24, and have a “prime” from ages 23 to 26. After that, the decline is sharp and consistent. Don’t worry about the odd peak at age 35. The sample size for players aged 34, 35, and 36 is extremely small (9, 3, and 2 players, respectively, while most other ages have over 50 players with qualifying seasons at that age), so I would ignore that portion of the graph. These findings are rather intriguing, as they imply that running backs peak a little bit earlier than conventional wisdom holds.

But that’s scrimmage yards. We, as fantasy players, are most interested in fantasy points, right? For the graph below, I used the same methodology as above, but with fantasy points per game in place of average total yards per game, where fantasy points per game = total yards per game * 0.1 + touchdowns per game * 6 + fumbles per game * -2 (the standard scoring system used in most leagues). Let’s see how fantasy points vary with running back age:
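
The scoring formula translates directly into Python. The function and parameter names are illustrative, and the sample stat line is made up.

```python
def fantasy_points_per_game(yards_per_game, touchdowns_per_game, fumbles_per_game):
    """Standard scoring from the text: 0.1 per yard, 6 per TD, -2 per fumble."""
    return 0.1 * yards_per_game + 6 * touchdowns_per_game - 2 * fumbles_per_game


# Made-up stat line: 100 scrimmage yards, 0.5 touchdowns, 0.1 fumbles per game.
points = fantasy_points_per_game(100, 0.5, 0.1)
print(round(points, 1))  # 12.8
```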

[Graph: percent of career-best fantasy points per game, by age]

Nope, that’s not a mistake. The two graphs are almost identical: a quick rise to the peak performance age of 24, a two-year window of steady performance, followed by a precipitous drop after age 26.

You may have noticed by now that all of this analysis is done on a per-game basis. I did this to control for injuries, but it is of course very important to explore injury risk as well. I thought initially that older players would be more likely to get hurt than younger players, and so would play fewer games:

[Graph: games played, by age]

However, there doesn’t appear to be much of a relationship between age and the number of games played for NFL running backs (correlation = -0.14). This is pretty unfortunate, as injuries are some of the most volatile and unpredictable parts of fantasy football; it’d be really useful to have some insight regarding what causes them.

Implications

What does all this mean for your fantasy team? It means to place a premium on younger players and discount the values of older players. But you probably knew that already, right? The biggest advantage you can get from this analysis is with those players who are between 27 and 29 years of age. Commonly thought to be in their physical primes, these players are, the data suggests, actually at significant risk of decline. Yes, this includes the likes of the 29-year-old Adrian Peterson (though admittedly, the model does only apply to human beings).

For easy reference, here are the top 40 running backs based on ESPN’s rankings, as of 7/26/2014, and what age they will be by the end of the 2014 regular season:

Name Age Name Age
Adrian Peterson 29 Trent Richardson 23
LeSean McCoy 26 Chris Johnson 29
Jamaal Charles 28 Ray Rice 27
Matt Forte 29 Steven Jackson 31
Marshawn Lynch 28 Rashad Jennings 29
Eddie Lacy 23 Shane Vereen 25
Doug Martin 25 Joique Bell 28
Arian Foster 28 Stevan Ridley 25
Zac Stacy 23 Bishop Sankey 22
DeMarco Murray 26 Pierre Thomas 30
Le’Veon Bell 22 Knowshon Moreno 27
Alfred Morris 26 Toby Gerhart 27
Montee Ball 24 Maurice Jones-Drew 29
Giovani Bernard 23 Chris Ivory 26
Reggie Bush 29 Fred Jackson 33
Ben Tate 26 Danny Woodhead 29
Ryan Mathews 27 Darren Sproles 31
C.J. Spiller 27 DeAngelo Williams 31
Frank Gore 31 David Wilson 23
Andre Ellington 25 Bernard Pierce 24

Green indicates a young player poised for improvement, yellow indicates that the player is in his prime, and red indicates the player is over the hill.

Of course, the implications of this analysis are not limited to the fantasy universe. The Chiefs just gave the 27-year-old Jamaal Charles a massive extension that will keep him with the team through the 2017 season. He’ll be 30 at the time, and given this analysis, he’ll be getting worse each year until that time. I haven’t estimated the value of a win yet, and Charles has certainly played at an elite level the past few years, but on the surface, this contract seems questionable at best.

Conclusion

While the curve above may apply to the population of running backs as a whole, any individual running back can easily buck the trend and have a career year at any age (I’m looking at you, 32-year-old Fred Jackson!). One potential reason Jackson is having such success late in his career is that he came into the NFL when he was 26. He’s got fewer yards on him than other running backs his age. In my next post, I’ll look at some of these other factors that may explain running back performance, namely yardage, touches, and years in the league. After that, I hope to do analysis on quarterbacks and pass-catchers before moving on to other topics. I hope you enjoyed my first post; any feedback would be appreciated!