Alexander
A Way-Too-Early Prediction of the NFL Season
By Kurt Bullard
Now that it’s late July, football fans everywhere are looking forward to the opening of training camp, preseason, and the not-so-distant start of the regular season. Even though the so-called football experts struggle to forecast the outcomes of the 16-game season, I’ll try to put together a prediction model for the NFL season using a more quantitative method than the likes of Trent Dilfer.
The biggest challenge is finding a sound way to estimate team strength, which is difficult given the amount of personnel turnover each offseason and the lack of advanced statistics for evaluating player interactions. The method I came up with uses Pro Football Reference’s Approximate Value (AV) statistic, the site’s best attempt at isolating individual talent. Then, using ESPN’s NFL depth charts, I aggregated each team’s per-game AV over what I considered to be the “core” makeup of an NFL team: QB, RB, 2 WR, TE, the top 2 OL, the top 4 “Front Seven” defensive players, and the top 2 players from the secondary.
There were some exceptions to simply using last year’s AV. If a team’s starter was injured or suspended for the majority of last year (e.g. Adrian Peterson), I used the player’s 2013 AV. And if ESPN listed a rookie as a starter, I took the AV of the backup, reasoning that if the rookie ends up starting, he should perform at least as well as the player backing him up. So I used the per-game AV of Josh McCown as a substitute for Jameis Winston in my model, since predicting rookie performance is another battle of its own. This will inflate the odds for teams who plan to stick with a struggling rookie through thick and thin, and hurt teams who find a phenom rookie.
To make sure this was a sound method, I tested it on last year’s data and ran a regression to see whether aggregated AV was predictive of the end-of-regular-season Elo ratings reported by FiveThirtyEight. Aggregated AV was indeed significant, with a t-statistic of 8.57. It was also a strong predictor of Elo, as the regression returned an R-squared of .72.
This model does not account for aging, but I assume that when these AV totals are aggregated, the positive and negative effects of aging on individual players will net out to around zero for a team. As a result, the model favors aging teams and may undervalue up-and-coming teams.
I then converted the aggregated AV for each team into an Elo rating so that I could later use that value to calculate the win probability of each team in each game this season.
With the mean Elo rating set at 1500, I set the possible range of Elo values between 1320 and 1680, that is, two standard deviations either side of the mean, since the standard deviation of Elo ratings has traditionally been 90 points. So the Raiders, who had the lowest AV aggregate (76.34), were set to 1320, while the Seahawks (166.19) were set to 1680. The rest of the teams were placed on the scale using the following formula: 1320 + (360/(166.19-76.34))*(AV-76.34). Some familiar teams fall to the bottom, while the Super Bowl favorites, the Packers and Seahawks, floated their way to the top.
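As a rough illustration of the rescaling described above, here is a minimal Python sketch. It assumes a plain linear map that sends the lowest aggregate AV (76.34) to 1320 and the highest (166.19) to 1680. The function name av_to_elo and its parameter names are made up for this example and are not from the article.

```python
def av_to_elo(av, av_min=76.34, av_max=166.19, elo_min=1320.0, elo_max=1680.0):
    """Linearly rescale a team's aggregate AV onto the Elo range.

    Assumption: the lowest AV maps to elo_min and the highest to elo_max,
    with every other team placed proportionally in between.
    """
    slope = (elo_max - elo_min) / (av_max - av_min)
    return elo_min + slope * (av - av_min)


# The two endpoints quoted in the article.
print(round(av_to_elo(76.34)))   # Raiders, lowest aggregate AV
print(round(av_to_elo(166.19)))  # Seahawks, highest aggregate AV
```

Under this mapping a team whose aggregate AV sits exactly halfway between the two extremes lands on the 1500 mean.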
But Elo ratings don’t paint the whole picture, as teams who finished with worse records the previous year tend to benefit from easier schedules. I therefore ran a Monte Carlo simulation of each team’s season, calculating win probabilities from the Elo ratings using the following formula: 1/(10^((Opponent Elo - Elo)/400) + 1). Using Benjamin Morris’ conversion table from wins to playoff odds, I then calculated the odds that each team would make the playoffs in the upcoming year. Finally, I normalized the odds so that an average of 12 teams would make the playoffs every year.
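To make the simulation step concrete, here is a small Python sketch of the Elo win-probability formula and a Monte Carlo run over a single team's schedule. It is only an approximation of the idea, not the author's code: the schedule below is invented, the function names are hypothetical, and it ignores home-field advantage, ties, and the wins-to-playoff-odds conversion.

```python
import random


def win_probability(elo, opponent_elo):
    """Elo win probability: 1 / (10^((opponent_elo - elo) / 400) + 1)."""
    return 1.0 / (10 ** ((opponent_elo - elo) / 400.0) + 1.0)


def simulate_wins(team_elo, opponent_elos, n_seasons=10000, seed=0):
    """Simulate n_seasons seasons and return the win total of each one.

    Each game is treated as an independent coin flip weighted by the
    Elo win probability (an assumption of this sketch).
    """
    rng = random.Random(seed)
    probs = [win_probability(team_elo, opp) for opp in opponent_elos]
    totals = []
    for _ in range(n_seasons):
        totals.append(sum(rng.random() < p for p in probs))
    return totals


# Hypothetical 16-game schedule against league-average (1500) opponents.
schedule = [1500.0] * 16
wins = simulate_wins(1500.0, schedule)
print(sum(wins) / len(wins))  # average simulated wins per season
```

The distribution of simulated win totals is what would then be fed into a wins-to-playoff-odds table like the one the article mentions.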
http://harvardsportsanalysis.org/2015/07/a-way-too-early-prediction-of-the-nfl-season/
Just goes to show you stats and analytics can make bean counting goobers with slide rules think they can analyze or predict NFL football.