Outliers, YPC, and the Cowboys running game

I don't know if any statistic can show that how you move the ball is important. It is THAT you move the ball that matters.
 
No, it wouldn't. You've got a distribution that is non-normal and will always be so as runs of 5+ yards are fairly common but the loss of 5 or more yards is rare.

All looking at SD would tell us is whether someone failed to understand what they were taught in Stat 101

You are correct that the distribution is non-normal (non-Gaussian), as a back can have many runs of 10 to 20 yards and likely none of -10 to -20. The positive (right) skew can still be calculated, and so can the area under the curve for whatever range you want to see: 1 sigma, 3 sigma, etc.

Below is what a running back's histogram would likely look like (with a few runs below 0):

vonhippel_figure1.gif


Still, this is why it is really difficult to find causal stats in football. I would think you have to normalize for situational factors: goal-line TDs, successful 3rd-and-1s, turnovers, going down in bounds to keep the clock running, etc.

I am not a big fan of taking away tail events. You pay talent for the ability to provide returns "above the mean/median." You could take away 7 of Dez's TDs and all of a sudden he looks middling. What would be interesting is if there were backs that were bimodal (maybe Sanders in his day). You have a block of runs from -3 to 1 yards, then maybe not much between 1 and 3 yards, and then another bump from 4 to 7 yards. A bimodal or Cauchy distribution may exist, and that would really throw this whole premise off.
 
I don't know enough stats to know whether what you just said was all accurate or if you tossed a lot of concepts into a word salad that makes no sense and thus made me feel stupid.

In which case I am going to react emotionally and say I don't care what you say, I know what my eyeballs tell me, and that's that the green pants are ugly and should be silver.

Good day, sir.
 
It is virtually impossible to come to any conclusions with any real certainty. There are so many variables. Hundreds of them on every play. Thousands of them in every game.

It is like predicting the weather more than 48 hours ahead of time. There is a reason those kinds of forecasts are never all that accurate, and it's not because the people doing them are THAT incompetent.
 
Lol. Sorry, I wasn't trying to be pedantic or haughty. I was basically agreeing with you. Because there is a natural backstop (i.e., a back rarely gets tackled for a loss compared to the >0 carries), the tails are important to measure. That said, maybe there is value in normalizing the data and measuring the area (standard deviation) between two points (e.g., 2-5 yards vs. 3-6 yards).
 
Ah, got it. Yeah, everyone is always looking for that one number/metric that proves overall quality.

My guess is that the Cowboys care less about something like YPC and focus more on something like "frequency of runs of <0, 0.1-2.9, 3-5, and 5+ yards."

Basically, more times than not, does my RB get me 3+ yards? In our current system I can work with that. Add a dimension on Down and maybe a couple of situations, and you got yourself a stew.
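To make the "frequency of runs by bucket" idea concrete, here is a rough sketch in Python. The bucket edges are my own assumption (the ranges quoted above overlap at 5 yards and skip 0), the function names are made up for illustration, and the carries are invented numbers, not real Cowboys data.

```python
from collections import Counter

BUCKETS = ("loss", "0-2", "3-5", "6+")  # assumed, non-overlapping edges


def bucket(yards):
    """Assign one carry to a yardage bucket."""
    if yards < 0:
        return "loss"
    if yards < 3:
        return "0-2"
    if yards <= 5:
        return "3-5"
    return "6+"


def bucket_rates(carries):
    """Share of carries that land in each bucket."""
    counts = Counter(bucket(y) for y in carries)
    return {name: counts[name] / len(carries) for name in BUCKETS}


def share_at_least(carries, yards=3):
    """'More times than not, does my RB get me 3+ yards?'"""
    return sum(1 for y in carries if y >= yards) / len(carries)


# Invented sample of ten carries, for illustration only.
sample = [-2, 0, 1, 2, 3, 3, 4, 5, 7, 22]
rates = bucket_rates(sample)
three_plus = share_at_least(sample, 3)
print(rates, three_plus)
```

Adding the down or the situation would just mean grouping the carries by that field before calling `bucket_rates` on each group.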
 
Lots to go over here.


There's a couple of things I've noticed about tailbacks statistically. Usually the standard deviation of their runs is roughly equal to their yards per carry. So, if a tailback averages 4.0 yards per carry, their standard deviation is going to be close to 4.0 yards as well. Meaning, 68% of their runs will be for 0 to 8 yards.

The other thing is that roughly half of their carries will gain at least 50% of their YPC. So again, if a tailback averages 4.0 yards per carry, roughly 50% of those carries will go for at least 2 yards. That's why the stats show that on 3rd and 1 and 3rd and 2 you're better off running than throwing more often than not. If you have a decent tailback, you should be able to get at least 2 yards 50% of the time. However, on 3rd and 3, that's a riskier proposition.

The problem is that ypc leaves a lot to be desired. It doesn't account for strength of run defense that you're running against and more importantly, doesn't factor in down and distance. If I have a 3rd and goal from the 2 yard line against the best run defense in the league and I get the TD, it counts as a 2 yard run. But, I accomplished my goal, particularly against the best run defense in the league.

That's why I prefer Football Outsiders' methodology which factors in those areas. They had Murray as the #1 RB for the season. On a per play basis, he was #5 behind Marshawn Lynch, Jamaal Charles, Lamar Miller and CJ Anderson.

But, out of those 4 players, only Lynch had more than 250 carries on the season. So one could assume that if Charles, Miller and Anderson had gotten to even just 300 carries, their per-play efficiency would have dropped and possibly below Murray's per-play efficiency.

The other factor is that running the ball so much, especially in the 1st and 3rd quarters, has historically led to fewer injuries on defense. That's what the Cowboys did last year, and while the defense didn't go unscathed, it was certainly a much healthier defense than in the previous years under Garrett. The potential solution to the problem is that we could split the carries more and hope that by keeping the RBs fresh they can match Murray's total year production and per-play efficiency, but that's still a tall order.

With all that being said, the odds are that Murray is not going to perform at the same level. In fact, the drop-off is likely to be considerable. I know Brian Burke eschews the 380+ carry notion, but using Eddie George as an example of a guy that gained 1,000 yards after a 380+ carry season when he averaged 3.3 yards per carry isn't exactly the vote of confidence I was looking for.

So the reality is that we were almost assuredly going to see a substantial drop-off in the running game this season even if Murray did stay. And we would basically be paying $8 million+ a year for a tailback that may play more like a $4 million a year tailback.

The real question is if we can get the current crop of tailbacks to perform better than say a $4 million tailback (assuming that's where the drop-off would be had Murray stayed with the team).

The other overlooked factor is can we replace Murray's production as a receiver out of the backfield. I think that was one of the more important changes the offense made in 2014...they stopped using Witten on pivot routes because he can't avoid and break tackles and instead used Beasley and Murray more out of the backfield to turn those 4 yard passes into 8 yard gains or first downs.






YR

I don't have the data, but because of the skew, the normal-distribution standard deviation rule doesn't work. I would bet the share of runs between 0 and 8 yards is MUCH higher than 68%.

That said, I agree with most of the other points.
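For anyone who wants to check this against real carry logs, here is a minimal Python sketch of the calculation. The sample is invented (I picked it so the mean and standard deviation both come out near 4.0, matching the rule of thumb above), so it only shows the mechanics and proves nothing about any actual back.

```python
from statistics import mean, pstdev


def share_within_one_sd(carries):
    """Return (mean, population SD, share of carries within mean +/- 1 SD)."""
    m = mean(carries)
    sd = pstdev(carries)
    lo, hi = m - sd, m + sd
    inside = sum(1 for y in carries if lo <= y <= hi)
    return m, sd, inside / len(carries)


# Invented, right-skewed sample: a hard floor near zero and one long run.
sample = [-2, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 18]

m, sd, share = share_within_one_sd(sample)
print(f"mean={m:.2f} sd={sd:.2f} share within 1 SD={share:.0%}")
```

On this made-up sample, 17 of the 20 carries fall inside one standard deviation of the mean, well above the 68% a normal curve would give. Whether real carry data behaves the same way is exactly what would need checking.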
 
Again, when you compare Dallas' percentage of run plays in previous seasons under Garrett to this past season, there was a substantial reduction in defensive players being injured. Sure, it was still not very good, but it was *markedly better* than it was in previous seasons under Garrett, and the team ran more.

You seem to not understand how *mathematical correlations* work. I have run these correlations over the past 10 seasons LEAGUE WIDE, and the math shows a strong correlation between rushing plays in the 1st and 3rd quarters and defensive injuries. That is more than a large enough sample size to prove that there is a linear relationship between the two variables.

So when you combine the MATH of the correlation between LEAGUE WIDE defensive player injuries and running the ball more, the Cowboys improving on their injuries when running the ball more, and the simple fact that more plays on defense means more opportunities to get injured... it doesn't take Sherlock Holmes to figure out the role that running the ball played in the Cowboys' *improved* health.






YR

Did you calculate it on a relative or absolute basis?
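To spell out the difference being asked about, here is a small Python sketch that computes the same Pearson correlation two ways: against the raw count of early rushing plays (absolute) and against rushing plays as a share of all early plays (relative). The team-season numbers are invented, and using "defensive games missed" as the injury measure is my assumption, since the method behind the league-wide numbers above wasn't posted.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented team-seasons:
# (rushes in 1st+3rd quarters, all plays in 1st+3rd quarters, defensive games missed)
team_seasons = [
    (180, 520, 95),
    (210, 540, 80),
    (240, 560, 70),
    (200, 500, 85),
    (260, 580, 60),
]

absolute = [rushes for rushes, _, _ in team_seasons]
relative = [rushes / plays for rushes, plays, _ in team_seasons]
injuries = [missed for _, _, missed in team_seasons]

r_abs = pearson(absolute, injuries)
r_rel = pearson(relative, injuries)
print(f"absolute basis r={r_abs:.2f}, relative basis r={r_rel:.2f}")
```

The two versions can give different answers on real data, because a team that simply runs more total plays will rack up more rushes without running a higher share of the time.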
 
It does, and it's not, because the advanced stats show that running games mean little to winning and losing in the NFL. So people who think otherwise just ignore them.

I was talking about a WAR-like stat where they could assign a dollar value to a player's production.

If Mike Trout of the LA Angels has a 6.0 WAR and each win is worth $8M in salary, you can say he is worth $48M.

His WAR can be accumulated through Hitting, Defense and Baserunning, just as a RB can accumulate WAR through Rushing, Receiving and TDs.

I've seen Pro Football Reference use a stat called Approximate Value, but I don't know how they come up with it.
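The arithmetic behind that is simple enough to sketch in Python. The 6.0 WAR and $8M per win come from the example above; the split between hitting, defense and baserunning is a made-up breakdown just to show the components adding up, and `dollar_value` is a name I invented.

```python
def dollar_value(components, dollars_per_win):
    """Sum component WAR and convert it to a salary-equivalent value."""
    total_war = sum(components.values())
    return total_war, total_war * dollars_per_win


# Hypothetical split of a 6.0 WAR season; only the 6.0 total is from the post.
trout = {"hitting": 4.5, "defense": 1.0, "baserunning": 0.5}

war, value = dollar_value(trout, 8_000_000)
print(f"WAR={war:.1f}, value=${value:,.0f}")
```

A running back version would swap in rushing, receiving and TD components, but somebody would first have to work out how many wins each of those is actually worth.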
 
This issue is one that always bugs me - it is largely the impact of fantasy football folks who have some basic stats background but fail to understand context.

For example - the claim that Murray's performance drops to the league average if you take out his biggest runs is just a clear failure to think things through critically. Gee, doesn't the league average include a ton of outliers as well? Comparing an outlier-removed statistic to one that includes outliers is totally meaningless. Doesn't tell us anything. Had the OP found an outlier-removed league average, then we'd have a comparison that showed that Murray was still pretty darn good.

Another example - wanting to see the impact of outlier removal on standard deviation. Good lord - we already know the answer there. It is going to be smaller. Don't need calculations to prove that, just critical thinking.

Sometimes outliers shouldn't be in an analysis - but in this case you might argue that these outliers represent some of the most important plays of the year. Of course, making YPC even less useful is how it eliminates context (again). For example, gaining 7 yards on a carry might look pretty good. But it sucks if it is 3rd and 8 and the back couldn't get that final yard.
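A context-aware alternative can be sketched in a few lines of Python. The thresholds below (40% of the distance on 1st down, 60% on 2nd, all of it on 3rd and 4th) are an assumption picked for illustration, not any site's official formula, and the four carries are invented.

```python
# Assumed share of the yards to go that a carry must gain to count as a success.
THRESHOLDS = {1: 0.4, 2: 0.6, 3: 1.0, 4: 1.0}


def is_success(down, to_go, gained):
    """True if the carry gained the required share of the distance."""
    return gained >= THRESHOLDS[down] * to_go


def success_rate(carries):
    """Share of (down, to_go, gained) carries that count as successes."""
    return sum(is_success(*c) for c in carries) / len(carries)


carries = [
    (3, 8, 7),   # 7 yards on 3rd and 8: helps YPC, fails to convert
    (3, 2, 2),   # 2 yards on 3rd and 2: hurts YPC, converts
    (1, 10, 5),  # 5 yards on 1st and 10
    (2, 6, 3),   # 3 yards on 2nd and 6
]

ypc = sum(gained for _, _, gained in carries) / len(carries)
rate = success_rate(carries)
print(f"YPC={ypc:.2f}, success rate={rate:.0%}")
```

The 7-yard gain on 3rd and 8 lifts the YPC but counts as a failure here, while the 2-yard conversion drags YPC down and counts as a success.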
 
The econometric-based measures are likely the best indications of value - however, football is a lot more complicated than baseball. For the most part in baseball the value comes from moving runners up and getting them to score. Football is trickier as often you have to play for field position. Maybe you take a sack that puts you in 3rd and 20 from your own 30. A successful play there might be one that gets your punter within say 5 yards of midfield. That sort of complexity makes it harder to get a good statistic.
 
Baseball lends itself to stats infinitely more than football: minimal subjectivity, a huge sample size in a season for multiple scenario slices, etc. I remember reviewing the KC game from two seasons ago on here. People thought Murray left too many yards on the field... the reality was that there were some massive gaps in the blocking (Hanna, Berny and even Fredd vs. Poe). There were a couple of 0-yard runs that were herculean from Murray. A double into the gap is a double into the gap, no matter the defensive shift.
 
But even the initial premise was flawed, because why stop at his top 7 carries? Those weren't outliers per se. He had 15 carries over 20 yards and another dozen over 15 yds.

If you remove all 15 yd carries from everyone it will change everything, but Murray had a lot more 15 yd carries than most other RBs so he would be hurt a lot more.
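Here is a quick Python sketch of what an apples-to-apples version of that comparison looks like: apply the same 15-yard cutoff to both the back and the comparison group, then compare. Both carry lists are invented and far too short to mean anything; the point is only which numbers belong next to each other.

```python
def ypc(carries):
    """Plain yards per carry."""
    return sum(carries) / len(carries)


def trimmed_ypc(carries, cap=15):
    """Yards per carry after dropping every carry of `cap` yards or more."""
    kept = [y for y in carries if y < cap]
    return ypc(kept)


# Invented carries: a back with two long runs and a comparison group with one.
back = [2, 3, 4, 5, 6, 1, 4, 3, 20, 32]
league = [1, 2, 3, 3, 4, 2, 0, 5, 18, 2]

back_raw, back_trim = ypc(back), trimmed_ypc(back)
league_raw, league_trim = ypc(league), trimmed_ypc(league)

print(f"raw vs raw:         {back_raw:.2f} vs {league_raw:.2f}")
print(f"trimmed vs raw:     {back_trim:.2f} vs {league_raw:.2f}  (mismatched)")
print(f"trimmed vs trimmed: {back_trim:.2f} vs {league_trim:.2f}")
```

In this toy data the trimmed back looks below average against the untrimmed comparison group, and above average once the same cutoff is applied to both. The back also loses more from the trim because he has more long runs.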
 
The idea isn't that we should be discussing taking out Murray's exceptional runs and then comparing Murray's adjusted averages to the rest of the league. The idea is that all backs actually have a small number of exceptional runs which are outliers relative to the expected performance for any given running play, and that maybe we should be focused more on the expected outcomes of those plays than on the outliers.

If you lose a back like DeMarco Murray, but 99% of your carries still fall in that relatively tall area under the curve, you're still getting your team in the situations they need to be in to pass the ball successfully. From there, it's a matter of protecting Romo and making the plays. Almost every RB thread we've had this year has lamented the fact that we're going to be throwing against fewer men in the box and we're going to be having a harder time getting in position to move the chains. But, if it's the case that most RBs perform similarly most of the time and the exceptional ones outperform the herd by a significant degree but only on a very limited number of snaps over the course of the season, then really maybe all you're losing are those few exceptional snaps.

And those matter. By the same token, I think we also can all see that it might be easier to make some of them up via the passing game without necessarily ratcheting up the high probability plays. Whether it's by some big plays in the screen game to McFadden or Dunbar, or more play action plays on first downs, or more passes in the direction of Beasley or Dez, or by getting an even higher percentage of outlier runs from Randle or Williams or McFadden. There are options for replacing those 7-8 big Murray runs. And the rest of the time we're really not dealing with any significant drop off in terms of down and distance.
 
Totally incorrect. If you took out every league player's big runs, Murray, even without those 7-8 big plays, is still at 4.1 YPC. Realistically that should be compared against 3.5 or so as that's likely where the league average is going to be if you correctly compare apples to apples.

Do you honestly think this is about 7-8 "big" plays? That totally ignores how Murray converted a huge % of his short-yardage situations into first downs. It ignores that you aren't just replacing big plays; you have to replace nearly 400 carries. It ignores how you have to replace 100s of hard runs that take a toll on the defense.
 
Totally incorrect because you're adjusting to a 3.5 ypc or so that you've just made up on the spot? I don't think so.

The whole point of the thread is that, for all backs, the big plays are outliers and the outliers are significant in terms of a RB's average but that they don't reflect the vast majority of what a RB does for an NFL team. It's not ignoring the nearly 400 carries or the 100s of hard runs at all. In fact, it's the opposite. It's saying that those are the types of runs that are reasonable outcomes for most average blocked running plays for average NFL backs, and, as a percentage of rushing plays that are called, they account for the overwhelming majority of a given RB's production. Which, in turn, is possibly why the relative effectiveness of a RB's play does not materially impact winning percentages. Because 99% of the time or more he's getting the kind of success these plays generate, and drives are kept alive based on that kind of production. The outlier plays are the rare exception to the rule.
 
Come up with a realistic number. My adjusted value is far closer to reality than your apples to oranges comparison.

I've clearly pointed out a massive flaw in your logic. Be a man and admit it.
 
Hey, now. First of all, you haven't pointed out anything. You made an obvious comment that was beside the point of the argument in the first place, which I tried to address again for you. I say 'again,' because that same point had been made many times previously in this very thread. Second of all, that's sexist. Women can admit they're wrong, too, you know.

Lololol!!!! Just kidding!!!!! (and, actually ladies, I am kidding. I firmly believe that it's only my wife who's never yet been wrong about anything, and I know *that's* right because she's told me so and she's never wrong).

As for me 'coming up with a realistic number for you that supports the point you didn't make,' no. It's not my job to make your arguments. My argument never included the idea that Murray was the only RB with statistical outliers in his YPC data. Why would I try to say that? It doesn't make any sense. The point was--if it's only a relatively small percentage of runs for any running back which are statistically exceptional in the first place--and if equivalent plays are easier to replicate in the passing game anyway--then how much does it really matter who you've got running the ball? We've all seen variants of this argument presented many times, and it's always rejected out of hand by people who want to believe the rushing game affects wins and losses more than it actually does, but seeing it in the context of how a very good RB like Murray's productive carries are actually distributed, I thought, was pretty interesting.
 
So now your argument is it is only a small % of runs that have to be replaced. Inherent in this argument is the assumption that Murray does not produce outliers at a rate substantially different from other backs - b/c if he does it kills your replaceability argument. On that basis, you should fully accept my estimate of outlier adjusted YPC.

More generally, the premise that we only need to replace 8 big plays remains a complete whiff.
 
