There is a debate that has now swept much of the sabermetric community. It grew largely out of the American League MVP race and, more specifically, out of comments made by one of the founding fathers of the community, Bill James, on the worthiness of the two leading candidates, Jose Altuve and Aaron Judge. James’ thoughts were as follows:
“We come, then, to the present moment, at which some of my friends and colleagues wish to argue that Aaron Judge is basically even with Jose Altuve, and might reasonably have been the Most Valuable Player. It’s nonsense. Aaron Judge was nowhere near as valuable as Jose Altuve. Why? Because he didn’t do nearly as much to win games for his team as Altuve did. It is NOT close. The belief that it is close is fueled by bad statistical analysis—not as bad as the 1974 statistical analysis, I grant, but flawed nonetheless. It is based essentially on a misleading statistic, which is WAR. Baseball-Reference WAR shows [Altuve] at 8.3, and [Judge] at 8.1. But in reality, they are nowhere near that close. I am not saying that WAR is a bad statistic or a useless statistic, but it is not a perfect statistic, and in this particular case it is just dead wrong. It is dead wrong because the creators of that statistic have severed the connection between performance statistics and wins, thus undermining their analysis.”
What Bill James is effectively arguing here is that, in 2017, Jose Altuve was significantly more valuable to his team than Aaron Judge was to the Yankees because of the timing of the two players’ contributions. Put simply, James argues that Altuve far outperformed his counterpart in high-leverage situations (those that have the most impact on the outcome of the game) and was therefore the more valuable player in 2017.
This is a reasonable enough argument. If one intends to be backward-looking, as is the case in awards voting, then it could indeed be argued that the context in which the player offered his contributions is an important part of the calculation. However, if one intends to be forward-looking and attempts to answer the question ‘who is/will be the better player?’ then context simply has no place in the discussion.
The reason for this is simple and has long been known to analysts: accounting for context rewards a player for luck, sequencing and the quality of his teammates – all things quite divorced from a player’s actual talent level.
There is not a lot that I can add to this discussion. It is one that has been going on under the surface for some time and now finds itself being tackled by some of the game’s great thinkers: James, Tom Tango, Dave Cameron and Mike Petriello, to name a few. Rather, what I hope to do here is to articulate the problem in a way that is accessible to all of us.
To do this let’s work through a simple example. For the purposes of this discussion let:
Player A: 40 At Bats, 20 Hits, all singles with a runner on 3rd base and 2 out. All other At Bats resulted in a strike out with no runners on base and none out
Player B: 40 At Bats, 20 Hits, all singles with no runners on and none out. All other At Bats resulted in strike outs with runners on 3rd base and 2 out.
Given only this information, someone arguing a context-dependent position would argue that Player A was the far superior player. Player A got his hits in the highest-leverage situations, which led to his team scoring at least 20 runs and contributed directly to his team winning games, the ultimate goal in the sport. Player B, on the other hand, got all of his hits with no runners on and failed to contribute to his team winning in any meaningful way based on the information here (it is possible that the following hitter hit a home run after each of Player B’s hits, giving his performance some value, but that information is not provided here). Therefore, Player A is the MVP in this discussion.
Again, given only this information, someone from a context-neutral position would argue that Player A and Player B performed equally well. Both players tallied exactly 20 singles and 20 strikeouts and therefore, based on this sample, are equally talented players.
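To make the two readings concrete, here is a minimal Python sketch of the example above. The `AtBat` class and both function names are hypothetical, invented for illustration, and the sketch assumes that a single with a runner on 3rd scores exactly one run (the floor of the "at least 20 runs" described above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtBat:
    hit: bool              # True for a single, False for a strikeout
    runner_on_third: bool  # True if a runner stood on 3rd base


def batting_average(at_bats):
    """Context-neutral view: hits per at bat, ignoring the situation."""
    return sum(ab.hit for ab in at_bats) / len(at_bats)


def runs_driven_in(at_bats):
    """Context-dependent view: count only hits that score the runner.

    Assumption: each single with a runner on 3rd scores exactly one run.
    """
    return sum(1 for ab in at_bats if ab.hit and ab.runner_on_third)


# Player A: every hit comes with a runner on 3rd, every out with bases empty.
player_a = [AtBat(True, True)] * 20 + [AtBat(False, False)] * 20
# Player B: every hit comes with bases empty, every out with a runner on 3rd.
player_b = [AtBat(True, False)] * 20 + [AtBat(False, True)] * 20

for name, at_bats in (("Player A", player_a), ("Player B", player_b)):
    print(name, batting_average(at_bats), runs_driven_in(at_bats))
```

Both players come out with the same average, while the run count separates them completely. Which number you trust depends on the question you are asking.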
But of course this is a very limited set of information – there are other things that should be considered. Let’s say that perhaps Player A hits in the 4th spot in his lineup every game while Player B hits in the lead off spot. How would this change things?
Well, to begin with, Player A would see significantly more plate appearances with runners in scoring position.
[Figure: plate appearances with runners in scoring position, by lineup spot]
As you can see, where you hit in the order has a significant effect on the number of times you come to the plate with a runner in scoring position. If, across the league, hitters in the 4th spot of a lineup receive 1,400 more plate appearances with runners in scoring position than those leading off, then it is hardly fair to penalize Player B for producing fewer actual runs for his team.
Now of course this can be adjusted for. We could simply turn this into a rate stat, measuring each player’s effectiveness in high-leverage situations per opportunity, and this effect would largely be alleviated. Still, it is a nice introduction to some of the challenges context-dependent metrics face. From here, you can start to discuss problems such as the quality of one’s teammates: should a player on a great team who has many chances to drive in runs be rewarded compared to a player on a poor team who is not afforded those same opportunities?
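The rate-stat adjustment can be sketched in a few lines of Python. The function name and every number below are made up purely for illustration; they are not real league or player data.

```python
def runs_per_opportunity(runs_driven_in, opportunities):
    """Runs driven in per plate appearance with runners in scoring position."""
    if opportunities == 0:
        raise ValueError("no opportunities, rate is undefined")
    return runs_driven_in / opportunities


# Hypothetical season totals (invented numbers, not real data).
cleanup_runs, cleanup_chances = 50, 200   # 4th hitter: many chances
leadoff_runs, leadoff_chances = 30, 120   # lead off hitter: fewer chances

cleanup_rate = runs_per_opportunity(cleanup_runs, cleanup_chances)
leadoff_rate = runs_per_opportunity(leadoff_runs, leadoff_chances)

print(cleanup_runs - leadoff_runs)  # gap in raw run totals
print(cleanup_rate, leadoff_rate)   # per-opportunity rates
```

In this invented case the cleanup hitter drives in 20 more runs, yet both hitters convert their chances at the same rate. Dividing by opportunities removes the lineup-spot effect, though not the other problems discussed here.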
There is also the problem of quality of opposition. To begin the game, our lead off hitter always comes to the plate with the bases empty, while his opportunities with runners in scoring position do not come until later in the game, by which stage he is likely facing high-leverage bullpen arms. The lead off hitter’s at bat with runners in scoring position against Aroldis Chapman is not equal to the 4th hitter’s at bat against some soft-tossing 4th starter in the first inning. If a player’s run-scoring opportunities consistently come against more difficult opponents, then there is a structural bias here.
And then there is the question of luck. If we begin to think of ‘clutchness’ as a player skill, we begin to run into trouble. While we all grow up thinking that some people are better at performing under pressure (after all, in our own experiences this feels true), there is no such quantifiable trait, with performance in high-leverage situations being among the most volatile statistics collected in baseball. It is just as likely, if not more so, that a player’s performance in high-leverage situations was due to random chance rather than his own inherent abilities. Rewarding someone for being ‘lucky’ feels wrong.
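A small simulation gives a feel for how much chance alone can move these numbers. This is only a sketch under stated assumptions: every hitter has the same true .270 average in high-leverage spots, each at bat is an independent coin flip, and each hitter gets 100 such at bats. The function name and all three figures are hypothetical choices, not measured values.

```python
import random
import statistics


def simulate_clutch_averages(true_avg, at_bats, hitters, seed=0):
    """Simulate high-leverage batting averages for identically skilled hitters.

    Assumption: each at bat is an independent trial with hit probability
    true_avg, so any spread in the results comes from chance alone.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    averages = []
    for _ in range(hitters):
        hits = sum(rng.random() < true_avg for _ in range(at_bats))
        averages.append(hits / at_bats)
    return averages


averages = simulate_clutch_averages(true_avg=0.270, at_bats=100, hitters=1000)

print("lowest :", min(averages))
print("mean   :", round(statistics.mean(averages), 3))
print("highest:", max(averages))
```

Every simulated hitter has identical talent, so the gap between the lowest and highest averages comes from sample size alone. A real season adds the opposition and lineup effects described above on top of that noise.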
Now, with little of your time to spare, let me make one last point: I don’t think that these two camps are as far apart as they may claim to be. Take this quote from the same Bill James article quoted above:
“Let us assume for the sake of argument that this “run-based deviation” results primarily or entirely from luck, and in particular let’s look at the comparison between Eric Hosmer and Aaron Judge. Judge created more runs than Hosmer, with fewer outs, but Hosmer had more win impact because his team was more win-efficient based on the runs that they scored and allowed. Let’s assume that is just luck. Would you rather have Aaron Judge next year, or Eric Hosmer?
You would rather have Aaron Judge, obviously—and in fact I would; I would rather have Aaron Judge next year than Eric Hosmer. It is perfectly reasonable to create estimates of projected value in future seasons which are based on the usual and normal relationship of runs to wins.”
For those of us who are less statistically savvy, just replace ‘run-based deviation’ with ‘clutchness’ and you get the picture. What James is trying to say here, I believe, is that his argument for context-dependent statistics is intended only to be backward-looking, that is, to answer the question ‘who was more valuable in period x?’. If we change the question to ‘who is/will be the better player?’ then James concedes that context-neutral statistics are a ‘perfectly reasonable’ solution.
Compare this with sentiments expressed by Dave Cameron (who perhaps best articulates the position against James):
“So why don’t we build WAR off of one of these numbers instead of a linear-weights-based method? Would WAR be better if it included the context of the events, rather than just an estimate of the player’s contribution to the result based on historical averages?
I think the answer is that it depends on how you’re using WAR. In the case of MVP voting, I do think there is a case to be made for looking at the circumstances under which a player performed, and I did use context-dependent metrics when I was an MVP voter. WAR is an imperfect tool, and it’s particularly imperfect for things like the MVP award, which is why even those of us who host sites that promote WAR fairly extensively suggest not relying solely on its results when filling out a ballot.”
It appears, without trying to put words into anyone’s mouth, that both of these arguments are getting at a similar point. If you strip away the combative prose of James, then perhaps these two sides are actually arguing the same thing, though naturally with some departures.
So let me conclude with this: both WAR and any context-dependent version of it that may be developed in the near future are imperfect estimates of player value. It appears that most people educated on this issue agree that when looking back to determine who was the most valuable player to his team over a given period, context is important. When asking who was the best player of his time, or who is or will be the best player, context needs to be stripped out of the equation.
In any case, and I finish with this, when dealing with imperfect estimates (or even perfect ones, for that matter), it is not only the estimate that we must concern ourselves with but also our proper application of it to the problem at hand. The human link between question and answer will, at least for some time, be the most important part of any statistical analysis.