“WAR! Huh! Good God. What is it good for?”
“Absolutely nothing” according to the song by Edwin Starr.
Many major league baseball analysts disagree.
WAR (Wins Above Replacement) has become one of the most oft-cited statistics among those who take an analytic or sabermetric approach to baseball. The 2015 baseball season begins next week, so it’s a good time to take a look at WAR. What is it and what is it good for?
WAR grew from the desire to have a single statistic that would allow players to be compared to each other in terms of their value to their team. The basic question WAR tries to answer is this: how many more wins could a team expect over the course of a season if a given player plays every game at his position, as opposed to a replacement player playing every game? The difference between the number of wins expected with that player playing and the number of wins expected with a replacement player playing is the player’s WAR score. The higher the WAR score, the better the player is thought to be.
The first question that must be asked is: who is the replacement player? In the real world, replacing a player means replacing him with someone who is actually available to the team. This makes a player’s WAR partially dependent on how good his backup is in addition to how good he is. While this might provide a reasonable estimate of the player’s actual value to a team, it doesn’t provide a way to evaluate a player in general terms so that different players can be compared.
WAR addresses this problem by defining the replacement as a player who has something like the minimum level of skill needed to play major league baseball. Fangraphs (one of the two organizations that compute WAR – more on this later) defines the replacement player as “a freely available player such as a minor league free agent or very poor MLB bench player.” In other words, WAR compares players to a replacement no one would use if they had any other option.
Baseball is a complex game. Position players must master a variety of both offensive and defensive skills. The defensive skills that are needed differ with position; infielders and outfielders need very different skill sets, for example. Pitchers must command a unique set of skills that are not shared by any other player on the team. Different ballparks have different characteristics that can have large and important effects on the statistics that measure performance for both pitchers and position players. One of the strengths of WAR is that it can take all of these different factors into account.
How does WAR do this? WAR is derived from a large set of statistics that measure or attempt to measure a wide variety of offensive and defensive skills. Because the skills demanded of pitchers and position players rarely overlap, WAR for pitchers and position players is derived from different sets of statistics. The same is true for different types of position players. For example, WAR for catchers makes use of a statistic that measures how successful the catcher is at throwing out base runners who try to steal, a statistic that does not contribute to WAR for other position players. The great strength of WAR is that it is able to estimate overall value for players who play different positions in a way that allows players from different teams, leagues, and eras to be compared. It can do this because of the wide variety of statistics that go into deriving WAR, the different sets of statistics that are used to derive WAR for different positions, and WAR’s use of the abstractly defined replacement player as a common basis for comparison.
Many of the statistics that are used to calculate WAR are newer statistical measures that have been developed with the rise of the sabermetric or analytic approach to baseball. These statistical tools have been gaining widespread acceptance among fans and analysts and are often used by professional baseball teams when making player personnel decisions.
While these tools can be very useful and can provide insights into the game that are unavailable without them, they are not all-powerful. Every statistical measure rests on a unique set of assumptions that must be met for the statistic to provide a valid measure of whatever it is that the statistic is supposed to be measuring. The need for assumptions to be met is a source of difficulty for WAR.
When assumptions are not met for any individual statistic, the error in that measure increases. Errors in the different statistics that go into computing WAR don’t simply cancel each other out. Instead, they accumulate, producing a larger error in WAR than will be found in any of the individual statistics that go into computing it. This is especially a problem for defensive statistics because they have been developed more recently and are usually not as refined as the statistics that are used to measure offensive performance.
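To make the point about accumulating error concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a player's WAR is a simple sum of three component estimates and that the error in each component is independent of the others. The component names and error sizes are invented, and neither assumption comes from Fangraphs or Baseball Reference. Under those assumptions, independent errors combine as the square root of the sum of their squares, so the combined error ends up larger than the largest single error.

```python
import math

# Hypothetical one-standard-deviation errors, in wins, for each component.
# These numbers are invented for illustration only.
component_errors = {
    "batting": 0.3,
    "fielding": 0.7,
    "baserunning": 0.2,
}


def combined_error(errors):
    """Combine independent errors in quadrature (root sum of squares)."""
    return math.sqrt(sum(e ** 2 for e in errors))


total = combined_error(component_errors.values())
largest = max(component_errors.values())

print(f"largest single error: {largest:.2f} wins")
print(f"combined error:       {total:.2f} wins")
```

With these made-up inputs the combined error works out to a bit under 0.8 wins, slightly more than the 0.7-win fielding error that dominates it. If the errors were not independent, the combined figure could be larger still.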
The situation is further complicated by the fact that there isn’t an officially accepted definition of WAR. WAR is calculated by two organizations, Fangraphs and Baseball Reference, and they each derive it from a different formula.
Many commentators and analysts who cite WAR don’t tell you which version they’re talking about, and they don’t consider whether the two versions give significantly different results for the player in question.
WAR has aroused controversy in baseball, with some claiming it is the best statistic ever created for measuring a player’s value while others claim it’s absolutely worthless. As is usually the case, the truth lies somewhere between these extreme positions.
An estimate is not as precise as a measure and Fangraphs provides good advice when they point out that WAR is an estimate of worth, not a measure of worth. How precise is the estimate? Fangraphs again gives us a useful guide when they advise considering a +1 and -1 range around a WAR score as an estimate of a player’s worth. For example, the value in terms of wins above replacement for a player with a WAR of 6.8 is likely to fall somewhere in the range between 5.8 and 7.8.
It’s useful to remember that this is a sliding range. This means that a player with a 6.8 WAR (5.8 to 7.8 range) could have roughly the same value as a player with a 4.9 WAR (3.9 to 5.9 range) because their WAR ranges overlap. The 6.8 player could also have substantially more value than the 4.9 player. WAR indicates that the 6.8 player is likely to be more valuable to his team than the 4.9 player but the uncertainty that is built into the statistic means that this conclusion about the relative value of the two players is uncertain as well. As long as you can live with this uncertainty, and you remember to translate WAR scores into +1/-1 ranges, WAR can provide a limited but useful estimate of a player’s value to his team.
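The range comparison above can be sketched in a few lines of Python. This is only an illustration of the +1/-1 rule of thumb described here; the function names are my own, and the 3.5 WAR player in the last line is a hypothetical added to show a case where the ranges do not overlap.

```python
def war_range(war, margin=1.0):
    """Return the (low, high) range around a WAR score."""
    return (war - margin, war + margin)


def ranges_overlap(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


player_a = war_range(6.8)  # roughly 5.8 to 7.8
player_b = war_range(4.9)  # roughly 3.9 to 5.9

# The two players from the text: their ranges overlap, so WAR alone
# cannot say with confidence which one was more valuable.
print(ranges_overlap(player_a, player_b))

# A hypothetical 3.5 WAR player (2.5 to 4.5) does not overlap with 6.8.
print(ranges_overlap(player_a, war_range(3.5)))
```

An overlap doesn’t mean the two players are equal. It only means the gap between their WAR scores is small enough that the uncertainty in the statistic could account for it.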
(Full disclosure: I’m a Washington Nationals fan. Could you tell?)