Improving the Baseball-Reference.com stat converter

I’ve seen mentions over the past few years from major outlets like MLB Network of one of my favorite tools on Baseball-Reference.com: the stat converter.

For the uninitiated, the stat converter can be found by clicking More Stats on a player page and scrolling down to where it says Neutralized Batting or Pitching. With this tool, we can discover, for instance, that Ty Cobb would have had 101 steals if he’d played his 1915 season for the ’62 Dodgers, meaning Maury Wills still would have broken his single-season record. And Pedro Martinez might have bested Bob Gibson’s 1.12 ERA if he’d played his 2000 season for the ’68 Cardinals; the converter has Martinez at a 1.00 ERA. It’s enough to keep a stat geek like me occupied for hours. It has, I think.

Near as I can tell, the stat converter works on a simple algorithm. Essentially, every player in baseball history has been on a team that scored some average number of runs per game and has played in a ballpark with factors, scaled around 100, that denote how conducive it’s been to pitching and hitting. The stat converter seems to tweak players’ numbers by taking the average runs and ballpark factors from their original teams and swapping in the values for whatever team they’re placed on, with the overall average runs for their leagues also taken into account. It’s a fun, quick way to make projections across eras, but it isn’t perfect.
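As a rough illustration of the kind of scaling described above, here is a minimal Python sketch. It is a guess at the general shape, not Baseball-Reference's actual method: the names `RunContext`, `effective_runs_per_game` and `convert_counting_stat` are hypothetical, and the sketch assumes a stat is simply multiplied by the ratio of the new run environment to the old one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    """Run environment for one team-season (hypothetical structure)."""
    league_runs_per_game: float  # average runs per team per game in the league
    park_factor: float           # 100 is neutral; above 100 favors hitters


def effective_runs_per_game(ctx: RunContext) -> float:
    """League scoring level adjusted by the park factor (assumed formula)."""
    return ctx.league_runs_per_game * ctx.park_factor / 100.0


def convert_counting_stat(value: float, source: RunContext, target: RunContext) -> float:
    """Scale a counting stat by the ratio of target to source run environments."""
    scale = effective_runs_per_game(target) / effective_runs_per_game(source)
    return value * scale


# Placeholder numbers for illustration only, not real league data.
low_scoring = RunContext(league_runs_per_game=4.0, park_factor=100)
high_scoring = RunContext(league_runs_per_game=5.0, park_factor=110)
converted_home_runs = convert_counting_stat(12, low_scoring, high_scoring)
```

Under this assumption, one scale factor applies to every counting stat, so home runs, RBIs and hits all rise or fall by the same proportion regardless of how the game itself was played in each era.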

I first noticed an issue a few years ago when trying to project Live Ball Era numbers for Deadball great Home Run Baker. Wanting to see if Baker would have been worthy of his nickname on the 1999 Colorado Rockies, who had four players with 30 homers, I ran his converted numbers and saw he jumped from 12 home runs with the 1913 Philadelphia A’s to 18. This didn’t seem right. The converter also projects Baker to have more than 200 RBIs and a .400 batting average, which seems just as unrealistic. His projected strikeout rate doesn’t wash, either. In 1913, when AL players struck out once every 9.7 plate appearances, Baker whiffed once every 20.7 plate appearances. In 1999, when NL players struck out once every 5.9 PAs, Baker is projected to K once every 23.4 PAs.
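To put a number on why the strikeout projection looks off, here is a small sketch of one alternative way to translate the rate: hold Baker's strikeout rate relative to his league constant and carry that ratio into the new league. This is only an illustrative assumption, not something the converter does, and the function name `era_adjusted_pa_per_k` is made up for the example.

```python
def era_adjusted_pa_per_k(
    player_pa_per_k: float,
    source_league_pa_per_k: float,
    target_league_pa_per_k: float,
) -> float:
    """Keep the player's strikeout rate relative to league average constant.

    All arguments are plate appearances per strikeout (higher means fewer Ks).
    """
    relative_to_league = player_pa_per_k / source_league_pa_per_k
    return relative_to_league * target_league_pa_per_k


# Figures from the text: Baker in the 1913 AL, moved to the 1999 NL.
baker_1999_pa_per_k = era_adjusted_pa_per_k(20.7, 9.7, 5.9)
```

By this rough measure, Baker would strike out about once every 12.6 plate appearances in 1999, roughly twice as often as the converter's once every 23.4.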

The problem is just as apparent going from the Live Ball Era to the Deadball Era. Barry Bonds of 2001 is projected to have hit 57 home runs on the 1916 Boston Braves. Dave Robertson, Cy Williams and Wally Pipp led baseball with 12 homers apiece that season. And Bonds’ projected ballpark, Braves Field, boasted vast dimensions, such as nearly 500 feet to dead center, to facilitate inside-the-park homers, Boston owner James Gaffney’s favorite brand of baseball. I don’t care how much better Bonds was than the rest of the majors in the early 2000s. He would’ve been pushed to hit even 25 homers for the 1916 Braves, a team that hit 22 collectively and had no player with more than four homers.

The issue doesn’t just lie with Live Ball to Deadball conversions. I couldn’t find a modern pitcher projected to win 30 games on the 1968 Tigers, not Randy Johnson in 2001, Bob Welch in 1990, or Ron Guidry in 1978, to name three recent pitchers who’ve come close since Denny McLain won 31 games for Detroit. Part of the problem is that the stat converter doesn’t appear to adjust for different pitcher usage rates between eras. Welch, for instance, was fourth-best in baseball with 238 innings in 1990. The fourth most-durable pitcher in 1968, Bob Gibson, had 304.2 innings, yet the stat converter still projects Welch with 238 innings for that year. It’s part of the reason he’s projected to go just 14-12 for the Tigers.
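One simple way to account for usage, sketched below, is to compare pitchers at the same workload rank in each season and scale by the ratio of their innings. This is only an assumption about how such an adjustment might look, not a feature of the converter; `innings_to_outs` and `workload_ratio` are hypothetical helper names. Note that innings such as 304.2 use baseball notation, meaning 304 and two-thirds.

```python
def innings_to_outs(innings: str) -> int:
    """Convert baseball innings notation (e.g. "304.2") to total outs."""
    whole, _, partial = innings.partition(".")
    extra_outs = int(partial) if partial else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {innings!r}")
    return int(whole) * 3 + extra_outs


def workload_ratio(source_innings: str, target_innings: str) -> float:
    """Ratio of a same-rank pitcher's workload in the target season to the source."""
    return innings_to_outs(target_innings) / innings_to_outs(source_innings)


# Figures from the text: the fourth-ranked workloads in 1990 and 1968.
ratio_1990_to_1968 = workload_ratio("238", "304.2")

# Scale Welch's projected 14-12 record by the extra workload (illustrative only).
scaled_wins = 14 * ratio_1990_to_1968
scaled_losses = 12 * ratio_1990_to_1968
```

The ratio comes out to about 1.28, so even a crude workload adjustment would push Welch's projected decisions up by more than a quarter, though it would still leave him well short of 30 wins.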

All of this, so we’re clear, isn’t to take major issue with the stat converter. It’s one of many, many reasons that Baseball-Reference.com is easily my favorite baseball website of them all, one more reason I should probably be paying Sean Forman rent for the amount of time I spend on his site. I don’t know how or if it’s feasible to adjust the stat converter so that it becomes anything more than a fun, simple tool that projects based on run-related averages. But I offer all of this with the hope of advancing creative endeavor.
