Wednesday, February 22, 2012

The New York Yankees Payroll vs Everyone Else (Major League Baseball)


Description:
Major League Baseball payrolls for all teams since 1985. The New York Yankees payroll is highlighted, with each season's result indicated by the shape of the point.

Data:
http://www.baseball-databank.org/


Analysis:
For years, fans of Major League Baseball (MLB) have been crying 'foul!' at the New York Yankees regarding their spending habits. The primary complaint is that the Yankees have 'bought' their championships, leaving the other teams to languish in mediocrity.

What does it mean to 'buy' a championship? Is there an implication that the Yankees are the only team to pay players while the rest rely on volunteers? Of course not - all players get paid (handsomely - see previous article on the rising MLB payroll).

Looking at the above graph, it is clear that the Yankees have been a top-tier paying team since 1985. However, this has not resulted in a championship every year. In fact, during their World Series run in the late 90s, their payroll was not always the highest.

It wasn't until after 2000 that their payroll began to quickly outpace everyone else's, reaching a high-water mark in 2005 (when they failed to make it past the first round of the playoffs). Since then, their payroll has trended slowly downward - perhaps they have realized that throwing money at players isn't the path to World Series rings.

"But," you ask, "does a high payroll equal a better winning percentage?" Analysis of payroll vs winning percentage (not shown) does seem to indicate a relationship, but not a strong one (r = 0.496, r^2 = 0.214). This makes sense because a high salary does not indicate actual skill; rather, it has a lot to do with player contract composition and the subjective opinions of team management.

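For anyone who wants to run this kind of check themselves, here is a minimal sketch in R of how r and r^2 can be computed. The data frame and every value in it are invented purely for illustration (they are not the data behind the numbers above), and the names team.seasons, payroll and win.pct are assumptions.

```r
# Hypothetical team-seasons; all values are invented for illustration only.
team.seasons <- data.frame(
  payroll = c(60e6, 85e6, 110e6, 150e6, 200e6, 95e6),
  wins    = c(70, 88, 79, 95, 90, 81),
  losses  = c(92, 74, 83, 67, 72, 81)
)

# Winning percentage: wins / (wins + losses)
team.seasons$win.pct <- with(team.seasons, wins / (wins + losses))

# Pearson correlation between payroll and winning percentage
r <- cor(team.seasons$payroll, team.seasons$win.pct)

# R-squared from a simple linear regression of winning percentage on payroll
fit <- lm(win.pct ~ payroll, data = team.seasons)
r.squared <- summary(fit)$r.squared

print(c(r = r, r.squared = r.squared))
```
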
Questions:
1) Will the Red Sox catch up in terms of payroll?
2) How long can the Yankees sustain their current spending habits?
3) What will the low-payroll teams do in order to close the gap to the high-payroll teams?

Code:
This graph was generated using the 'ggplot2' package within the R programming language:
# Updated for current ggplot2: opts() and theme_blank() were replaced by
# theme() and element_blank(), and 'formatter' became 'labels'.
# mysep is a user-defined axis-label formatter (its definition is not shown here).
library(ggplot2)

ggplot(adjusted.salaries.frame, aes(x=yearID, y=payroll)) +
  # All teams: one point per team per year
  geom_point() +
  # Yankees seasons: fixed "Yankees" color label, season result mapped to shape
  geom_point(data=adjusted.yankees.frame,
      aes(color="Yankees", shape=Result), size=5) +
  geom_line(data=adjusted.yankees.frame, aes(color="Yankees"), size=1.1) +

  ylab("Team Payroll (in U.S. Dollars)") +
  xlab("Year") +

  ggtitle("MLB Payrolls: The New York Yankees vs All Other Teams (adjusted for inflation)") +
  theme(legend.title = element_blank(),
    panel.background = element_blank()) +
  scale_y_continuous(labels = mysep)
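
The plotting code relies on two data frames and a 'mysep' function that are not defined in this post. As a rough sketch only, here is one way similar inputs could be prepared, assuming a salaries table with columns yearID, teamID and salary and assuming the Yankees' team ID is "NYA". The sample values are invented, the names salaries, payrolls and yankees are hypothetical, and this mysep is a guess at a thousands-separator formatter rather than the original function. Inflation adjustment and the Result column are not covered.

```r
# Hypothetical stand-in for a salaries table; values are invented.
salaries <- data.frame(
  yearID = c(1985, 1985, 1985, 1986, 1986, 1986),
  teamID = c("NYA", "NYA", "BOS", "NYA", "BOS", "BOS"),
  salary = c(1500000, 800000, 900000, 1700000, 1000000, 600000)
)

# Sum player salaries into one payroll figure per team per year
payrolls <- aggregate(salary ~ yearID + teamID, data = salaries, FUN = sum)
names(payrolls)[names(payrolls) == "salary"] <- "payroll"

# Yankees rows only (teamID "NYA" is an assumption)
yankees <- payrolls[payrolls$teamID == "NYA", ]

# A possible mysep: format axis values with thousands separators
mysep <- function(x) format(x, big.mark = ",", scientific = FALSE, trim = TRUE)

print(payrolls)
print(mysep(c(2300000, 150000000)))
```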

4 comments:

  1. Thanks Patrick. Your points were well made. If possible, a graph on the following question would be a good follow-up:

    "What is the Yankees' winning percentage in relation to the age of players on the team?"

  2. Nicely done (and thanks for the ggplot2 code). You mentioned an r^2=0.214. My estimate came out considerably smaller; probably as a result of an error on my part. I'm a bit new at rooting around the baseball data so it's likely I missed something. REC= W/(W+L) from team table right? (FWIW - My salaries seemed to match as close as could be expected so I must be on the right track.)

    1. Hi, Robbie.

      Yep, winning percentage is: wins/(wins+losses). What method did you use to calculate it? And is your range of data the same as mine? At any rate, I double checked and yep, it's 0.214. In R, you can use the 'lm' function to get your r^2 value.

  3. Hi, Sgt. Pepper.

    A full analysis of Major League Baseball and age (not to mention weight/height) would really be needed to explore this topic fully. That being said, I put together a couple of charts:

    Yankees Age and Winning Percentage

    There is a wide confidence band for the above graph and the model is not valid for ages past 35. I wouldn't use it to make any predictions.

    Yankees Age and End Result


    This one is more interesting because you can see the range of ages on the team and what their year-end result was. Again, no real correlation that I can see, but it's obvious they were trying to keep the same team together after their World Series run (the average age goes up by about one for the next couple of years).
