A Method for Determining the Probability that a Given Team was the True Best Team in Some Particular Year
by Jesse Frey

Introduction:

As baseball fans, we typically think of the World Series winner as the champion for a given season. Most of us recognize, however, that luck is an important factor in deciding the outcome of a best-of-five or best-of-seven series between two teams. As a result, we don't necessarily believe that the World Series champion is really the best team in a given season. Given the much larger number of games involved, the regular season would seem to offer a stronger basis for deciding how teams compare. Most of us would likely have little doubt, for example, that a 90-win team is better than a 60-win team. When the differences in numbers of victories are smaller, however, and when the teams in question do not seem to have played equally difficult schedules, we would likely have less confidence in saying that the team with more wins is truly the better team. It would be nice to have a way to assess, probabilistically, the evidence that one team or the other is stronger.

A natural way to obtain such an assessment is via a Bayesian model. The two components of such a model are a prior distribution and a likelihood. The prior distribution, which might be called our best guess about team quality before seeing any games, should reflect the typical distribution of quality for baseball teams. That is, it should reflect the fact that the vast majority of teams tend to have winning percentages between 0.400 and 0.600, with only the rarest of teams having winning percentages below 0.300 or above 0.700. The likelihood, meanwhile, gives the probability, in terms of the strengths of the two teams, that one team defeats another. In the Bayesian approach, we combine the prior distribution and the likelihood to find the posterior distribution, which is our best guess about team quality given both the prior distribution and the game results we observed.

In this article we describe a simple Bayesian model for the outcomes of major league baseball games. We then implement this model to produce both rankings of the teams in the seasons 1990-2002 and estimates of the probability that each team is the true best team. In-season estimates of the strength of each team and the probability that each team is the true best team are also given. These in-season estimates show nicely how the Bayesian method incorporates, over time, the evidence provided by game results. There is nothing novel about the team rankings we obtain, which tend to mirror a ranking by winning percentage, or the posterior estimates of the strength of each team, which can be approximated very well using a simple ratio described in the next section. What may be new, however, is the use of the entire joint posterior distribution of the strengths of all the teams to estimate, for each team, the probability that the team is the true best team.

The Model:

(This section is surely inadequate to explain Markov chain Monte Carlo (MCMC) inference to anyone unfamiliar with it. It should be sufficient, however, to allow those familiar with MCMC to follow what I did.)

Consider the case of 30 teams. Following essentially the Bradley-Terry model, we assume that each team has associated with it a positive merit parameter m, where the sum of all the merit parameters is 30. We also assume that there is a single positive homefield advantage parameter h. Given teams A and B with merit parameters m(A) and m(B), we take the probability of team A defeating team B at team B's home park to be m(A)/(m(A)+h*m(B)), while team B's probability of winning the same game is h*m(B)/(m(A)+h*m(B)).

For our prior distribution on the merit parameters, we use a rescaled Dirichlet distribution in which all 30 parameters are equal. Specifically, we assume that the probability that the merit parameters take on the positive values m(1), m(2), ..., m(30), summing to 30, is proportional to (m(1)*m(2)*...*m(30))^10. The prior distribution on h, which is taken to be independent of the prior distribution on the merit parameters, is chosen to be normal with mean 1.2 and variance 0.2. The decision to make each of the parameters for the Dirichlet distribution 10 has the effect of (1) giving all teams the same marginal prior distribution and (2) making that marginal distribution correspond roughly to the true distribution of team merit that we have seen in the past 10 years or so. The impact of this prior distribution on the posterior mean is typically comparable to the impact of adding 70 games of 0.500 ball to each team's record. A rough estimate of any team's posterior mean can thus be obtained as (W+35)/(L+35), where W and L are the team's counts of wins and losses.

The posterior distribution for the merit parameters and the homefield advantage parameter given the game results does not fit into any common parametric class. As a result, MCMC was needed to produce the estimates tabulated in the next section. The updating procedure used for the merit parameters was parameter-by-parameter Metropolis-Hastings with a uniform proposal distribution centered at the current value and concurrent renormalization of the other parameters to keep the sum at 30. The homefield advantage parameter was also updated via Metropolis-Hastings with a uniform proposal distribution. Each table given in the next section corresponds to a run of 100,000 iterates after a burn-in period of 1,000 iterates.

You may assess the adequacy of that number of iterates by (1) looking at the pre-season table given for each season and (2) looking at the last two tables given for 1994. In the pre-season tables, no games have been played, so the true values for all estimates are the same for all teams and any differences among teams reflect only Monte Carlo error. The last two tables given for 1994, meanwhile, are independent estimates of the same underlying true values since they are based (due to the strike) on exactly the same set of game results.
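The model and updating scheme described in this section can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the code actually used for the tables. The function names, the game tuple format (visitor index, home index, whether the home team won), and the proposal widths are all hypothetical. The sketch uses the prior density exactly as stated above (product of merits raised to the 10th power), treats the normal prior on h as truncated to positive values, and includes a Jacobian factor for the renormalization step so that the chain targets the stated posterior; the article does not say whether the original implementation handled that step the same way.

```python
import math
import random


def win_prob_visitor(m_visitor, m_home, h):
    """Probability that the visiting team wins: m(A) / (m(A) + h*m(B))."""
    return m_visitor / (m_visitor + h * m_home)


def rough_merit(wins, losses, pseudo=35.0):
    """Rough posterior-mean estimate (W+35)/(L+35) quoted in the text."""
    return (wins + pseudo) / (losses + pseudo)


def log_posterior(m, h, games, exponent=10.0, h_mean=1.2, h_var=0.2):
    """Unnormalized log posterior. games holds (visitor, home, home_won) tuples."""
    if h <= 0.0 or min(m) <= 0.0:
        return -math.inf
    lp = exponent * sum(math.log(x) for x in m)   # prior on merits: (prod m)^10
    lp -= (h - h_mean) ** 2 / (2.0 * h_var)       # normal prior on h
    for visitor, home, home_won in games:
        p_visitor = win_prob_visitor(m[visitor], m[home], h)
        lp += math.log(1.0 - p_visitor if home_won else p_visitor)
    return lp


def accept(log_ratio, rng):
    """Metropolis-Hastings acceptance test on the log scale."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def update_merit(m, h, games, i, width, rng):
    """Update m[i] with a uniform proposal; rescale the others to keep the sum."""
    n, total = len(m), sum(m)
    proposal = m[i] + rng.uniform(-width, width)
    if not 0.0 < proposal < total:
        return m  # proposal leaves the allowed region, so reject it
    scale = (total - proposal) / (total - m[i])
    new_m = [proposal if j == i else x * scale for j, x in enumerate(m)]
    log_ratio = log_posterior(new_m, h, games) - log_posterior(m, h, games)
    log_ratio += (n - 2) * math.log(scale)  # Jacobian of rescaling (assumption)
    return new_m if accept(log_ratio, rng) else m


def update_home(m, h, games, width, rng):
    """Update the homefield parameter h with a uniform proposal."""
    proposal = h + rng.uniform(-width, width)
    log_ratio = log_posterior(m, proposal, games) - log_posterior(m, h, games)
    return proposal if accept(log_ratio, rng) else h


def run_chain(n_teams, games, iterates, burn_in, rng, m_width=0.3, h_width=0.1):
    """Return a list of (merits, h) draws collected after the burn-in period."""
    m, h = [1.0] * n_teams, 1.2
    draws = []
    for step in range(burn_in + iterates):
        for i in range(n_teams):
            m = update_merit(m, h, games, i, m_width, rng)
        h = update_home(m, h, games, h_width, rng)
        if step >= burn_in:
            draws.append((list(m), h))
    return draws


if __name__ == "__main__":
    # Made-up toy data: team 0 beats team 1 ten times on the road.
    toy_games = [(0, 1, False)] * 10
    toy_draws = run_chain(3, toy_games, iterates=200, burn_in=50,
                          rng=random.Random(1))
    print(len(toy_draws), "draws; last h =", round(toy_draws[-1][1], 3))
```

As a quick check of the rough ratio, a 90-72 team gives rough_merit(90, 72) = 125/107, or about 1.17. The sketch recomputes the full log posterior at every update for clarity; a practical version for 30 teams and 100,000 iterates would only recompute the terms involving the games affected by each change.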

The Results:

For each season 1990 to 2002, 7 tables were produced. These tables give estimates as of (1) March 15, (2) April 30, (3) May 31, (4) June 30, (5) July 31, (6) August 31, and (7) October 30. The table for March 15 simply shows the prior distribution and gives a sense of the level of uncertainty in the estimates for the other tables. The numbers given in each table are as follows:

(1) The team and its rank by posterior mean (Team).
(2) The posterior mean for the team's merit parameter (Mean).
(3) The posterior standard deviation for the merit parameter (S.D.).
(4) The team's wins and losses through the given date (W and L).
(5) The posterior probability that the team is the best, second-best, third-best, fourth-best, and fifth-best team in baseball (P(1)-P(5)).

At the bottom of each table, the posterior mean and posterior variance for the homefield advantage parameter are given. The estimates for the posterior means and standard deviations are simply the sample means and sample standard deviations over the 100,000 iterates of the algorithm described in the last section. The estimates for the posterior probabilities that each team has rank 1 through 5, meanwhile, are the proportions of the 100,000 iterates for which the team's merit parameter had the rank in question.

To save space, each season's tables are given in a separate link.

1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002
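The way the table columns are computed from the iterates can be sketched as follows. This is a minimal illustration, assuming the iterates are available as a list of merit vectors with one entry per team; the function name summarize and the three toy draws are made up for the example and are not real results.

```python
import statistics


def summarize(merit_draws, top=5):
    """Return (means, sds, rank_probs) computed over the sampled merit vectors.

    rank_probs[t][r] is the share of iterates in which team t had rank r + 1.
    """
    n_draws = len(merit_draws)
    n_teams = len(merit_draws[0])
    means = [statistics.mean(d[t] for d in merit_draws) for t in range(n_teams)]
    sds = [statistics.stdev(d[t] for d in merit_draws) for t in range(n_teams)]
    counts = [[0] * top for _ in range(n_teams)]
    for draw in merit_draws:
        # Rank teams within this single iterate, highest merit first.
        order = sorted(range(n_teams), key=lambda t: draw[t], reverse=True)
        for rank, team in enumerate(order[:top]):
            counts[team][rank] += 1
    rank_probs = [[c / n_draws for c in row] for row in counts]
    return means, sds, rank_probs


# Three made-up iterates for three teams, purely for illustration.
toy_draws = [[1.2, 1.0, 0.8], [0.8, 1.3, 0.9], [1.1, 1.0, 0.9]]
means, sds, rank_probs = summarize(toy_draws, top=2)
print(rank_probs[0][0])  # share of iterates in which team 0 ranked first
```

Ranking within each iterate, rather than ranking the posterior means, is what lets the joint posterior distribution determine the probability that a team is the true best.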

The teams most likely to have been the true best:

The results given in the last section suggest that there are typically 2 to 4 teams with at least a 10% chance of being the true best team. There may be as many as 12 to 14 teams with at least a 1% chance of being the best team. Here are the teams, by year, with at least 10% chances of being the true best team. The World Series winner is also included. The ordering tends to follow very closely the order of win totals, though there is some variation in recent years as a result of unbalanced schedules.

Year  Team           Prob.   Note
1990  Oakland        54.3%
1990  Pittsburgh     14.1%
1990  Chicago(AL)    11.1%
1990  Cincinnati      5.2%   (WS champ)
1991  Pittsburgh     29.3%
1991  Minnesota      20.8%   (WS champ)
1991  Atlanta        13.9%
1991  Los Angeles    11.6%
1992  Atlanta        23.1%
1992  Oakland        18.0%
1992  Toronto        17.5%   (WS champ)
1992  Pittsburgh     16.9%
1993  Atlanta        34.1%
1993  San Francisco  29.7%
1993  Toronto         9.5%   (WS champ)
1994  Montreal       36.2%
1994  New York(AL)   17.7%
1994  Atlanta        10.3%
1995  Cleveland      67.6%
1995  Atlanta        17.7%   (WS champ)
1996  Cleveland      36.5%
1996  Atlanta        20.0%
1996  New York(AL)    8.7%   (WS champ)
1997  Atlanta        45.2%
1997  Baltimore      18.2%
1997  New York(AL)   12.1%
1997  Florida         8.6%   (WS champ)
1998  New York(AL)   68.0%   (WS champ)
1998  Atlanta        18.6%
1999  Atlanta        32.8%
1999  Arizona        19.6%
1999  New York(AL)    8.6%   (WS champ)
2000  San Francisco  19.7%
2000  Chicago(AL)    17.2%
2000  Oakland        10.4%
2000  St. Louis      10.2%
2000  New York(AL)    3.5%   (WS champ)
2001  Seattle        86.9%
2001  Arizona         0.8%   (WS champ)
2002  Oakland        27.0%
2002  Atlanta        19.3%
2002  New York(AL)   17.6%
2002  Anaheim        10.6%   (WS champ)

The lowest percentages for World Series winners are 0.8% for Arizona in 2001 and 3.5% for New York(AL) in 2000.

Acknowledgement:

The game result data used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.