A Method for Determining the Probability that a Given Team
was the True Best Team in Some Particular Year
by Jesse Frey
Introduction:
As baseball fans, we typically think of the World Series winner as the
champion for a given season. Most of us recognize, however, that luck is
an important factor in deciding the outcome of a best-of-five or
best-of-seven series between two teams. As a result, we don't necessarily
believe that the World Series champion is really the best team in a given
season.
Given the much larger number of games involved, the regular season would
seem to offer a stronger basis for deciding how teams compare. Most of us
would likely have little doubt, for example, that a 90-win team is better
than a 60-win team. When the differences in numbers of victories are
smaller, however, and when the teams in question do not seem to have
played equally difficult schedules, we would likely have less confidence in
saying that the team with more wins is truly the better team. It would be
nice to have a way to assess, probabilistically, the evidence that one
team or the other is stronger.
A natural way to obtain such an assessment is via a Bayesian model. The
two components of such a model are a prior distribution and a likelihood.
The prior distribution, which might be called our best guess about team
quality before seeing any games, should reflect the typical distribution
of quality for baseball teams. That is, it should reflect the fact that
the vast majority of teams tend to have winning percentages between 0.400
and 0.600, with only the rarest of teams having winning percentages below
0.300 or above 0.700. The likelihood, meanwhile, gives the probability,
in terms of the strengths of the two teams, that one team defeats another.
In the Bayesian approach, we combine the prior distribution and the
likelihood to find the posterior distribution, which is our best guess
about team quality given both the prior distribution and the game results
we observed.
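As a toy illustration of this prior-plus-likelihood updating (much simpler than the multi-team model developed below), consider a single team whose win probability gets a Beta prior expressed as pseudo-games already played. The function name and the 90-72 record are mine, chosen only for illustration.

```python
def posterior_mean_win_pct(prior_w, prior_l, wins, losses):
    """Beta-binomial updating: treat the prior as pseudo-games already
    played.  The posterior mean win percentage blends the prior record
    with the observed record."""
    return (prior_w + wins) / (prior_w + prior_l + wins + losses)

# A prior worth 70 games of .500 ball, then an observed 90-72 season:
print(round(posterior_mean_win_pct(35, 35, 90, 72), 3))  # -> 0.539
```

With no games observed, the function simply returns the prior's .500 guess; as games accumulate, the observed record dominates.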
In this article we describe a simple Bayesian model for the outcomes of major
league baseball games. We then implement this model to produce both
rankings of the teams in the seasons 1990-2002 and estimates of the
probability that each team is the true best team. In-season estimates of
the strength of each team and the probability that each team is the true
best team are also given. These in-season estimates show nicely how the
Bayesian method incorporates, over time, the evidence provided by game
results.
There is nothing novel about the team rankings we obtain, which tend to
mirror a ranking by winning percentage, or the posterior estimates of the
strength of each team, which can be approximated very well using a simple
ratio described in the next section. What may be new, however, is the
use of the entire joint posterior distribution of the strengths of all the
teams to estimate, for each team, the probability that the team is
the true best team.
The Model:
(This section is surely inadequate to explain MCMC-based inference to
anyone unfamiliar with it. It should be sufficient, however, to allow
those familiar with MCMC to follow what I did.)
Consider the case of 30 teams. Essentially following the Bradley-Terry
model, we assume that each team has an associated positive merit
parameter m, where the merit parameters sum to 30. We also
assume that there is a single positive homefield advantage parameter h.
Given teams A and B with merit parameters m(A) and m(B), we take the
probability of team A defeating team B at team B's home park to be
m(A)/(m(A)+h*m(B)), while team B's probability of winning the same game is
h*m(B)/(m(A)+h*m(B)).
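The two win probabilities above are easy to express in code; the sketch below simply evaluates those formulas (the function names are mine, not the author's).

```python
def p_away_wins(m_away, m_home, h):
    """Probability that the visiting team (merit m_away) beats the home
    team (merit m_home) when the homefield factor is h."""
    return m_away / (m_away + h * m_home)

def p_home_wins(m_away, m_home, h):
    """Complementary probability that the home team wins the same game."""
    return h * m_home / (m_away + h * m_home)

# Two average teams (merit 1.0) with a homefield factor of 1.2:
print(round(p_home_wins(1.0, 1.0, 1.2), 3))  # -> 0.545
```

Note that with h = 1.2, two evenly matched teams give the home side roughly a 0.545 win probability, which is in the neighborhood of observed major league homefield advantage.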
For our prior distribution on the merit parameters, we use a rescaled
Dirichlet distribution in which each of the 30 parameters is the same.
Specifically, we assume that the probability that the merit parameters
take on the positive values m(1), m(2), ..., m(30), summing to 30, is
proportional to (m(1)*m(2)*...*m(30))^10. The prior distribution on h,
which is taken to be independent of the prior distribution on the merit
parameters, is chosen to be normal with mean 1.2 and variance 0.2. The
decision to make each of the parameters for the Dirichlet distribution 10
has the effect of (1) giving all teams the same marginal prior distribution
and (2) making that marginal distribution correspond roughly to the true
distribution of team merit that we have seen in the past 10 years or so.
The impact of this prior distribution on the posterior mean is typically
comparable to the impact of adding 70 games of 0.500 ball to each team's
record. A rough estimate of any team's posterior mean can thus be
obtained as (W+35)/(L+35), where W and L are the team's counts of wins and
losses.
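That rough estimate is a one-liner; the 90-72 record below is illustrative, not a reference to any particular team.

```python
def rough_posterior_merit(wins, losses):
    """Rough posterior-mean merit: the prior behaves like 70 extra games
    of .500 ball (35 wins and 35 losses) added to the team's record."""
    return (wins + 35) / (losses + 35)

print(round(rough_posterior_merit(90, 72), 3))  # -> 1.168
```

An 81-81 team gets merit exactly 1.0, the league-average value under the sum-to-30 normalization.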
The posterior distribution for the merit parameters and the homefield
advantage parameter given the game results does not fit into any common
parametric class. As a result, Markov chain Monte Carlo (MCMC) was needed to
produce the estimates tabled in the next section. The updating procedure
used for the merit parameters was parameter-by-parameter Metropolis-Hastings
with a uniform proposal distribution centered at the current value and
concurrent renormalization of the other parameters to keep the sum at 30.
The homefield advantage parameter was also updated via Metropolis-Hastings
with a uniform proposal distribution. Each table given in the next section
corresponds to a run of 100,000 iterates after a burn-in period of 1,000
iterates. You may assess the adequacy of that number of iterates by (1)
looking at the pre-season table given for each season and (2) looking at
the last two tables given for 1994. In the preseason tables, the true values
for all estimates are the same for all teams. The last two tables given for
1994, meanwhile, are independent estimates of the same underlying true values
since they are based (due to the strike) on exactly the same set of game
results.
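A much-simplified sketch of one such merit update follows. Everything here is an assumption of mine rather than the author's actual code: the data layout, the proposal width, and, notably, the acceptance ratio, which omits the Jacobian correction that the renormalizing move strictly requires.

```python
import math
import random

def log_post(m, h, games):
    """Unnormalized log posterior: the Dirichlet-style prior on the
    merits (exponent 10, as in the article), a normal(1.2, 0.2) prior
    on h, and the Bradley-Terry likelihood.
    games is a list of (home_index, away_index, home_won) tuples."""
    lp = 10 * sum(math.log(mi) for mi in m)
    lp -= (h - 1.2) ** 2 / (2 * 0.2)
    for home, away, home_won in games:
        p_home = h * m[home] / (m[away] + h * m[home])
        lp += math.log(p_home if home_won else 1.0 - p_home)
    return lp

def update_merit(m, h, games, i, width=0.1):
    """One Metropolis step on m[i]: uniform proposal around the current
    value, with the other merits rescaled so the sum stays at 30."""
    prop = m[i] + random.uniform(-width, width)
    if prop <= 0.0 or prop >= 30.0:
        return  # proposal outside the parameter space; reject it
    scale = (30.0 - prop) / (30.0 - m[i])
    m_new = [prop if j == i else mj * scale for j, mj in enumerate(m)]
    delta = log_post(m_new, h, games) - log_post(m, h, games)
    if random.random() < math.exp(min(0.0, delta)):
        m[:] = m_new  # accept the move in place
```

The sum-to-30 constraint survives each step by construction; a full sampler would loop this update over all teams each iterate and interleave a similar Metropolis step for h.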
The Results:
For each season from 1990 through 2002, seven tables were produced. These tables
give estimates as of (1) March 15, (2) April 30, (3) May 31, (4) June 30,
(5) July 31, (6) August 31, and (7) October 30. The table for March 15
simply shows the prior distribution and gives a sense of the level of
uncertainty in the estimates for the other tables. The numbers given in
each table are as follows:
(1) The team and its rank by posterior mean (Team).
(2) The posterior mean for the team's merit parameter (Mean).
(3) The posterior standard deviation for the merit parameter (S.D.).
(4) The team's wins and losses through the given date (W and L).
(5) The posterior probability that the team is the best, second-best,
third-best, fourth-best, and fifth-best team in baseball (P(1)-P(5)).
At the bottom of each table, the posterior mean and posterior variance for
the homefield advantage parameter are given.
The estimates for the posterior means and standard deviations are simply
the sample means and sample standard deviations over the 100,000 iterates
of the algorithm described in the last section. The estimates for the
posterior probabilities that each team has rank 1 through 5, meanwhile,
are the proportions of the 100,000 iterates for which the team's merit
parameter had the rank in question.
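In code, those rank-probability estimates are just counting. The sketch below, with made-up draws for three teams, estimates P(1) this way; P(2) through P(5) would use the same idea with the appropriate rank.

```python
def best_team_probs(draws):
    """draws: a list of MCMC iterates, each a list of per-team merits.
    Returns, for each team, the proportion of iterates in which that
    team's merit was the largest."""
    n_teams = len(draws[0])
    counts = [0] * n_teams
    for merits in draws:
        counts[max(range(n_teams), key=lambda j: merits[j])] += 1
    return [c / len(draws) for c in counts]

# Four toy iterates for three teams:
draws = [[1.3, 1.1, 0.6], [1.0, 1.2, 0.8], [1.4, 0.9, 0.7], [1.2, 1.3, 0.5]]
print(best_team_probs(draws))  # -> [0.5, 0.5, 0.0]
```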
To save space, each season's tables are given in a separate link.
1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000  2001  2002
The teams most likely to have been the true best:
The results given in the last section suggest that there are typically 2
to 4 teams with at least a 10% chance of being the true best team. There
may be as many as 12 to 14 teams with at least a 1% chance of being the
best team.
Here are the teams, by year, with at least a 10% chance of being the true
best team. The World Series winner is also included. The ordering tends
to follow very closely the order of win totals, though there is some
variation in recent years as a result of unbalanced schedules.
1990 Oakland 54.3%
1990 Pittsburgh 14.1%
1990 Chicago(AL) 11.1%
1990 Cincinnati 5.2% (WS champ)
1991 Pittsburgh 29.3%
1991 Minnesota 20.8% (WS champ)
1991 Atlanta 13.9%
1991 Los Angeles 11.6%
1992 Atlanta 23.1%
1992 Oakland 18.0%
1992 Toronto 17.5% (WS champ)
1992 Pittsburgh 16.9%
1993 Atlanta 34.1%
1993 San Francisco 29.7%
1993 Toronto 9.5% (WS champ)
1994 Montreal 36.2%
1994 New York(AL) 17.7%
1994 Atlanta 10.3%
1995 Cleveland 67.6%
1995 Atlanta 17.7% (WS champ)
1996 Cleveland 36.5%
1996 Atlanta 20.0%
1996 New York(AL) 8.7% (WS champ)
1997 Atlanta 45.2%
1997 Baltimore 18.2%
1997 New York(AL) 12.1%
1997 Florida 8.6% (WS champ)
1998 New York(AL) 68.0% (WS champ)
1998 Atlanta 18.6%
1999 Atlanta 32.8%
1999 Arizona 19.6%
1999 New York(AL) 8.6% (WS champ)
2000 San Francisco 19.7%
2000 Chicago(AL) 17.2%
2000 Oakland 10.4%
2000 St. Louis 10.2%
2000 New York(AL) 3.5% (WS champ)
2001 Seattle 86.9%
2001 Arizona 0.8% (WS champ)
2002 Oakland 27.0%
2002 Atlanta 19.3%
2002 New York(AL) 17.6%
2002 Anaheim 10.6% (WS champ)
The lowest percentages for World Series winners are 0.8% for Arizona in
2001 and 3.5% for New York(AL) in 2000.
Acknowledgement
The game result data used here was obtained free of charge from and is
copyrighted by Retrosheet. Interested parties may contact Retrosheet at
20 Sunset Rd., Newark, DE 19711.