A Method for Determining the Probability that a Given Team
 was the True Best Team in Some Particular Year

by Jesse Frey



Introduction:


As baseball fans, we typically think of the World Series winner as the 
champion for a given season.  Most of us recognize, however, that luck is 
an important factor in deciding the outcome of a best-of-five or 
best-of-seven series between two teams.  As a result, we don't necessarily 
believe that the World Series champion is really the best team in a given 
season.  


Given the much larger number of games involved, the regular season would 
seem to offer a stronger basis for deciding how teams compare.  Most of us 
would likely have little doubt, for example, that a 90-win team is better 
than a 60-win team.  When the differences in numbers of victories are 
smaller, however, and when the teams in question do not seem to have 
played equally difficult schedules, we would likely have less confidence in 
saying that the team with more wins is truly the better team.  It would be 
nice to have a way to assess, probabilistically, the evidence that one 
team or the other is stronger. 


A natural way to obtain such an assessment is via a Bayesian model.  The 
two components of such a model are a prior distribution and a likelihood.  
The prior distribution, which might be called our best guess about team 
quality before seeing any games, should reflect the typical distribution 
of quality for baseball teams.  That is, it should reflect the fact that 
the vast majority of teams tend to have winning percentages between 0.400 
and 0.600, with only the rarest of teams having winning percentages below 
0.300 or above 0.700.  The likelihood, meanwhile, gives the probability, 
in terms of the strengths of the two teams, that one team defeats another.  
In the Bayesian approach, we combine the prior distribution and the 
likelihood to find the posterior distribution, which is our best guess 
about team quality given both the prior distribution and the game results 
we observed.  


In this article we describe a simple Bayesian model for the outcomes of major 
league baseball games.  We then implement this model to produce both 
rankings of the teams in the seasons 1990-2002 and estimates of the 
probability that each team is the true best team.  In-season estimates of 
the strength of each team and the probability that each team is the true 
best team are also given.  These in-season estimates show nicely how the 
Bayesian method incorporates, over time, the evidence provided by game 
results.  


There is nothing novel about the team rankings we obtain, which tend to 
mirror a ranking by winning percentage, or the posterior estimates of the 
strength of each team, which can be approximated very well using a simple 
ratio described in the next section.  What may be new, however, is the 
use of the entire joint posterior distribution of the strengths of all the 
teams to estimate, for each team, the probability that the team is 
the true best team.


The Model:

(This section is surely inadequate to explain MCMC-based inference to 
anyone unfamiliar with it.  It should be sufficient, however, to allow 
those familiar with MCMC to follow what I did.)


Consider the case of 30 teams.  Following essentially the Bradley-Terry 
model, we assume that each team has associated with it a positive merit 
parameter m, where the sum of all the merit parameters is 30.  We also 
assume that there is a single positive homefield advantage parameter h.  
Given teams A and B with merit parameters m(A) and m(B), we take the 
probability of team A defeating team B at team B's home park to be 
m(A)/(m(A)+h*m(B)), while team B's probability of winning the same game is 
h*m(B)/(m(A)+h*m(B)).   
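
As a concrete illustration of the formula above, here is a minimal Python sketch.  The function name visitor_win_prob and the example values are my own choices for illustration and are not taken from the original analysis.

```python
def visitor_win_prob(m_visitor, m_home, h):
    """Probability that the visiting team wins: m(A) / (m(A) + h*m(B))."""
    return m_visitor / (m_visitor + h * m_home)


# Two equally strong teams, with an illustrative homefield parameter of 1.2.
p_visitor = visitor_win_prob(1.0, 1.0, 1.2)
p_home = 1.0 - p_visitor  # equals h*m(B) / (m(A) + h*m(B))
print(round(p_visitor, 3), round(p_home, 3))
```

With equal merits and h = 1.2, the home team's probability is 1.2/2.2, or about 54.5%.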


For our prior distribution on the merit parameters, we use a rescaled 
Dirichlet distribution in which each of the 30 parameters is the same.  
Specifically, we assume that the prior density at positive values m(1), 
m(2), ..., m(30) summing to 30 is proportional to 
(m(1)*m(2)*...*m(30))^10.  The prior distribution on h, 
which is taken to be independent of the prior distribution on the merit 
parameters, is chosen to be normal with mean 1.2 and variance 0.2.  The 
decision to make each of the parameters for the Dirichlet distribution 10 
has the effect of (1) giving all teams the same marginal prior distribution 
and (2) making that marginal distribution correspond roughly to the true 
distribution of team merit that we have seen in the past 10 years or so. 
The impact of this prior distribution on the posterior mean is typically
comparable to the impact of adding 70 games of 0.500 ball to each team's 
record.  A rough estimate of any team's posterior mean can thus be 
obtained as the ratio (W+35)/(L+35), where W and L are the team's counts of 
wins and losses.
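
The rough estimate can be computed directly.  The sketch below is only an illustration: the function name rough_merit and the sample records are hypothetical.  One way to read the ratio form (my interpretation, not a statement from the original analysis) is that merit sits on an odds scale, since a team of merit m facing an average team of merit 1 on neutral ground wins with probability m/(m+1).

```python
def rough_merit(wins, losses, pad=35):
    """Approximate posterior mean merit: (W + 35) / (L + 35).

    The default pad of 35 wins and 35 losses corresponds to the
    70 games of 0.500 ball mentioned in the text.
    """
    return (wins + pad) / (losses + pad)


# Hypothetical season records, chosen only for illustration.
for wins, losses in [(100, 62), (81, 81), (62, 100)]:
    print(wins, losses, round(rough_merit(wins, losses), 3))
```

A 0.500 team comes out at exactly 1, the average merit, and the padding pulls extreme records toward that value.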


The posterior distribution for the merit parameters and the homefield 
advantage parameter given the game results does not fit into any common
parametric class.  As a result, Markov Chain Monte Carlo was needed to 
produce the estimates tabled in the next section.  The updating procedure 
used for the merit parameters was parameter-by-parameter Metropolis-Hastings 
with a uniform proposal distribution centered at the current value and 
concurrent renormalization of the other parameters to keep the sum at 30.  
The homefield advantage parameter was also updated via Metropolis-Hastings 
with a uniform proposal distribution.  Each table given in the next section 
is based on a run of 100,000 iterates after a burn-in period of 1,000 
iterates.  You may assess the adequacy of that number of iterates by (1) 
looking at the pre-season table given for each season and (2) looking at 
the last two tables given for 1994.  In the pre-season tables, the true values 
for all estimates are the same for all teams, so any differences among teams 
reflect Monte Carlo error alone.  The last two tables given for 1994, 
meanwhile, are independent estimates of the same underlying true values 
since they are based (due to the strike) on exactly the same set of game 
results.  
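
For readers who want to experiment, here is a small Python sketch of a sampler in the spirit of the procedure just described.  It is not the code used to produce the tables.  The function names, the data layout (a dictionary keyed by visitor and home team), the proposal widths, and the starting values are all assumptions of mine.  The text does not say how the renormalization enters the acceptance ratio; the sketch includes a Jacobian factor that I derived for this particular move, and it treats non-positive values of h as having zero posterior density.

```python
import math
import random


def log_posterior(m, h, results, prior_power=10.0):
    """Unnormalized log posterior.

    results maps (visitor, home) -> (visitor_wins, home_wins).
    prior_power is the exponent in the stated prior density.
    """
    if h <= 0 or min(m) <= 0:
        return -math.inf
    lp = prior_power * sum(math.log(x) for x in m)   # prior on merits
    lp -= (h - 1.2) ** 2 / (2 * 0.2)                 # normal prior on h
    for (a, b), (vis_wins, home_wins) in results.items():
        p_vis = m[a] / (m[a] + h * m[b])
        lp += vis_wins * math.log(p_vis) + home_wins * math.log(1 - p_vis)
    return lp


def run_chain(results, n_teams, n_iter, burn_in,
              step_m=0.3, step_h=0.1, seed=0):
    """Return a list of (merits, h) draws kept after burn-in."""
    rng = random.Random(seed)
    total = float(n_teams)          # merits are constrained to sum to this
    m, h = [1.0] * n_teams, 1.2     # assumed starting values
    cur = log_posterior(m, h, results)
    draws = []
    for it in range(burn_in + n_iter):
        for i in range(n_teams):
            new_i = m[i] + rng.uniform(-step_m, step_m)
            if not 0 < new_i < total:
                continue            # proposal outside the support: reject
            # Rescale the other merits so the sum stays fixed.
            scale = (total - new_i) / (total - m[i])
            prop = [new_i if j == i else x * scale for j, x in enumerate(m)]
            new = log_posterior(prop, h, results)
            # Assumed Jacobian correction for the rescaling move.
            log_ratio = new - cur + (n_teams - 2) * math.log(scale)
            if math.log(1.0 - rng.random()) < log_ratio:
                m, cur = prop, new
        h_new = h + rng.uniform(-step_h, step_h)
        new = log_posterior(m, h_new, results)
        if math.log(1.0 - rng.random()) < new - cur:
            h, cur = h_new, new
        if it >= burn_in:
            draws.append((list(m), h))
    return draws


# Made-up results for three teams: team 0 wins most of its games.
toy_results = {(0, 1): (30, 10), (1, 0): (10, 30),
               (0, 2): (25, 15), (2, 0): (15, 25),
               (1, 2): (20, 20), (2, 1): (20, 20)}
toy_draws = run_chain(toy_results, n_teams=3, n_iter=2000, burn_in=200)
```

Each sweep updates every merit parameter once and then updates h, so one stored draw corresponds to one iterate in the sense used above.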


The Results:


For each season from 1990 to 2002, seven tables were produced.  These tables 
give estimates as of (1) March 15, (2) April 30, (3) May 31, (4) June 30, 
(5) July 31, (6) August 31, and (7) October 30.  The table for March 15 
simply shows the prior distribution and gives a sense of the level of 
uncertainty in the estimates for the other tables.  The numbers given in 
each table are as follows:


(1)  The team and its rank by posterior mean (Team).
(2)  The posterior mean for the team's merit parameter (Mean).
(3)  The posterior standard deviation for the merit parameter (S.D.).
(4)  The team's wins and losses through the given date (W and L).
(5)  The posterior probability that the team is the best, second-best, 
third-best, fourth-best, and fifth-best team in baseball (P(1)-P(5)).


At the bottom of each table, the posterior mean and posterior variance for 
the homefield advantage parameter are given.


The estimates for the posterior means and standard deviations are simply 
the sample means and sample standard deviations over the 100,000 iterates 
of the algorithm described in the last section.  The estimates for the 
posterior probabilities that each team has rank 1 through 5, meanwhile, 
are the proportions of the 100,000 iterates for which the team's merit 
parameter had the rank in question.
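
The calculation just described can be sketched in a few lines of Python.  The function name summarize_draws and the four made-up draws for three teams are hypothetical; a real run would pass in the 100,000 stored vectors of merit parameters.

```python
import statistics


def summarize_draws(merit_draws, top=5):
    """Posterior means, standard deviations, and rank probabilities.

    merit_draws is a list of merit vectors, one per iterate.
    rank_probs[team][k] is the share of iterates in which that
    team's merit parameter was the (k+1)-th largest.
    """
    n_teams = len(merit_draws[0])
    top = min(top, n_teams)
    means, sds = [], []
    for i in range(n_teams):
        values = [draw[i] for draw in merit_draws]
        means.append(statistics.mean(values))
        sds.append(statistics.stdev(values))   # sample standard deviation
    counts = [[0] * top for _ in range(n_teams)]
    for draw in merit_draws:
        order = sorted(range(n_teams), key=lambda i: draw[i], reverse=True)
        for rank, team in enumerate(order[:top]):
            counts[team][rank] += 1
    n_draws = len(merit_draws)
    rank_probs = [[c / n_draws for c in row] for row in counts]
    return means, sds, rank_probs


# Four made-up draws for three teams (each vector sums to 3).
example_draws = [[1.5, 1.0, 0.5],
                 [1.4, 0.9, 0.7],
                 [0.9, 1.3, 0.8],
                 [1.1, 1.0, 0.9]]
means, sds, rank_probs = summarize_draws(example_draws)
```

In this toy input the first team has the largest merit in three of the four draws, so its estimated probability of being the best team is 0.75.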


To save space, each season's tables are given on a separate linked page.
1990 1991
1992 1993
1994 1995
1996 1997
1998 1999
2000 2001
2002  


The teams most likely to have been the true best:


The results given in the last section suggest that there are typically 2 
to 4 teams with at least a 10% chance of being the true best team.  There 
may be as many as 12 to 14 teams with at least a 1% chance of being the 
best team.

 
Here are the teams, by year, with at least a 10% chance of being the true 
best team.  Each year's World Series winner is also included, whatever its 
percentage.  The ordering tends to follow the order of win totals very 
closely, though there is some variation in recent years as a result of 
unbalanced schedules.


1990 Oakland        54.3%
1990 Pittsburgh     14.1%
1990 Chicago(AL)    11.1%
1990 Cincinnati      5.2% (WS champ)

1991 Pittsburgh     29.3%
1991 Minnesota      20.8% (WS champ)
1991 Atlanta        13.9%
1991 Los Angeles    11.6%

1992 Atlanta        23.1%
1992 Oakland        18.0%
1992 Toronto        17.5% (WS champ)
1992 Pittsburgh     16.9%

1993 Atlanta        34.1%
1993 San Francisco  29.7%
1993 Toronto         9.5% (WS champ)

1994 Montreal       36.2%
1994 New York(AL)   17.7%
1994 Atlanta        10.3%

1995 Cleveland      67.6%
1995 Atlanta        17.7% (WS champ)

1996 Cleveland      36.5%
1996 Atlanta        20.0%
1996 New York(AL)    8.7% (WS champ)

1997 Atlanta        45.2%
1997 Baltimore      18.2%
1997 New York(AL)   12.1%
1997 Florida         8.6% (WS champ)

1998 New York(AL)   68.0% (WS champ)
1998 Atlanta        18.6%

1999 Atlanta        32.8%
1999 Arizona        19.6%
1999 New York(AL)    8.6% (WS champ)

2000 San Francisco  19.7%
2000 Chicago(AL)    17.2%
2000 Oakland        10.4%
2000 St. Louis      10.2%
2000 New York(AL)    3.5% (WS champ)

2001 Seattle        86.9%
2001 Arizona         0.8% (WS champ)

2002 Oakland        27.0%
2002 Atlanta        19.3%
2002 New York(AL)   17.6%
2002 Anaheim        10.6% (WS champ)


The lowest percentages for World Series winners are 0.8% for Arizona in 
2001 and 3.5% for New York(AL) in 2000.  

Acknowledgement

The game result data used here was obtained free of charge from and is 
copyrighted by Retrosheet. Interested parties may contact Retrosheet at 
20 Sunset Rd., Newark, DE 19711.