YEAR-END REVIEW: PART 3
published Friday, December 19, 2008
by David Luciani
In the first two parts of this series, we looked at some of my favorite
(and least favorite) forecasts from before the 2008 season. As I explained in
those parts, singling out individual forecasts is really not a useful way to
judge a forecast set. Every set is going to have some luck in it and also its
share of failures, and it's far too easy to focus on a handful of projections
that jump out at you. I try to be as fair as possible when I do this sort of
year-end exercise because it's nice to show people where you were good and/or
lucky, and it's often tough to focus on complete misses in the projections.
I maintain that the best way to look at a forecast set is to focus on the aggregate
performance of the entire set. This summary tells you our exact
"score" for the year and doesn't exclude any player. If a player appeared
for whom we didn't have a forecast, we treat it as if we had forecast all zeroes
for that player because that's how we've told readers to read the
set. I've said before that one of the things that makes our forecast
sets unique is that we don't just forecast 800 players; we do our best to
forecast every player we think has a chance to appear in the majors.
Admittedly, we always miss some (e.g. on Opening Day, we didn't think San
Diego's Edgar Gonzalez was going to appear in the majors at all in 2008, and so
we've got 325 at bats working against us in our overall score).
What I've said in old editions of this annual review remains true as we
revive the format for public consumption. There are many reasons I
do this annual exercise, but one fringe benefit is that it lets me speak with
some confidence when readers write to disagree with a
forecast because "Player X" has done something "the past three
years" and I'm forecasting that the trend won't continue. It's
useful to be able to tell a reader that while they may be correct
about a specific forecast being off, and they often prove to be, we can say with mathematical
certainty that on an aggregate level, our forecasts have historically been more accurate on average than previous
year stats or any multi-year average. No
doubt, some forecasts will be easy to tag as misses, but across the overall
population, results land within our expected margin of error.
I do want to emphasize, though, that we make absolutely no claims to
be more accurate than any other site. It's always turned me off when sites
do that. I have proposed this method as one way to rate forecasts across sites,
but many things would be necessary to implement such a rating system, even if
the method were a good one and were accepted. First, there
would need to be an independent party to oversee collection of forecast sets
before the season started. Second, each set would need to be published
at approximately the same time. Obviously, the closer you get to Opening
Day, the more accurate information you have about roles and major
injuries. Beyond that, there would have to be acceptance of the standard
deviation method itself. Perhaps there is some better method for rating
forecasts out there that we have missed.
If you’re wondering precisely how accurate our
forecasts are, we use what is called a “standard deviation” model to rate
the whole set in each category. I started using this scoring method
around 2001, and I like it better than any other method because it captures the
overall essence of the forecasts and doesn’t unfairly reward us for projecting
established players while failing to project others or, worse, completely
omitting the players who have no track record. I recall one
website boasting along the lines of “for 65% of the baseball players that we
forecast, we forecast within two stolen bases of the player’s eventual
outcome.” (Interesting that they get to pick which players they would like to use
to rate their forecasts.) My response to that is simple: That’s not very good. In fact,
forecast everyone in baseball, from Frank Thomas to Jose Reyes to every
rookie who’s about to make his debut, to steal exactly two bases and
you’ll be within two steals on about 70-75% of the baseball population.
That was easy and I didn’t even have to work at it. It will be meaningless and
certainly not useful, but you’ll be able to boast about being within a margin
on a certain percentage of the population, a portion you as the forecaster get
to pick, and you’ll have created forecasts that are inferior to those you could have
had if you had simply used the previous year’s final statistics.
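To make the point concrete, here is a minimal sketch of the "forecast two steals for everyone" trick. The stolen base totals are made up purely for illustration, and `share_within` is a hypothetical helper name, not something from our own tools.

```python
def share_within(forecasts, actuals, margin):
    """Fraction of players whose forecast lands within `margin` of the actual total."""
    hits = sum(1 for f, a in zip(forecasts, actuals) if abs(f - a) <= margin)
    return hits / len(actuals)

# Hypothetical stolen base totals for ten players (made up for illustration).
# Most players steal very few bases; a handful steal a lot.
actual_sb = [0, 0, 1, 0, 2, 3, 0, 4, 12, 35]

# Forecast exactly two steals for every player, with no analysis at all.
constant_forecast = [2] * len(actual_sb)

share = share_within(constant_forecast, actual_sb, margin=2)
print(share)  # 0.8
```

In this toy sample the do-nothing forecast is "within two" for eight of ten players, yet it misses badly on exactly the players a reader cares about most.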
The standard deviation method is relatively simple. We take the final 2008 statistics
for every player who appeared in the majors, regardless of whether
he was in the forecast set, and compare them with our forecasts. If we didn’t forecast a player to play, we treat
him as if we had forecast a line of zeroes. We then calculate
what is called the “standard deviation” of the misses in each category, which you can read
about in any beginning level statistics book. What’s wonderful about the
“standard deviation” is that it means something clear to those who know how to
use it. If the misses are spread in roughly a bell curve, knowing our standard deviation means that
about 68% of results will end up within one standard
deviation of the forecast and about 95% will
end up within two standard deviations. So, if our standard
deviation in home runs is +/- 6, about 68% of actual results will end up within 6
home runs of our forecast and about 95% will end up within 12 home runs.
At first, standard deviation may sound
like something only accountants or economists would use, but it’s
an extremely powerful tool. It helps us to create low and high ends for player
expectations and be reasonably accurate in our expectations in either direction.
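The scoring described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it treats the "standard deviation" as the root-mean-square of the misses (the exact formula isn't spelled out here), the function names are hypothetical, and the home run numbers are made up.

```python
import math

def forecast_deviation(forecasts, actuals):
    """Root-mean-square miss over every player who actually appeared.

    forecasts: dict of player -> forecast total (may omit players)
    actuals:   dict of player -> actual total for everyone who appeared
    A player missing from `forecasts` is scored as a forecast of zero.
    """
    squared = [(actual - forecasts.get(player, 0)) ** 2
               for player, actual in actuals.items()]
    return math.sqrt(sum(squared) / len(squared))

def expected_range(forecast, deviation, n_deviations=1):
    """Low and high ends for a forecast, n_deviations wide on each side."""
    return forecast - n_deviations * deviation, forecast + n_deviations * deviation

# Hypothetical home run data, made up for illustration.
# "Player D" appeared in the majors but was not in the forecast set.
forecast_hr = {"Player A": 30, "Player B": 12, "Player C": 5}
actual_hr = {"Player A": 24, "Player B": 18, "Player C": 5, "Player D": 6}

dev = forecast_deviation(forecast_hr, actual_hr)
print(round(dev, 2))             # 5.2
print(expected_range(20, 6))     # (14, 26): about 68% of results
print(expected_range(20, 6, 2))  # (8, 32): about 95% of results
```

Note how the unforecast player counts against the score in full, just as a missed player does in our own totals.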
Here then is our standard deviation for the major stat categories and how it compares with how you would have done using previous
year statistics or multi-year averages. For this exercise, we are using our Opening Day 2008
set. In many places, we're showing the deviation to one decimal place and
occasionally, to help distinguish what would otherwise be a tie, a second
decimal place. For those who like to use these multi-year averages, we've
highlighted which multi-year average would have yielded the lowest standard
deviation, reminding readers that this varies from year to year:
Hitters
| Category | Standard Deviation Using 2007 Statistics | Standard Deviation Using 2 Year Avg | Standard Deviation Using 3 Year Avg | Standard Deviation Using BN Projections |
| Games | +/- 44 | +/- 47 | +/- 50 | +/- 39 |
| At Bats | +/- 154 | +/- 165 | +/- 176 | +/- 128 |
| Hits | +/- 44 | +/- 47 | +/- 50 | +/- 37 |
| Doubles | +/- 10 | +/- 11 | +/- 11 | +/- 8.65 |
| Triples | +/- 2.0 | +/- 1.9 | +/- 2.5 | +/- 1.61 |
| Home Runs | +/- 6.5 | +/- 6.8 | +/- 7.2 | +/- 5.72 |
| Runs | +/- 24 | +/- 25 | +/- 27 | +/- 21 |
| Runs Batted In | +/- 23.6 | +/- 24.4 | +/- 26 | +/- 20 |
| Walks | +/- 17 | +/- 18 | +/- 20 | +/- 14 |
| Strikeouts | +/- 32 | +/- 34 | +/- 37 | +/- 26 |
| Stolen Bases | +/- 6.0 | +/- 6.25 | +/- 6.31 | +/- 5.7 |
| Caught Stealing | +/- 2.07 | +/- 2.13 | +/- 2.2 | +/- 1.9 |
| Errors | +/- 4.2 | +/- 4.0 | +/- 4.1 | +/- 3.5 |
Looking at the above, in all cases, our forecasts were more reliable than
any three-year, two-year or single-year average in terms of the deviation.
One reason previous year stats usually turn out to be the best choice among
historical numbers could be that playing time is so important to
how all other categories will do. That is, the most recent role a
player has had will often be the best indicator of how much playing time he'll get
in the new season. Remember too
that if a player appeared whom we did not expect to appear, we are treating him
as though we projected an entire line of zeroes. Deviations may not seem important, but if about 95% of forecasts are going to prove
accurate within two standard deviations, that means that, for example, in the
at bats column, we've tightened the expectation by about
52 at bats (twice the 26 at bat gap between 154 and 128) at each end of our margin of error over what you would have had if
you had used last year's stats.
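For anyone who wants to check the at bats arithmetic, here it is spelled out using the two at bats figures from the hitters table (the variable names are just for illustration):

```python
# At bats deviations taken from the hitters table above.
dev_last_year = 154  # using 2007 statistics
dev_forecast = 128   # using BN projections

# About 95% of results fall within two standard deviations,
# so the range narrows by twice the difference on each side.
tightening_per_side = 2 * (dev_last_year - dev_forecast)
print(tightening_per_side)  # 52
```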
Let’s see how we did in the pitching categories:
Pitchers
| Category | Standard Deviation Using 2007 Statistics | Standard Deviation Using 2 Year Avg | Standard Deviation Using 3 Year Avg | Standard Deviation Using BN Projections |
| Games | +/- 23 | +/- 22 | +/- 22 | +/- 20 |
| Games Started | +/- 8.52 | +/- 8.49 | +/- 8.9 | +/- 7.2 |
| Innings Pitched | +/- 52.9 | +/- 53.0 | +/- 56 | +/- 46 |
| Hits Allowed | +/- 53.8 | +/- 53.6 | +/- 56 | +/- 46 |
| Home Runs Allowed | +/- 6.8 | +/- 6.5 | +/- 6.7 | +/- 5.5 |
| Earned Runs Allowed | +/- 26.0 | +/- 25 | +/- 26.3 | +/- 22 |
| Walks | +/- 20.2 | +/- 19.9 | +/- 20.5 | +/- 18 |
| Strikeouts | +/- 41.7 | +/- 41.6 | +/- 43 | +/- 37 |
| Hit Batters | +/- 3.1 | +/- 3.01 | +/- 2.97 | +/- 2.7 |
| Wins | +/- 4.0 | +/- 3.89 | +/- 3.94 | +/- 3.5 |
| Losses | +/- 3.7 | +/- 3.5 | +/- 3.6 | +/- 3.1 |
| Saves | +/- 5.5 | +/- 5.03 | +/- 5.02 | +/- 4.1 |
| Holds | +/- 5.8 | +/- 5.5 | +/- 5.3 | +/- 5.1 |
One thing I find particularly interesting here is that two-year averages were
a better forecaster for pitchers than single-year numbers in nearly every category, unlike for the
hitters. In any case, our numbers again had lower standard deviations than
any of the historical choices available in all of these categories. In the
interest of full disclosure, one category we don't publish on these sheets
(blown saves) would have been beaten by one tenth using the
single year 2007 numbers. Every few years, we will have a category that does not beat one
of the historical averages, but it happens that this year
it fell in one of the categories we usually wouldn't list, so I wanted to
mention it since it didn't jump out on the sheet. Of course, the problem
for the forecaster is knowing which category will be that miss, whether
it will even be a year in which a category doesn't do as well as an
available historical average and, of course, knowing which historical
average to use among single-year, two-year or three-year averages if you think
there will be a miss. We've never found a pattern here, and a miss
is usually the result of us entirely missing a player who
ended up having a significant role in that particular category. Blown
saves happened to be this year's miss, though the category did better than all historical
averages a year earlier (our biggest variance this year was for Ryan Franklin,
who had 0 projected blown saves and ended up with 8 - we didn't see that
coming!). Our previous category that was beaten by an
available historical average was triples for batters in 2006, bettered by the
two-year average. Before that, you have to go all the way back
to 1999, when our caught stealing total was bettered by one of the
multi-year averages.
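For readers who want to try the historical comparison themselves, here is a rough sketch of scoring single-year, two-year and three-year averages against actual results. It rests on several assumptions: the deviation is taken as the root-mean-square miss, a player absent from a past season counts as zero for that season (how gaps are handled isn't stated here), the helper names are hypothetical, and the save totals are made up.

```python
import math

def rms_miss(forecast, actual):
    """Root-mean-square miss, scoring unforecast players as zero."""
    total = sum((a - forecast.get(p, 0)) ** 2 for p, a in actual.items())
    return math.sqrt(total / len(actual))

def multi_year_average(seasons):
    """Average each player's totals over the given seasons.

    A player absent from a season counts as zero for that season
    (an assumption made for this sketch).
    """
    players = set().union(*seasons)
    return {p: sum(s.get(p, 0) for s in seasons) / len(seasons) for p in players}

# Hypothetical save totals, most recent season first (made up for illustration).
history = [
    {"Closer A": 40, "Setup B": 2},    # one year back
    {"Closer A": 30, "Setup B": 0},    # two years back
    {"Closer A": 35, "Closer C": 20},  # three years back
]
actual = {"Closer A": 36, "Setup B": 10, "Closer C": 0}

baselines = {
    "1 year": multi_year_average(history[:1]),
    "2 year avg": multi_year_average(history[:2]),
    "3 year avg": multi_year_average(history[:3]),
}
scores = {name: rms_miss(b, actual) for name, b in baselines.items()}
best = min(scores, key=scores.get)
print(best)  # 1 year
```

In this toy sample the single-year numbers happen to score best, but as noted above, which historical average wins varies by category and by year.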
Though we've only returned to publishing these charts this year, we
have always continued to do this sort of analysis internally for our own
development. It helps us to isolate problem areas in the
forecasts and to continue our mission of constantly improving
them. Some readers indicated an interest in seeing the actual numbers,
and so we're glad to share them.
Let me take this opportunity to extend my best wishes to all of our readers in this holiday season.
Early in the new year,
we'll be talking about sleepers, prospects and many of our most contentious new
projections. Until then, take care! - DL