Projection Sets That Shouldn't Add Up
by David Luciani
Published December 23, 2001
A couple of years ago, I advocated that a projection set should
"add up" so that each team has the right number of at bats, wins and
losses, etc. Our analysis of projections has continued to evolve since
then, and in the last two years of post-season review we made an interesting
discovery. Had we used it last year, it would have netted an extra 5-10% in
accuracy on our 2001 forecasts (which were already very accurate, as we
demonstrated in our year-end review). We will be using this improvement for
our 2002 forecasts, so it is worth examining here.
We discovered that the set that doesn't quite add up is often
the one that ultimately proves most accurate in forecasting individual
outcomes as opposed to team outcomes. In other words, forecasting each player
as an individual often produces better results than over-compensating for his
team. It took a while to isolate the cause and effect here, but once we did, we
went back and applied this extra model to our old forecasts and found that the
deviation in every category would have improved significantly.
A real example from 2001 demonstrates this. You would think that
there were 162 games started to go around for the Tampa Bay Devil Rays'
pitching staff as of Opening Day last year. But you would be wrong! Among the
pitchers who were in the Devil Rays' organization on Opening Day of 2001 and
who would eventually start a major league game in 2001, there were actually
181 games started to go around rather than 162. This may seem confusing, but
it reveals something important about forecasting.
We can forecast injuries, to some extent, and we have a lot of
data on roles and depth charts that helps us refine a forecast. What's
missing is any certainty as to whether a player will be traded during
the upcoming season. A total of 181 starts doesn't seem to make sense, but it
is true, and the explanation can be found with Albie Lopez (and to a minor
extent Mike Judd).
Lopez started the year with the Devil Rays and was traded to Arizona in
mid-season. He started 20 games for Tampa Bay and 13 games for Arizona,
so his total games started in 2001 was 33. Whether or not you saw the trade
coming, the best forecast of games started for Lopez on April 1st, the perfect
forecast in fact, was 33. Not all of his starts were destined to come with
Tampa Bay, but that's what a perfect forecast would look like. Once Lopez was
gone from Tampa Bay, his spot in the rotation was freed up for other
lesser-known pitchers to step up and get their opportunity.
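The bookkeeping behind the Lopez example can be sketched in a few lines. This is only an illustration, not the model we use: the team codes "TB" and "ARI", the row layout and the helper name `total_starts` are assumptions made for the sketch, and the only data in it are Lopez's two stints quoted above.

```python
from collections import defaultdict

# Each row: (pitcher, opening_day_team, team_started_for, games_started).
# Only Albie Lopez's 2001 figures from the article are included.
lopez_2001 = [
    ("Albie Lopez", "TB", "TB", 20),
    ("Albie Lopez", "TB", "ARI", 13),
]

def total_starts(rows, key_index):
    """Sum games started, grouped by the column at key_index (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)

# Grouped by where the pitcher was on Opening Day: what the forecast describes.
by_opening_day_team = total_starts(lopez_2001, 1)
# Grouped by the team he actually took the mound for: what the schedule limits.
by_actual_team = total_starts(lopez_2001, 2)

print(by_opening_day_team)  # {'TB': 33}
print(by_actual_team)       # {'TB': 20, 'ARI': 13}
```

Grouping by Opening Day team credits all 33 starts to Tampa Bay, which is why a staff's total under that grouping can exceed the number of games the team plays.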
Indeed, the perfect forecast set, the one where we are exactly
right about every player, for all Devil Rays' pitchers on April 1st would have
181 games started distributed among all the pitchers. Conversely, as it
happens, the pitchers who were Arizona Diamondbacks on April 1st did not
produce even 162 games started. This forces us to clarify in our own minds
what it means when we write the team name next to the player's
forecast. What we are saying is:
Albie Lopez, Tampa Bay Devil Ray on April 1st.
With individual player forecasts, we are not forecasting the
Devil Rays to have 181 games started. Even with an occasional rainout or
tie-breaking playoff game, a team will almost always play between 160 and 163
games. It clarifies in our mind that with individual player forecasts, we
are forecasting individuals who just happen to be with a certain team on Opening
Day.
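To see how forcing a set to add up can hurt an individual forecast, here is a minimal sketch using only the figures above. It assumes, purely for illustration, that "adding up" is enforced by scaling every pitcher's forecast by the same factor so the staff total lands on 162. The function name `scale_to_team_total` is hypothetical and is not a description of the model we use.

```python
def scale_to_team_total(forecast, staff_total, team_games=162):
    """Scale one pitcher's forecast so the whole staff would sum to team_games.

    Assumption: proportional scaling, chosen only to illustrate the effect.
    """
    return forecast * team_games / staff_total

perfect_lopez = 33   # Lopez's actual 2001 games started (20 + 13)
staff_total = 181    # starts by all Opening Day Devil Rays pitchers in 2001

scaled_lopez = scale_to_team_total(perfect_lopez, staff_total)
error = abs(perfect_lopez - scaled_lopez)

print(f"scaled forecast: {scaled_lopez:.1f} starts")  # scaled forecast: 29.5 starts
print(f"error introduced: {error:.1f} starts")        # error introduced: 3.5 starts
```

Under that assumption, a forecast that was exactly right for Lopez becomes wrong by about three and a half starts simply because the staff was made to total 162.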
The analysis that led us to this Tampa Bay example revealed
hundreds of others like it and showed us that, though team adjustments are
necessary, the most accurate forecasts, the ones with the lowest average
deviation, aren't going to add up perfectly for each team or even for each
league. All of baseball doesn't need to add up either to get the best
possible results for individual players. For example, pitchers on AL teams
as of April 1st ended up averaging 170 games started per team in 2001, whereas
the average NL team didn't have even 162 games started in its organization on
April 1st, 2001... but that's a topic for another essay.