Size as a Statistical Field: Part I
by David Luciani
published January 19, 2005
Occasionally I make a forecast that ultimately proves to be right despite
expectations and, more importantly, despite what readers would consider a lack
of statistical data supporting it. This puzzles some readers, who write to ask
how someone as statistically oriented as I am could have projected the player
to do so well when there was no data to support it. In a similar spirit, a
former big league GM told me that the problem with relying heavily on
statistics, as I no doubt do, is that they can't capture things like how fast a
pitcher throws, the size of a player or his potential size. I argued quite the
opposite: these very things are statistics and can be classified as such, with
height, weight and other such things each forming a so-called "statistical
field" that we can analyze like any other. No doubt, a player's projected
eventual height and weight (with weight being the more likely to change except
in the youngest of players) is something you need an expert, whom we'll call a
scout, to project, but the simple fact is that if you can quantify it in either
a numerical or a categorical way, then it is already a statistic. If there is a
so-called "secret" to my forecasting methods, and I don't think I've been
secretive about it, it is that while I do emphasize statistics, I'm looking at
a player's performance in a different way than a person looking in from outside
statistical analysis might expect. I tried to highlight an example of this in
this past week's "Ask David" when I gave a bit of insight into my Melvin Mora
forecast for 2005, though even that is an oversimplification, since it attempts
to answer the reader's question in a single essay.
Admittedly, I tend to be overly defensive of those of us in the so-called
statistical crowd, though I can no doubt be considered a crossover at times: I
often include scouting data in my forecasts, but I quantify it to make it
statistical. This way I know, for example, what percentage of hitters who get
an "A+" for power in my ratings actually do end up hitting home runs. It is far
too common for those on the extreme scouting side of the debate to conclude
quickly that the statistics we're looking at exclude much of the meaningful,
good data they collect. In fact, I not only support the collection of that data
but argue that it is crucial to the forecasting exercise, particularly the
long-term forecasting of players who are five or more years away from their
eventual prime ability.
The size of a player in particular has always fascinated me, and I have begun
to compile some summary data for a new book I am writing that attempts to
explain how I do my forecasting. Some readers have questioned how I can
suddenly downgrade a player's projected stolen bases when he shows up for
spring training far overweight. Even though the data is for a book, I see no
reason to hold it back from Baseball Notebook's readers, because it applies to
much of the information you see here at the site.
Unlike the effects of minor league competition levels, parks, age and other
elements that we've already succeeded at neutralizing, the effect of a player's
actual size is hard to analyze because there isn't good data readily available
on the changes that take place in a player's size within a season, particularly
his weight. Unfortunately, there isn't even data on how much weight has changed
over player careers. I prefer to see the changes within a season because then I
am not confusing the effect of some other changing element, such as age, with
the effect of the change in a player's size. So, I approach this summary of
data with both caution and a caveat: this is not the way I would normally
prefer to analyze data, but it's the best I can do with the data I have
available.
I finally decided that since I don't have the data I want, the best I can do
until I acquire it is to summarize the statistical performances of all players
at different heights and weights to see if any patterns emerge. Would the
taller pitchers have better fastballs because of their increased leverage?
Would the heavier hitters be more likely to be home run hitters because of
their strength and ability to resist the incoming fastball? If a former stolen
base champ shows up for spring training looking like he lived off a winter of
double cheeseburgers, will he be able to run as he always has?
While these questions remain open, I think the reader will benefit greatly from
seeing some of the obvious patterns that emerged in this, only the first round
of a much-needed lengthier study on the issue. Because I don't have the
changing weight (and possibly changing height for very young players) over a
player's career, I had to settle for the listed heights and weights available
in commonly published sources and databases. That means I don't have reliable
data for a player like Babe Ruth, who started as a much skinnier young pitcher
and developed into a heavier and more powerful hitter through his career. In
rare cases, I was able to plug in such known changes from biographical
information, but in general I just didn't have that data.
Ruth isn't a factor here anyway, as I also decided to consider only players
from 1969 forward so that we are essentially looking at the current game. I
concede that the 1970s are somewhat different from the 1990s, but they are
similar enough that I wanted to include them. There's more than thirty years of
data to work with here and it did yield some obvious patterns for purposes of
this summary.
Finally, I decided to present all data in the form of either
"per plate appearance" for hitters or "per batter faced" for
pitchers because it makes it easy to compare the different groups.
In part one, we'll look at hitting and next time out, I'll share
with you some of the data we found for pitchers.
Let's start with just weight for a hitter. I took all
hitting data for all players since (and including) the 1969 season and split out
performances based on the weights. I decided to break weight into twelve
distinct categories as follows. There is plenty of data for each group and
some of the peak groups, like the 190-199 range, have more than eight thousand
seasonal performances considered:
149 lbs or less
150-159
160-169
170-179
180-189
190-199
200-209
210-219
220-229
230-239
240-249
250+
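For readers who want to try this grouping on their own data, here is a minimal
Python sketch of how a listed weight could be mapped to one of the twelve
groups above. The function name `weight_group` and the label strings are
assumptions for illustration, not the code actually used for this study.

```python
def weight_group(weight_lbs: int) -> str:
    """Map a listed weight in pounds to one of the twelve weight groups.

    Hypothetical helper: the name and the label strings are illustrative only.
    """
    if weight_lbs <= 149:
        return "<=149 lbs"          # lightest group is open-ended
    if weight_lbs >= 250:
        return "250+"               # heaviest group is open-ended
    low = (weight_lbs // 10) * 10   # floor to the nearest ten pounds
    return f"{low}-{low + 9}"


if __name__ == "__main__":
    for w in (145, 150, 195, 249, 262):
        print(w, "->", weight_group(w))
```

The two open-ended groups are handled first so that everything in between can
be floored to a ten-pound band.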
Here then is a summary of how each group performed per 550 plate
appearances. The Avg, Obp and Slg columns are calculated from the actual
non-rounded totals, so they are more precise than anything you could derive
from the rounded numbers displayed here:
| Weight Group | AB | H | 2b | 3b | HR | HBP | BB | K | SB | CS | SH | SF | Avg | Obp | Slg |
| <=149 lbs | 489 | 126 | 19 | 5 | 3 | 2 | 46 | 73 | 26 | 10 | 10 | 3 | .258 | .324 | .336 |
| 150-159 | 497 | 129 | 19 | 4 | 4 | 3 | 38 | 51 | 19 | 8 | 9 | 3 | .259 | .314 | .338 |
| 160-169 | 489 | 126 | 20 | 4 | 6 | 3 | 47 | 65 | 17 | 7 | 8 | 4 | .259 | .324 | .351 |
| 170-179 | 492 | 128 | 22 | 4 | 8 | 3 | 45 | 72 | 14 | 6 | 7 | 4 | .261 | .324 | .370 |
| 180-189 | 490 | 127 | 22 | 3 | 11 | 4 | 47 | 77 | 12 | 6 | 6 | 4 | .259 | .326 | .383 |
| 190-199 | 488 | 128 | 23 | 3 | 13 | 3 | 49 | 82 | 8 | 4 | 5 | 4 | .263 | .332 | .405 |
| 200-209 | 488 | 127 | 23 | 3 | 16 | 3 | 50 | 90 | 6 | 3 | 5 | 4 | .260 | .329 | .416 |
| 210-219 | 488 | 125 | 23 | 2 | 18 | 3 | 49 | 99 | 5 | 3 | 5 | 4 | .256 | .326 | .423 |
| 220-229 | 485 | 126 | 24 | 2 | 20 | 4 | 53 | 98 | 4 | 3 | 4 | 4 | .260 | .335 | .438 |
| 230-239 | 488 | 128 | 24 | 2 | 20 | 5 | 47 | 111 | 5 | 3 | 6 | 4 | .263 | .331 | .444 |
| 240-249 | 477 | 125 | 23 | 2 | 24 | 3 | 61 | 113 | 5 | 3 | 3 | 5 | .262 | .346 | .469 |
| 250+ | 457 | 131 | 24 | 1 | 27 | 3 | 83 | 90 | 2 | 1 | 1 | 6 | .288 | .397 | .522 |
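As a rough illustration of how a row like these can be produced and read, the
sketch below scales a set of counting totals to 550 plate appearances and
computes Avg, Obp and Slg with the standard formulas. The function names and
dictionary keys are assumptions for illustration; plate appearances are
approximated as AB + BB + HBP + SH + SF, which is what the rows above add up to
(within rounding). Because the example feeds in the rounded 190-199 row, its
rates differ slightly from the printed columns, which come from non-rounded
totals.

```python
def plate_appearances(s):
    # Assumption: PA = AB + BB + HBP + SH + SF (catcher's interference ignored).
    return s["AB"] + s["BB"] + s["HBP"] + s["SH"] + s["SF"]


def per_550(s, target=550):
    """Scale every counting stat so the line represents `target` PA."""
    scale = target / plate_appearances(s)
    return {stat: value * scale for stat, value in s.items()}


def rate_stats(s):
    """Return (Avg, Obp, Slg) from counting stats using the standard formulas."""
    singles = s["H"] - s["2B"] - s["3B"] - s["HR"]
    total_bases = singles + 2 * s["2B"] + 3 * s["3B"] + 4 * s["HR"]
    avg = s["H"] / s["AB"]
    obp = (s["H"] + s["BB"] + s["HBP"]) / (s["AB"] + s["BB"] + s["HBP"] + s["SF"])
    slg = total_bases / s["AB"]
    return avg, obp, slg


# The rounded 190-199 row from the table above.
row_190_199 = {
    "AB": 488, "H": 128, "2B": 23, "3B": 3, "HR": 13, "HBP": 3,
    "BB": 49, "K": 82, "SB": 8, "CS": 4, "SH": 5, "SF": 4,
}

if __name__ == "__main__":
    avg, obp, slg = rate_stats(row_190_199)
    print(f"Avg {avg:.3f}  Obp {obp:.3f}  Slg {slg:.3f}")
```

In practice `per_550` would be applied to a group's raw totals, and the rate
stats would be computed before any rounding.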
You can't help but look at the chart and be amazed at the obvious and, in some
cases, not surprising patterns that emerge. The heavier a player gets, the
more his home run skill improves but, just as importantly, his walk ability
goes up. One reason the walk ability could be improving is that as a player
becomes more powerful, pitchers are understandably more reluctant to pitch to
him and thus try to nibble at the corners, or are so careful that they throw
many more balls than they would if he were a light hitter. The increase in the
doubles column isn't as profound. The stolen base column is not only obvious
but critical to appreciate if you're to be a good forecaster. Basically, the
heavier a player becomes, the less frequently, on average, he's going to steal
bases compared to what his skills would look like if he were lighter. The
batting average actually increased slightly as the player got heavier, but this
pattern isn't so obvious. Triples, clearly, drop. The light hitters, probably
because they aren't considered to be as important to the offense, are asked to
bunt more often. The big hitters, by virtue of their power, get more sacrifice
flies to go with their home runs. In terms of strikeouts, the heavier you get,
the greater the tendency to strike out, but the pattern there is a bit sketchy.
A thinking reader might question why clubs don't actually ask players to gain
weight, since it is clear that the more valuable performances are coming from
the heavier players. Well, they often do, and in other cases they shouldn't.
Remember that the player has to field a position, and this analysis in no way
shows you the negative effects a weight gain would have on a player's range, a
negative defensive effect that could more than offset any offensive gain from
the player. I do think that weight gain shouldn't be as frowned on as it is,
especially at defensive positions that don't require as much range, like first
base, but in general there are many other components to a player's game and a
weight gain isn't an instant recipe for success.
Now players can control their weight but they can't control their
height. Again, I've broken out the heights into categories as follows,
this time thirteen groups that ensure we get many seasonal performances in each
group:
5'6" or less
5'7"
5'8"
5'9"
5'10"
5'11"
6'
6'1"
6'2"
6'3"
6'4"
6'5"
6'6"+
Again, here's the summary of each group per 550 plate
appearances:
| Height Group | AB | H | 2b | 3b | HR | HBP | BB | K | SB | CS | SH | SF | Avg | Obp | Slg |
| <=5'6" | 496 | 120 | 17 | 4 | 3 | 4 | 37 | 67 | 21 | 8 | 10 | 3 | .241 | .298 | .311 |
| 5'7" | 470 | 124 | 23 | 3 | 9 | 3 | 68 | 63 | 22 | 7 | 5 | 4 | .265 | .358 | .383 |
| 5'8" | 487 | 132 | 21 | 5 | 7 | 4 | 49 | 64 | 21 | 8 | 7 | 4 | .270 | .339 | .376 |
| 5'9" | 490 | 129 | 21 | 3 | 7 | 4 | 46 | 64 | 17 | 7 | 6 | 4 | .262 | .328 | .360 |
| 5'10" | 488 | 127 | 21 | 4 | 9 | 3 | 48 | 69 | 14 | 6 | 6 | 4 | .260 | .328 | .371 |
| 5'11" | 490 | 129 | 22 | 4 | 9 | 3 | 47 | 69 | 12 | 6 | 6 | 4 | .264 | .329 | .379 |
| 6' | 489 | 129 | 23 | 3 | 12 | 3 | 48 | 76 | 10 | 5 | 5 | 4 | .263 | .331 | .394 |
| 6'1" | 490 | 126 | 22 | 3 | 13 | 3 | 47 | 86 | 8 | 4 | 5 | 4 | .258 | .325 | .398 |
| 6'2" | 489 | 128 | 23 | 3 | 14 | 3 | 49 | 85 | 8 | 4 | 5 | 4 | .261 | .330 | .409 |
| 6'3" | 489 | 126 | 23 | 3 | 17 | 3 | 47 | 96 | 7 | 4 | 6 | 4 | .258 | .325 | .418 |
| 6'4" | 486 | 122 | 23 | 2 | 16 | 3 | 49 | 104 | 5 | 3 | 7 | 4 | .251 | .321 | .407 |
| 6'5" | 483 | 121 | 22 | 2 | 17 | 3 | 51 | 108 | 6 | 4 | 9 | 4 | .251 | .324 | .411 |
| 6'6"+ | 484 | 117 | 20 | 2 | 22 | 3 | 50 | 123 | 7 | 3 | 9 | 4 | .242 | .314 | .428 |
Again, you can see obvious patterns, though I believe height is still less of
a factor than weight and, of course, both work in combination with each other
to create the package we call the player. In this case, as a player gets
taller his power improves and, surprisingly, his walk ability improves, but his
OBP does decline. With a larger strike zone working against him, it's
possible that he gets into deeper pitcher's counts and has to swing at pitches
he doesn't like, leading to the lower batting average, but that is difficult to
reconcile with the increase in walks. It's also possible that the taller
players simply aren't players who try to hit for average, preferring the home
run to the single. It's also difficult to tell from this chart how much
height is really impacting things because, on average, the taller a player is,
the heavier he is, and so weight can skew these results.
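One way to separate the two effects is to look at weight groups within a single
height, so that players are compared only with others of the same height. The
sketch below is a minimal illustration of that idea, not the method used for
this study; the function name, the record fields (`height_in`, `weight_lbs`,
`pa`, `hr`) and the sample records are all hypothetical.

```python
from collections import defaultdict


def hr_per_550_by_cell(seasons):
    """Home runs per 550 PA for each (height, weight band) cell.

    `seasons` is an iterable of dicts with hypothetical keys:
    height_in (inches), weight_lbs, pa (plate appearances), hr (home runs).
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [PA, HR]
    for s in seasons:
        # Clip to the open-ended groups: 5'6" or less, 6'6"+, <=149 lbs, 250+.
        height = min(max(s["height_in"], 66), 78)
        band = min(max(s["weight_lbs"] // 10 * 10, 140), 250)
        cell = totals[(height, band)]
        cell[0] += s["pa"]
        cell[1] += s["hr"]
    return {key: 550 * hr / pa for key, (pa, hr) in totals.items() if pa > 0}


if __name__ == "__main__":
    # Made-up records, only to exercise the function; not real player data.
    sample = [
        {"height_in": 74, "weight_lbs": 195, "pa": 600, "hr": 12},
        {"height_in": 74, "weight_lbs": 198, "pa": 500, "hr": 10},
        {"height_in": 74, "weight_lbs": 225, "pa": 550, "hr": 25},
    ]
    for cell, rate in sorted(hr_per_550_by_cell(sample).items()):
        print(cell, round(rate, 1))
```

Reading down one height at a time shows whether the power pattern still holds
among players of equal height, although many cells will be thinly populated.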
In any case, it's undeniable that weight and height are factors in a player's
performance and if we treat them as a statistical field, we can and should apply
them to our forecasts. Next time out, I'll show you some performance data
for the pitchers.