Luck has been the popular explanation whenever a pitcher posts a significantly lower ERA than his FIP. Two statistics often used to gauge the role of luck are BABIP and LOB%. Using Steve Staude’s pitching stat correlation tool, we can see that, for pitchers with a minimum of 30 innings pitched from 2007 to 2013, BABIP has a correlation of only 0.156 from one season to the next, while LOB% has a correlation of 0.205. These numbers are much lower than the year-to-year correlations of K% or BB%, suggesting that a large portion of BABIP and LOB% is subject to random variation and independent of a pitcher’s skill. However, the correlation is not 0. Neither stat is completely random, and a pitcher can still play a role in controlling his BABIP and LOB%.

Many writers, including Steve, have tackled the issue of BABIP using batted ball data. In this article, I will be estimating a pitcher’s LOB% for the current season with a stat I call xLOB%. This is not supposed to be a predictive stat, but a descriptive one. Think of it as FIP. While FIP estimates the pitcher’s ERA using strikeouts, walks and home runs, xLOB% estimates the pitcher’s LOB% given his other pitching statistics for the same season. I will be introducing pLOB%, which attempts to project a pitcher’s LOB% for the following season, in the next article.

First, take a look at which statistics correlate most closely with LOB%. Again, I am using Steve’s pitching stat correlation tool and setting the minimum innings pitched at 30 from 2007 to 2013.

Stat             Correlation with current year LOB%   Correlation with next year LOB%
BABIP            -0.452                               -0.127
GB%              -0.050                               -0.047
FB%               0.103                                0.059
LD%              -0.135                               -0.030
PU% (Popup%)      0.166                                0.106
HR/FB            -0.131                               -0.135
HR/TBF           -0.138                               -0.157
K%                0.421                                0.348
BB%              -0.037                                0.052
HBP%             -0.034                                0.013
O-Swing%          0.246                                0.169
Z-Swing%         -0.040                               -0.057
Swing%            0.146                                0.077
O-Contact%       -0.163                               -0.165
Z-Contact%       -0.332                               -0.311
Contact%         -0.331                               -0.307
Zone%            -0.046                               -0.034
SwStr%            0.345                                0.302
Foul%             0.311                                0.256
rSB               0.062                                0.009
rPM               0.045                                0.001
LOB%              1                                    0.205
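
The two columns pair each statistic with LOB% from the same season and with LOB% from the season after. As a rough sketch of that pairing (this is not the correlation tool’s actual code), the Python below assumes each pitcher-season is a dict with hypothetical keys `pitcher`, `season`, `k` and `lob`, and that the 30-inning minimum has already been applied. The rows are made up purely for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_pairs(rows, stat, target="lob", lag=0):
    """Pair `stat` in season N with `target` in season N + lag, per pitcher.

    lag=0 gives the same-season pairing, lag=1 the next-season pairing.
    Pitcher-seasons without a matching later season are skipped.
    """
    by_key = {(r["pitcher"], r["season"]): r for r in rows}
    xs, ys = [], []
    for (pitcher, season), row in by_key.items():
        later = by_key.get((pitcher, season + lag))
        if later is not None:
            xs.append(row[stat])
            ys.append(later[target])
    return xs, ys

# Made-up pitcher-seasons, for illustration only (not real data).
rows = [
    {"pitcher": "A", "season": 2012, "k": 0.25, "lob": 0.78},
    {"pitcher": "A", "season": 2013, "k": 0.24, "lob": 0.74},
    {"pitcher": "B", "season": 2012, "k": 0.15, "lob": 0.69},
    {"pitcher": "B", "season": 2013, "k": 0.17, "lob": 0.71},
    {"pitcher": "C", "season": 2012, "k": 0.20, "lob": 0.72},
    {"pitcher": "C", "season": 2013, "k": 0.21, "lob": 0.75},
]

same_season_r = pearson(*lagged_pairs(rows, "k", lag=0))
next_season_r = pearson(*lagged_pairs(rows, "k", lag=1))
print(same_season_r, next_season_r)
```

Note that the next-season pairing only uses pitchers who qualified in both seasons, so it always rests on fewer pitcher-seasons than the same-season pairing.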

Looking at the first column, a few stats stand out as strongly correlated with LOB%. BABIP has the strongest correlation with LOB%, at -0.452. This makes perfect sense, as a pitcher who gives up a lot of hits will see more of his base runners score. K% comes next at 0.421. This also makes sense, as a strikeout does not advance the runner, and high-strikeout pitchers should be able to strand more runners without subjecting themselves to the whims of BABIP. Next comes a series of stats that are highly correlated with K%, namely SwStr%, Z-Contact%, Contact% and O-Swing%. Foul%, which has a correlation of 0.311 with LOB%, initially caught me by surprise. However, a deeper look reveals that it has a correlation of 0.708 with K%, so it does not add much additional information. Both HR/FB and HR/TBF have a negative association with LOB%, which is to be expected, as a home run scores every runner on base.

What surprises me the most is BB%, which has a correlation of only -0.037 with LOB%. I did not have a firm expectation going in, but I assumed there would be a stronger association, either positive or negative. Now that I think about it, a walk can be positively associated with LOB% because it is the least dangerous way to put a runner on base, compared with a single or an extra-base hit. It does not advance the runners already on base as much as a hit does, and the batter only reaches first base. A walk can also be negatively associated with LOB% because it can still push runners ahead and makes it easier for them to score afterward. The two effects seem to cancel each other out, leaving BB% without a strong association with LOB%. I also tested the fielding statistics, but they do not appear to have strong associations with LOB%.

Using multiple regression, my model is xLOB% = 0.87 - 0.76 × BABIP + 0.42 × K%. The R-squared value is 31.8%. The standard error is 0.0574, meaning that xLOB% typically misses actual LOB% by about 5.74 percentage points. O-Swing%, rSB, FB% and HR/TBF are all significant at α = 0.05 when added to the model. However, none of these variables adds more than 1% to the R-squared value, so I decided to omit them to keep the model simple.
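
To make the formula concrete, here is a minimal Python sketch that plugs numbers into it. The function name `xlob` and the sample pitcher are my own illustration, and the sketch assumes BABIP and K% are entered as fractions (0.300 and 0.20, not 30 and 20), which is what the size of the coefficients implies.

```python
def xlob(babip: float, k_rate: float) -> float:
    """Estimate LOB% from same-season BABIP and K% using the model above.

    Inputs and output are fractions, e.g. 0.300 BABIP and 0.20 K%.
    """
    return 0.87 - 0.76 * babip + 0.42 * k_rate

# Hypothetical pitcher: .300 BABIP and a 20% strikeout rate.
estimate = xlob(0.300, 0.20)
print(f"{estimate:.3f}")  # prints 0.726
```

Holding BABIP fixed, each additional 5 percentage points of K% raises the estimate by 2.1 points of LOB%, while each additional 10 points of BABIP lowers it by 0.76 points.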

Testing out of sample, using data from 2002-2006 with a minimum of 30 innings pitched, xLOB% has a correlation of 0.573 with LOB%. This is very close to the correlation coefficient of 0.564 between xLOB% and LOB% in the data from 2007-2013, suggesting that the relationship between BABIP, K% and LOB% is not a quirk of the 2007-2013 data.
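
An out-of-sample check like this one could be sketched as follows: keep the coefficients fixed, apply them to seasons that were not used in the fit, and correlate the estimates with actual LOB%. The helper names and the holdout rows below are hypothetical and made up for illustration. They are not the 2002-2006 data and will not reproduce the 0.573 figure.

```python
from math import sqrt

def xlob(babip, k_rate):
    """Fixed-coefficient model from the article; inputs are fractions."""
    return 0.87 - 0.76 * babip + 0.42 * k_rate

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def out_of_sample_r(seasons):
    """Correlate xLOB% with actual LOB% on seasons not used to fit the model.

    `seasons` is an iterable of (babip, k_rate, lob) tuples.
    The coefficients are NOT refit here, which is the point of the test.
    """
    seasons = list(seasons)
    estimated = [xlob(babip, k_rate) for babip, k_rate, _ in seasons]
    actual = [lob for _, _, lob in seasons]
    return pearson(estimated, actual)

# Made-up holdout pitcher-seasons, for illustration only (not real data).
holdout = [
    (0.280, 0.22, 0.76),
    (0.310, 0.16, 0.70),
    (0.295, 0.19, 0.73),
    (0.325, 0.14, 0.66),
]

r = out_of_sample_r(holdout)
print(round(r, 3))
```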

How does xLOB% perform as a predictor? Not so well. Using data from 2007-2013, xLOB% has a correlation of 0.299 with LOB% of the following season. This is lower than the correlation K% alone has with LOB% of the following season (0.348). The reason xLOB% is relatively useless as a predictor is that BABIP is very inconsistent from year to year. xLOB% itself only has a correlation of 0.463 from year to year, which is similar to the year-to-year correlation of PU%, but much lower than that of K% or BB%. So how can LOB% be predicted? I’ll tackle that topic in my next article.

All statistics courtesy of Fangraphs

Featured Image courtesy of www.rantsports.com