Does the length of the top of the first inning affect the number of runs scored in the bottom of the first? (somewhat!)

(This is a follow-up to “Why are so many runs scored in the bottom of the first inning?” – you may want to read that first!)

After hearing the excellent suggestion that the unusually high number of runs in the bottom of the first inning might be caused by the fact that the visiting pitcher warms up and then has to wait to throw their first “real” pitches, I decided to dig into this more. One way to check is to see whether more runs are scored in the bottom of the first when the top of the first inning goes longer.

I liked this approach because in theory the two halves of the inning are entirely independent – the quality of the home batters and visiting pitcher shouldn’t affect the quality of the home pitcher and visiting batters at all. So hopefully the data will be pretty clean.

One complication is that the Retrosheet data doesn’t record how long each inning took, so we’ll have to use a reasonable proxy. I decided to look at the number of batters faced in the top of the first as well as the number of pitches thrown, since both of these seem like they should track the length of the inning decently.

First I wanted to figure out what we would expect the number of runs scored in the bottom of the first to be. We already know it’s higher than the average number of runs across all innings, because the first inning is higher-scoring for both teams (presumably because the top of the lineup is at bat), and also because the home team scores a little bit more than the visiting team because of the home field advantage.

Since the visiting team scores 0.506 runs in the top of the first, and on average the home team scores 8.6% more than the visiting team in innings 2-8, I would expect the home team to score 0.549 runs in the bottom of the first independent of whatever’s going on. So that’s our baseline here.
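The arithmetic behind that baseline is simple enough to write down. Here is a minimal sketch, assuming the home field advantage just scales the visitors’ first-inning scoring multiplicatively; the function name `expected_bottom_first` is made up for illustration and is not from the actual report code.

```rust
/// Expected runs for the home team in the bottom of the first, assuming the
/// home-field edge scales the visitors' first-inning scoring multiplicatively.
/// (Hypothetical helper, not part of the original report code.)
fn expected_bottom_first(visitor_top_first: f64, home_edge: f64) -> f64 {
    visitor_top_first * (1.0 + home_edge)
}

fn main() {
    // Figures from the post: 0.506 runs in the top of the first, and an 8.6%
    // home-team edge measured over innings 2-8.
    let baseline = expected_bottom_first(0.506, 0.086);
    println!("baseline: {:.4}", baseline);
}
```

With these rounded inputs the result is about 0.5495, in line with the 0.549 quoted above.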

As usual, I wrote some more Rust code to calculate this, and here’s what I got for the number of batters:

This is…not very convincing! If you look closely there does seem to be a general upward trend, but it’s pretty noisy. Let’s look at number of pitches thrown instead:

Now we’re talking! There’s a clear upward trend here, from 0.588 runs when 0-7 pitches are thrown up to 0.633 runs when 40-47 pitches are thrown.
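For reference, the bucketing behind a chart like this can be sketched in a few lines of Rust. This is a minimal sketch, not the actual report code: the function name `mean_runs_by_bucket`, the tuple layout, and the toy games in `main` are all made up for illustration. It groups games into fixed-width pitch buckets (0-7, 8-15, and so on for a width of 8), averages the runs scored in the bottom of the first for each bucket, and drops buckets with too few games.

```rust
use std::collections::BTreeMap;

/// Average runs in the bottom of the first, grouped by pitch-count bucket.
/// Each input pair is (pitches in the top of the 1st, runs in the bottom of the 1st).
/// Returns (bucket start, games, mean runs) for buckets with at least `min_games`.
/// `bucket_width` must be nonzero. (Hypothetical helper for illustration.)
fn mean_runs_by_bucket(
    games: &[(u32, u32)],
    bucket_width: u32,
    min_games: usize,
) -> Vec<(u32, usize, f64)> {
    // bucket start -> (game count, total runs)
    let mut totals: BTreeMap<u32, (usize, u64)> = BTreeMap::new();
    for &(pitches, runs) in games {
        // Integer division snaps the pitch count down to the start of its bucket.
        let start = (pitches / bucket_width) * bucket_width;
        let entry = totals.entry(start).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += u64::from(runs);
    }
    totals
        .into_iter()
        // Skip sparse buckets to avoid small-sample noise.
        .filter(|&(_, (count, _))| count >= min_games)
        .map(|(start, (count, runs))| (start, count, runs as f64 / count as f64))
        .collect()
}

fn main() {
    // Made-up toy games purely to exercise the function; not Retrosheet data.
    let games = [(5, 0), (7, 2), (12, 1), (15, 0), (41, 3)];
    let width = 8;
    for (start, count, mean) in mean_runs_by_bucket(&games, width, 2) {
        println!(
            "{}-{} pitches: {} games, {:.3} runs",
            start,
            start + width - 1,
            count,
            mean
        );
    }
}
```

Using a `BTreeMap` keeps the buckets sorted by pitch count, so the output comes out in chart order without a separate sort.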

So, I’m pretty convinced that the length of the top of the first inning does have an effect on the runs scored in the bottom of the first. Pretty neat! But this clearly isn’t the whole explanation – even in the shortest case the runs in the bottom of the first are significantly higher than I would expect.
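To put rough numbers on that gap, here is a small sketch that compares the two endpoint buckets against the 0.549 baseline. It only uses figures already quoted in this post; the helper name `excess_over_baseline` is made up for illustration.

```rust
/// How far an observed scoring rate sits above the expected baseline, in runs.
/// (Hypothetical helper for illustration.)
fn excess_over_baseline(observed: f64, baseline: f64) -> f64 {
    observed - baseline
}

fn main() {
    let baseline = 0.549; // expected bottom-of-first runs from the post
    let shortest = 0.588; // 0-7 pitches bucket
    let longest = 0.633; // 40-47 pitches bucket

    let at_shortest = excess_over_baseline(shortest, baseline);
    let at_longest = excess_over_baseline(longest, baseline);
    println!("excess at shortest bucket: {:.3}", at_shortest);
    println!("excess at longest bucket:  {:.3}", at_longest);
    println!("spread across buckets:     {:.3}", at_longest - at_shortest);
}
```

So the shortest bucket is still roughly 0.039 runs above the baseline, and the spread between the shortest and longest buckets is roughly 0.045 runs.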

One possibility is that there is some nonlinearity here, and the home field advantage and having the top of the lineup at bat combine in a different way than my rough calculation above assumes. Another possibility is that even a very short inning is still long enough to affect the visiting pitcher.

Odds and ends:

  • To avoid small sample size issues, I didn’t include batters/pitches buckets where fewer than 1000 games were played.
  • Because of the way the Retrosheet games are stored, the number of batters can be overcounted if there is a wild pitch, etc.
  • Full disclosure: the way you bucket the number of pitches together does seem to make a difference – bucketing by 8 works pretty well, but bucketing by 6 or by 10 doesn’t work quite as well.
  • When I was running the script I noticed a pretty big outlier where the top of the 1st had 18 batters and 88 pitches thrown! Behold, the June 21, 1994 game between the Red Sox and the Blue Jays. Let’s check in on what the Red Sox did in the first inning:
    • walk
    • walk
    • double
    • walk
    • double
    • strikeout
    • double
    • (pitching change)
    • (passed ball)
    • (wild pitch)
    • walk
    • single
    • walk
    • double
    • double
    • groundout
    • intentional walk
    • walk (scoring a run)
    • (Red Sox decide to give Mo Vaughn the rest of the day off)
    • groundout
  • Here’s the source code for the new StatsRunExpectancyForBottomFirstInningByNumberBattersReport, and here are the output analysis files.

More articles written with data from the Baseball Win Expectancy Finder: