How effective are teams at manufacturing runs?

To calculate how likely a baseball team is to win from a given situation, there are two main approaches:

  1. You can gather a bunch of data about how likely singles, doubles, home runs, etc. are, and use that to build a model to calculate win expectancy.
  2. Or, you can just look at each situation in each game and count how many times each team wins. This is the approach the Baseball Win Expectancy Finder uses.
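The counting approach in item 2 can be sketched roughly as follows. This is only an illustration, not the Win Expectancy Finder's actual code: the `Situation` fields, the `win_expectancy` function, and the input format (one pair of situation and final result per observed game state) are all assumptions made up for this example.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Situation(NamedTuple):
    """Hypothetical key describing a game state (not the Finder's real schema)."""
    inning: int
    is_bottom: bool
    outs: int
    runners: Tuple[int, ...]  # occupied bases, e.g. (3,) for a runner on third
    home_lead: int            # home score minus visitor score


def win_expectancy(observations):
    """Map each situation to (fraction of home wins, number of observations)."""
    tallies = defaultdict(lambda: [0, 0])  # situation -> [home wins, total]
    for situation, home_won in observations:
        tallies[situation][0] += int(home_won)
        tallies[situation][1] += 1
    return {s: (wins / total, total) for s, (wins, total) in tallies.items()}


# Made-up observations, purely for illustration.
state = Situation(inning=7, is_bottom=True, outs=2, runners=(3,), home_lead=4)
sample = [(state, True), (state, True), (state, True), (state, False)]
table = win_expectancy(sample)
print(table[state])
```

Returning the observation count next to each fraction makes it easy to spot situations where the estimate rests on very few games.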

One downside to the second approach is that you need a lot more data than in the first approach, and you’re always going to find nonsensical results due to small sample sizes. For example, when the home team is up by 4 runs in the bottom of the 7th inning with 2 outs and a runner on third, there’s a 98.2% chance the home team will win. But if the batter walks, that drops to 98.0%, even though putting another runner on base shouldn’t hurt the home team! (notice the small sample sizes…)

But the advantage is that you can see some interesting stuff that wouldn’t be captured by a model. One example is “Why are so many runs scored in the bottom of the first inning?” – today let’s look at another example!


Starting in the 2020 season, each extra inning starts with a “ghost runner” on second base. The idea is that it will make both teams more likely to score and thus will shorten the game, and looking at 2020 data shows that the average length of an extra inning game did go down a bit. But I suspect a side effect of this is that home teams will have even more of an advantage in extra innings. Visiting teams have to try to score as many runs as they can, but home teams will know exactly how many runs they need to tie or win the game.

As of this writing there have only been 166 extra inning games with the “ghost runner”1, so there’s not nearly enough data to know2. But we can look at a reasonable proxy – if a team knows they need exactly one run, are they more likely to score that run?

My plan was to look at the bottom of the 9th inning. Here the home team is in a similar situation to the one they face in extra innings: they know exactly how many runs they need to tie or win the game. So if the home team is able to score a run more often when the score is tied than when they’re down by 2 runs, this is evidence that they’re able to “manufacture” a run3.


So I generated a bunch of data – let’s take a look!

First, let’s look at how often the home team is able to score at least one run in the bottom of the ninth inning depending on the score. In each tuple, the first two members (9, true) indicate that it’s the 9th inning and the bottom half, and the third member is how much the home team is ahead or behind at the start of the inning. (In this case they’re always tied or behind; otherwise the game would be over!)4

(9, true, -8): Score 1+ runs: 27.66% (1956 tries)
(9, true, -7): Score 1+ runs: 26.30% (2954 tries)
(9, true, -6): Score 1+ runs: 27.63% (4068 tries)
(9, true, -5): Score 1+ runs: 26.99% (5683 tries)
(9, true, -4): Score 1+ runs: 25.12% (7604 tries)
(9, true, -3): Score 1+ runs: 24.39% (9429 tries)
(9, true, -2): Score 1+ runs: 24.49% (11503 tries)
(9, true, -1): Score 1+ runs: 24.31% (12972 tries)
(9, true, 0): Score 1+ runs: 28.15% (12938 tries)

And there we have it – with a good sample size, the home team is almost 4 percentage points more likely to score a run when the game is tied than when they’re down by 1 or 2 runs (28.15% versus about 24.4%). This isn’t a huge effect, but it is convincing to me. (It also makes sense that when the home team is down by 5+ runs they’re more likely to score, because the visiting team probably put in a pitcher who’s worse than their closer.)
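As a rough check that the tied-versus-down-by-1 gap is bigger than sampling noise, here is a sketch of a pooled two-proportion z-test on the two rows from the table above. The test is my choice of illustration, not something the original analysis ran, and it assumes each half-inning is an independent trial, which is only approximately true.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two observed proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)            # combined success rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Bottom of the ninth: tied (28.15%, 12938 tries) vs. down by 1 (24.31%, 12972 tries).
z = two_proportion_z(0.2815, 12938, 0.2431, 12972)
print(f"z = {z:.2f}")
```

The statistic comes out at roughly 7, well past the usual threshold of about 2, so the gap is unlikely to be a small-sample fluke. It says nothing about the cause, though.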

I am a little surprised that we don’t see this effect when the home team is down by 1 run; I would expect them to try to tie the game! But we can see this in a number of places; although it’s not the end of the game, here’s what happens in the top of the ninth inning:

(9, false, -6): Score 1+ runs: 24.69% (4426 tries)
(9, false, -5): Score 1+ runs: 24.60% (6119 tries)
(9, false, -4): Score 1+ runs: 24.16% (8081 tries)
(9, false, -3): Score 1+ runs: 23.44% (10027 tries)
(9, false, -2): Score 1+ runs: 22.83% (12166 tries)
(9, false, -1): Score 1+ runs: 23.19% (13597 tries)
(9, false, 0): Score 1+ runs: 25.40% (13637 tries)
(9, false, 1): Score 1+ runs: 24.52% (13094 tries)
(9, false, 2): Score 1+ runs: 25.11% (11189 tries)
(9, false, 3): Score 1+ runs: 27.21% (8842 tries)
(9, false, 4): Score 1+ runs: 26.70% (6715 tries)

Here we see a similar effect – there’s a jump of around 2 to 2.5 percentage points between the visiting team being down by 1 or 2 runs and a tie game. But why, if the visiting team is ahead, are they still more likely to score runs? Perhaps if a team is up by one or two runs, they may still be in a “we just want to score one more run” mindset?

The effect shows up in the eighth inning as well, although it’s smaller – here it is for the bottom of the eighth:

(8, true, -4): Score 1+ runs: 26.39% (7519 tries)
(8, true, -3): Score 1+ runs: 26.29% (9808 tries)
(8, true, -2): Score 1+ runs: 25.82% (12170 tries)
(8, true, -1): Score 1+ runs: 26.29% (14011 tries)
(8, true, 0): Score 1+ runs: 28.02% (14620 tries)
(8, true, 1): Score 1+ runs: 28.16% (13777 tries)
(8, true, 2): Score 1+ runs: 26.88% (11553 tries)
(8, true, 3): Score 1+ runs: 27.57% (9093 tries)

and for the top of the eighth:

(8, false, -4): Score 1+ runs: 25.08% (7728 tries)
(8, false, -3): Score 1+ runs: 24.50% (9991 tries)
(8, false, -2): Score 1+ runs: 24.32% (12505 tries)
(8, false, -1): Score 1+ runs: 24.89% (14775 tries)
(8, false, 0): Score 1+ runs: 26.11% (15513 tries)
(8, false, 1): Score 1+ runs: 26.45% (14063 tries)
(8, false, 2): Score 1+ runs: 26.82% (11622 tries)
(8, false, 3): Score 1+ runs: 27.29% (8748 tries)

And as a sanity check, if we go back to the seventh inning, there’s no effect:

(7, true, -4): Score 1+ runs: 27.19% (7228 tries)
(7, true, -3): Score 1+ runs: 27.14% (9748 tries)
(7, true, -2): Score 1+ runs: 26.48% (12842 tries)
(7, true, -1): Score 1+ runs: 27.11% (15387 tries)
(7, true, 0): Score 1+ runs: 27.17% (16449 tries)
(7, true, 1): Score 1+ runs: 27.71% (14824 tries)
(7, true, 2): Score 1+ runs: 28.89% (11970 tries)
(7, true, 3): Score 1+ runs: 29.07% (8983 tries)

I also looked at specific situations – for example, if there’s a runner on first base and no outs, do we still see this effect? And the answer still seems to be yes for the bottom of the ninth:

(9, true, -4): Score 1+ runs: 39.77% (1906 tries)
(9, true, -3): Score 1+ runs: 38.42% (2418 tries)
(9, true, -2): Score 1+ runs: 38.93% (2946 tries)
(9, true, -1): Score 1+ runs: 37.65% (3495 tries)
(9, true, 0): Score 1+ runs: 41.96% (3701 tries)

and for the top of the ninth as well:

(9, false, -4): Score 1+ runs: 36.14% (1998 tries)
(9, false, -3): Score 1+ runs: 37.55% (2426 tries)
(9, false, -2): Score 1+ runs: 37.80% (3016 tries)
(9, false, -1): Score 1+ runs: 37.92% (3510 tries)
(9, false, 0): Score 1+ runs: 40.26% (3666 tries)
(9, false, 1): Score 1+ runs: 39.87% (3331 tries)
(9, false, 2): Score 1+ runs: 41.29% (2768 tries)
(9, false, 3): Score 1+ runs: 43.24% (2352 tries)

Of course, what I’m really interested in is the “ghost runner” situation of having a runner on second base and no outs. Unfortunately the sample sizes here are too small to be reliable, but there might be an effect still!

Odds and ends:

  • Edit: Ben P on Facebook pointed out that traditionally if the game is tied in the bottom of the ninth inning, the visiting team won’t use their closer, so this probably explains some of this effect!
  • I also looked at how likely it was that if a team needed exactly two runs, they’d score two runs, but I didn’t see any effect.
  • You can look at the data for this here. As mentioned above, each entry is a tuple of something like (9, true, -2), indicating that this is the ninth inning, it’s the bottom of the inning (false means top of the inning), and the team at bat is behind by 2 runs. The syntax for the filenames is either:
    • reportruns<n>.txt – the probability that the team scores n runs from the start of the inning
    • reportrunners<r>outs<o>runs<n>.txt – the probability that the team scores n runs from a situation where there are o outs and runners at r bases.
  • When I started this, I worried that when home teams were down by 2 runs in the bottom of the ninth inning, they might be even more likely to score a run because once a runner gets on base, there’s little reason for the visiting team to stop that runner from scoring. But there’s no evidence this happens!
  • Here’s the source code for the new StatsScoreAnyRunsByInningAndScoreDiffReport.
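If you want to work with the report output yourself, the lines shown throughout this post can be parsed with a small helper like the one below. This is a sketch that assumes every line looks exactly like the "Score 1+ runs" examples above; the `ReportLine` type and `parse_report_line` function are names I made up, and reports for other run counts may be worded differently.

```python
import re
from typing import NamedTuple


class ReportLine(NamedTuple):
    inning: int
    is_bottom: bool   # True for the bottom of the inning, False for the top
    score_diff: int   # batting team's lead (negative when behind)
    pct: float        # percentage of tries in which 1+ runs scored
    tries: int


# Assumed line format, based only on the examples shown in this post.
LINE_RE = re.compile(
    r"\((\d+), (true|false), (-?\d+)\): Score 1\+ runs: ([\d.]+)% \((\d+) tries\)"
)


def parse_report_line(line):
    """Parse one report line into a ReportLine, or raise ValueError."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized report line: {line!r}")
    inning, bottom, diff, pct, tries = match.groups()
    return ReportLine(int(inning), bottom == "true", int(diff), float(pct), int(tries))


row = parse_report_line("(9, true, -2): Score 1+ runs: 24.49% (11503 tries)")
print(row)
```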


Footnotes:

1 And this is overcounting a little because this includes playoff games, in which the “ghost runner” is not in effect.

2 Whether a team wins in extra innings can be represented by a binomial distribution. Since 1995 home teams have won around 52% of extra inning games. Let’s say that having a ghost runner increases that to 55% (which would be a big effect!) – if n games have been played, the standard deviation of the observed number of games won would be sqrt(n)*sqrt(.55*.45), or around 0.5*sqrt(n), so the standard deviation of the fraction of games won would be around 0.5/sqrt(n). If we wanted to show that the 55% was different from our baseline 52%, we’d need something like two standard deviations between them, so the standard deviation would have to be .015, which means we’d need on the order of 1100 games. Since around 200 extra-inning games are played each season, this is another 5 seasons’ worth of games! (full disclosure: I’m not 100% sure the statistics here are right, but I am sure 166 games is not nearly enough…)

3 One way a team might be able to increase their chance of scoring a run is by putting in a pinch hitter, but I would expect them to do this even if they’re behind by 2 or 3 runs. When I’m talking about manufacturing runs, I mean things that increase their chances of scoring at least one run but probably decrease their chances of scoring more than one run; for example, sacrifice bunts or sacrifice flies. These are things you don’t want to do most of the time because they decrease the expected number of runs you will score, but if you really only care about scoring one run, they become much more attractive!

4 You can see the whole output here.
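The back-of-the-envelope estimate in footnote 2 can be reproduced with a few lines of code. This is a sketch of the same rough reasoning (two standard deviations of separation under a binomial model), not a proper power analysis, and the `games_needed` function is a name I made up for illustration.

```python
def games_needed(baseline, hypothesized, num_sd=2.0):
    """Games needed so that `hypothesized` sits `num_sd` standard deviations
    away from `baseline`, using the binomial standard deviation of a fraction,
    sqrt(p * (1 - p) / n)."""
    sd_required = abs(hypothesized - baseline) / num_sd
    return hypothesized * (1 - hypothesized) / sd_required ** 2


n = games_needed(0.52, 0.55)
more_seasons = (n - 166) / 200  # 166 games so far, about 200 per season
print(f"about {n:.0f} games, or roughly {more_seasons:.1f} more seasons")
```

Using the exact sqrt(.55*.45) instead of rounding it to 0.5 gives about 1100 games, which matches the footnote’s estimate.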
