How effective are teams at manufacturing runs?

To calculate how likely a baseball team is to win from a given situation, there are two main approaches:

  1. You can gather a bunch of data about how likely singles, doubles, home runs, etc. are, and use that to build a model to calculate win expectancy.
  2. Or, you can just look at each situation in each game and count how many times each team wins. This is the approach the Baseball Win Expectancy Finder uses.

One downside to the second approach is that you need a lot of data (more than the first approach), and you’re always going to find some nonsensical results due to small sample sizes. For example, if the home team is up by 4 runs in the bottom of the 7th inning with 2 outs and a runner on third, there’s a 98.2% chance the home team will win. But if the batter walks, there’s only a 98.0% chance the home team will win! (notice the small sample sizes…)
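To make the counting approach concrete, here is a rough Python sketch (not the actual code behind the Win Expectancy Finder). The function name, the shape of the input, and the toy data are all assumptions made up for illustration. It also reports a normal-approximation standard error, which gives a feel for how noisy a situation with few games can be.

```python
from collections import defaultdict
from math import sqrt


def win_expectancy(observations):
    """Count-based win expectancy.

    observations: iterable of (situation, home_won) pairs, where situation is
    any hashable description of the game state (hypothetical format).
    Returns {situation: (win_fraction, games, standard_error)}.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for situation, home_won in observations:
        games[situation] += 1
        if home_won:
            wins[situation] += 1

    table = {}
    for situation, n in games.items():
        p = wins[situation] / n
        # Rough binomial standard error; it shrinks like 1/sqrt(n).
        table[situation] = (p, n, sqrt(p * (1 - p) / n))
    return table


# Toy data, invented for this example only.
toy = [("A", True), ("A", True), ("A", False), ("A", True), ("B", True)]
table = win_expectancy(toy)
for situation, (p, n, se) in table.items():
    print(f"{situation}: {p:.1%} over {n} games (+/- {se:.1%})")
```

Two situations whose win fractions differ by less than their standard errors, like the 98.2% and 98.0% above, can easily come out in the "wrong" order.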

But the advantage is that you can see some interesting stuff that wouldn’t be captured by a model. One example is “Why are so many runs scored in the bottom of the first inning?” – today let’s look at another example!


Starting in the 2020 season, each extra inning starts with a “ghost runner” on second base. The idea is that it will make both teams more likely to score and thus will shorten the game, and looking at 2020 data shows that the average length of an extra inning game did go down a bit. But I suspect a side effect of this is that home teams will have even more of an advantage in extra innings. Visiting teams have to try to score as many runs as they can, but home teams will know exactly how many runs they need to tie or win the game.

As of this writing there have only been 166 extra inning games with the “ghost runner” [1], so there’s not nearly enough data to know [2]. But we can look at a reasonable proxy – if a team knows they need exactly one run, are they more likely to score that run?

My plan was to look at the bottom of the 9th inning. Here the home team is in a similar situation to the one they’re in during extra innings; they know exactly how many runs they need to tie or win the game. So if the home team is able to score a run more often when the score is tied than when they’re down by 2 runs, this is evidence that they’re able to “manufacture” a run [3].


So I generated a bunch of data – let’s take a look!

First, let’s look at how often the home team is able to score at least one run in the bottom of the ninth inning depending on the score. In each key, the (9, true, …) part indicates it’s the 9th inning and the bottom half of the inning, and the third member of the tuple is how far the home team is ahead or behind at the start of the inning. (In this case they’re always tied or behind; otherwise the game would be over!) [4]

(9, true, -8): Score 1+ runs: 27.66% (1956 tries)
(9, true, -7): Score 1+ runs: 26.30% (2954 tries)
(9, true, -6): Score 1+ runs: 27.63% (4068 tries)
(9, true, -5): Score 1+ runs: 26.99% (5683 tries)
(9, true, -4): Score 1+ runs: 25.12% (7604 tries)
(9, true, -3): Score 1+ runs: 24.39% (9429 tries)
(9, true, -2): Score 1+ runs: 24.49% (11503 tries)
(9, true, -1): Score 1+ runs: 24.31% (12972 tries)
(9, true, 0): Score 1+ runs: 28.15% (12938 tries)

And there we have it – with a good sample size, the home team is almost 4 percentage points more likely to score a run when the game is tied than when they’re down by 1 or 2 runs. This isn’t a huge effect, but it is convincing to me. (it also makes sense that when the home team is down by 5+ runs they’re more likely to score, because the visiting team probably put in a pitcher who’s worse than their closer)
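As a rough check on whether a gap like this could just be noise, here is a Python sketch of a two-proportion z-test using the tied and down-by-1 rows from the table above. The choice of test is mine, and it assumes each half-inning is an independent trial, which is only approximately true.

```python
from math import sqrt


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two observed proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Bottom of the ninth: tied (28.15%, 12938 tries) vs. down by 1 (24.31%, 12972 tries).
z = two_proportion_z(0.2815, 12938, 0.2431, 12972)
print(f"z = {z:.1f}")
```

The z value comes out around 7, far beyond the usual cutoff of about 2, so under these assumptions the difference is very unlikely to be a small-sample fluke.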

I am a little surprised that we don’t see this effect when the home team is down by 1 run; I would expect them to try to tie the game! But we can see this in a number of places; although it’s not the end of the game, here’s what happens in the top of the ninth inning:

(9, false, -6): Score 1+ runs: 24.69% (4426 tries)
(9, false, -5): Score 1+ runs: 24.60% (6119 tries)
(9, false, -4): Score 1+ runs: 24.16% (8081 tries)
(9, false, -3): Score 1+ runs: 23.44% (10027 tries)
(9, false, -2): Score 1+ runs: 22.83% (12166 tries)
(9, false, -1): Score 1+ runs: 23.19% (13597 tries)
(9, false, 0): Score 1+ runs: 25.40% (13637 tries)
(9, false, 1): Score 1+ runs: 24.52% (13094 tries)
(9, false, 2): Score 1+ runs: 25.11% (11189 tries)
(9, false, 3): Score 1+ runs: 27.21% (8842 tries)
(9, false, 4): Score 1+ runs: 26.70% (6715 tries)

Here we see a similar effect – there’s a jump of around 2 percentage points from the visiting team being down by 1 or 2 runs to a tie game. But why, if the visiting team is ahead, are they still more likely to score runs? Perhaps if a team is up by one or two runs, they may still be in a “we just want to score one more run” mindset?

The effect shows up in the eighth inning as well, although it’s smaller – here it is for the bottom of the eighth:

(8, true, -4): Score 1+ runs: 26.39% (7519 tries)
(8, true, -3): Score 1+ runs: 26.29% (9808 tries)
(8, true, -2): Score 1+ runs: 25.82% (12170 tries)
(8, true, -1): Score 1+ runs: 26.29% (14011 tries)
(8, true, 0): Score 1+ runs: 28.02% (14620 tries)
(8, true, 1): Score 1+ runs: 28.16% (13777 tries)
(8, true, 2): Score 1+ runs: 26.88% (11553 tries)
(8, true, 3): Score 1+ runs: 27.57% (9093 tries)

and for the top of the eighth:

(8, false, -4): Score 1+ runs: 25.08% (7728 tries)
(8, false, -3): Score 1+ runs: 24.50% (9991 tries)
(8, false, -2): Score 1+ runs: 24.32% (12505 tries)
(8, false, -1): Score 1+ runs: 24.89% (14775 tries)
(8, false, 0): Score 1+ runs: 26.11% (15513 tries)
(8, false, 1): Score 1+ runs: 26.45% (14063 tries)
(8, false, 2): Score 1+ runs: 26.82% (11622 tries)
(8, false, 3): Score 1+ runs: 27.29% (8748 tries)

And as a sanity check, if we go back to the seventh inning, there’s no effect:

(7, true, -4): Score 1+ runs: 27.19% (7228 tries)
(7, true, -3): Score 1+ runs: 27.14% (9748 tries)
(7, true, -2): Score 1+ runs: 26.48% (12842 tries)
(7, true, -1): Score 1+ runs: 27.11% (15387 tries)
(7, true, 0): Score 1+ runs: 27.17% (16449 tries)
(7, true, 1): Score 1+ runs: 27.71% (14824 tries)
(7, true, 2): Score 1+ runs: 28.89% (11970 tries)
(7, true, 3): Score 1+ runs: 29.07% (8983 tries)

I also looked at specific situations – for example, if there’s a runner on first base and no outs, do we still see this effect? And the answer still seems to be yes for the bottom of the ninth:

(9, true, -4): Score 1+ runs: 39.77% (1906 tries)
(9, true, -3): Score 1+ runs: 38.42% (2418 tries)
(9, true, -2): Score 1+ runs: 38.93% (2946 tries)
(9, true, -1): Score 1+ runs: 37.65% (3495 tries)
(9, true, 0): Score 1+ runs: 41.96% (3701 tries)

and for the top of the ninth as well:

(9, false, -4): Score 1+ runs: 36.14% (1998 tries)
(9, false, -3): Score 1+ runs: 37.55% (2426 tries)
(9, false, -2): Score 1+ runs: 37.80% (3016 tries)
(9, false, -1): Score 1+ runs: 37.92% (3510 tries)
(9, false, 0): Score 1+ runs: 40.26% (3666 tries)
(9, false, 1): Score 1+ runs: 39.87% (3331 tries)
(9, false, 2): Score 1+ runs: 41.29% (2768 tries)
(9, false, 3): Score 1+ runs: 43.24% (2352 tries)

Of course, what I’m really interested in is the “ghost runner” situation of having a runner on second base and no outs. Unfortunately the sample sizes here are too small to be reliable, but there might be an effect still!

Odds and ends:

  • Edit: Ben P on Facebook pointed out that traditionally if the game is tied in the bottom of the ninth inning, the visiting team won’t use their closer, so this probably explains some of this effect!
  • I also looked at how likely it was that if a team needed exactly two runs, they’d score two runs, but I didn’t see any effect.
  • You can look at the data for this here. As mentioned above, each entry is a tuple of something like (9, true, -2), indicating that this is the ninth inning, it’s the bottom of the inning (false means top of the inning), and the team at bat is behind by 2 runs. The syntax for the filenames is either:
    • reportruns<n>.txt – the probability that the team scores n runs from the start of the inning
    • reportrunners<r>outs<o>runs<n>.txt – the probability that the team scores n runs from a situation where there are o outs and runners at r bases.
  • When I started this, I worried that when home teams were down by 2 runs in the bottom of the ninth inning, they might be even more likely to score a run because once a runner gets on base, there’s little reason for the visiting team to stop that runner from scoring. But there’s no evidence this happens!
  • Here’s the source code for the new StatsScoreAnyRunsByInningAndScoreDiffReport.
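For readers who want the gist of the tuple format described above without reading the real source, here is a minimal Python sketch of a report like this one. It is not the actual StatsScoreAnyRunsByInningAndScoreDiffReport code. The input format (one record per half-inning) and the function names are assumptions for illustration.

```python
from collections import defaultdict


def score_any_runs_report(half_innings):
    """half_innings: iterable of (inning, is_bottom, score_diff, runs_scored),
    where score_diff is the batting team's lead at the start of the half-inning
    (negative means behind). Returns {(inning, is_bottom, score_diff): (fraction, tries)}.
    """
    tries = defaultdict(int)
    scored = defaultdict(int)
    for inning, is_bottom, score_diff, runs in half_innings:
        key = (inning, is_bottom, score_diff)
        tries[key] += 1
        if runs >= 1:
            scored[key] += 1
    return {key: (scored[key] / n, n) for key, n in tries.items()}


def format_line(key, fraction, n):
    """Render one entry in the same style as the output shown in this post."""
    inning, is_bottom, diff = key
    return f"({inning}, {str(is_bottom).lower()}, {diff}): Score 1+ runs: {fraction:.2%} ({n} tries)"


# Toy data, invented for this example only.
toy = [(9, True, 0, 1), (9, True, 0, 0), (9, True, -1, 0), (9, True, 0, 2), (9, True, 0, 0)]
report = score_any_runs_report(toy)
for key in sorted(report):
    print(format_line(key, *report[key]))
```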



Footnotes:

[1] And this is overcounting a little because this includes playoff games, in which the “ghost runner” is not in effect.

[2] Whether a team wins in extra innings can be represented by a binomial distribution. Since 1995 home teams have won around 52% of extra inning games. Let’s say that having a ghost runner increases that to 55% (which would be a big effect!) – if n games have been played, the standard deviation of the observed number of games won would be sqrt(n)*sqrt(.55*.45), or around 0.5*sqrt(n), so the standard deviation of the fraction of games won would be 0.5/sqrt(n). If we wanted to show that the 55% was different from our baseline 52%, we’d need something like two standard deviations between them, so the standard deviation would have to be .015, which means we’d need on the order of 1100 games. Since around 200 extra-inning games are played each season, this is another 5 seasons’ worth of games! (full disclosure: I’m not 100% sure the statistics here are right, but I am sure 166 games is not nearly enough…)

[3] One way a team might be able to increase their chance of scoring a run is by putting in a pinch hitter, but I would expect them to do this even if they’re behind by 2 or 3 runs. When I’m talking about manufacturing runs, I mean things that increase their chances of scoring at least one run but probably decrease their chances of scoring more than one run; for example, sacrifice bunts or sacrifice flies. These are things you don’t want to do most of the time because they decrease the expected number of runs you will score, but if you really only care about scoring one run, they become much more attractive!

[4] You can see the whole output here.
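The back-of-the-envelope calculation in footnote 2 can be written out as a short Python sketch. It follows the same rough reasoning (two standard deviations between 52% and 55%) and is not a proper power analysis. The function name and the `num_sds` parameter are made up for this example.

```python
def games_needed(baseline, hypothesized, num_sds=2.0):
    """Games needed so that the gap between the two win rates equals num_sds
    standard deviations of the observed win fraction."""
    target_sd = (hypothesized - baseline) / num_sds
    # sd of the observed fraction is sqrt(p * (1 - p) / n); solve for n.
    return hypothesized * (1 - hypothesized) / target_sd ** 2


n = games_needed(0.52, 0.55)
more_seasons = (n - 166) / 200  # 166 games so far, about 200 per season
print(f"about {n:.0f} games, or roughly {more_seasons:.1f} more seasons")
```

This lands on about 1100 games, in line with the estimate in the footnote.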

Parenting Beyond Pink & Blue: How to Raise Your Kids Free of Gender Stereotypes review

Parenting Beyond Pink & Blue: How to Raise Your Kids Free of Gender Stereotypes by Christia Spears Brown

My rating: 4 of 5 stars


This book simultaneously made me feel better and worse about being a parent. Better because I feel much more justified in some of the stuff we’ve tried to do for our kids. Worse because, ack, gender roles and sexism are everywhere and how are we supposed to protect our kids from the whole dang culture? But I’m glad I read it and there are some helpful tips about how to do your best to influence your kids without being too militant about it.

One interesting parallel I took away is that it’s not enough to just say “everyone can do the same things”; you have to specifically call out when someone says something stereotypical and rebut it by saying “boys and girls can be firefighters” (or whatever). Also, you can’t just not say anything about it because kids will pick up gender stereotypes from other kids, TV, random adults, etc, so you have to fight against these. This reminds me a lot of the newer research on racism, where you have to actively be antiracist instead of just saying general things like “everyone is equal”.

Odds and ends:

  • Kids pick up on the fact that gender is important, and then will start to overgeneralize based on that. Brown gives an example of her daughter saying out of the blue that boys are messy and girls are neat (even though her father is the neatest person in the house!)
  • A study was done where teachers were told to use gender to organize their classroom. (children had name cards of pink or blue, they lined up boy-girl-boy-girl, etc.) Even though the teachers treated the boys and girls equally and didn’t express any stereotypes, students developed stronger gender stereotypes by themselves than those in a classroom where the teachers were told to ignore gender.
  • A similar study was done but instead of gender, kids were randomly assigned to the red or blue group and then teachers used groups to organize the classrooms in a similar way. And lo and behold, kids developed stereotypes about the red and blue groups! But in a classroom where kids were in groups and teachers didn’t talk about them, the kids didn’t develop those stereotypes.
  • A big one: when kids hear “he” used as a generic term, they assume it only refers to boys. Same for “fireman” and “policeman”; kids assume only boys can be firefighters or police officers. This is a very hard habit to break, although I have managed to make some progress for myself! In studies where parents look at animal picture books, they use “he” 95% of the time.
  • Brown says, somewhat depressingly, you only have until your kid is three years old to try to avoid using stereotypes; after that, the stereotypes are ingrained, and the best thing to do is to tackle them head-on. (more on how to do this later)
  • Brown was interviewing a group of high-achieving women undergraduates and asked them to raise their hand if they felt insecure about their math abilities – and everyone raised their hand! (and they were kind of surprised everyone felt the same way) Brown then bet that if she asked a similar group of men, no one would raise their hand. So they walked down the hall, found a group of men in a classroom, and lo and behold, none of them raised their hand 🙂
  • There are some differences between boys and girls, and there’s an interesting discussion of effect size. Basically an effect size is a measure of how much two populations differ from each other relative to how much they differ within themselves. So for example, boys are considered to be much more active than girls. But the effect size is only 0.21, meaning that if you have a boy who is of average “activeness”, 42 percent of girls are more active than him. Yes, that’s less than 50 percent, but it really doesn’t tell you much about any particular child. Almost all gender differences are of this magnitude or less.
  • Brown says that when she hears her kid say a gender stereotype, even if it’s just strange (her kid said one day that girls have eyelashes and boys don’t), she just does two things: she points out that both genders do have whatever the statement was about, then points out a concrete example of someone breaking the stereotype. (“Daddy has long eyelashes!”) And then she stops talking about it 🙂
  • It’s important to encourage your kid to do whatever activities they want, even if they’re not “gender stereotypical”. If you’re more encouraging and engaged, the kid will enjoy it more and want to do it, and that will lead to them getting better at it.
  • There’s a section about self-esteem in girls (which, sigh), but I didn’t realize that African-American girls have better self-esteem and body image and less depression than white girls. This seems to be because African-American girls have more positive relationships with their mothers, and their mothers encourage their independence more.
  • “Stereotype threat” is a depressing occurrence where just reminding kids what gender they are can trigger stereotypes that cause them to perform worse. For example, just having girls fill out their gender on the front of a math test causes them to do worse on it, because of the stereotype that girls are worse at math. Yikes! Here are Brown’s eight tips to help protect your child from stereotype threat:
    • De-emphasize gender: try to make them think about their other characteristics (about being a third-grader, or a member of their school or family, for example)
    • Reframe the task: remind them that it’s just a test and not a true measure of their full ability.
    • Discuss stereotype threat: teach kids that it’s normal to feel anxious when they are taking a test.
    • Encourage self-affirmation: have your child think about values, skills, and characteristics that are important to them and write about them.
    • Emphasize high standards, and assure kids they are capable of meeting them
    • Provide competent role models: point out women who excel in math, or boys who excel at writing. (these can be fictional characters!)
    • Provide alternative explanations for anxiety: like the item above, tell them it’s normal to feel anxious and it will go away over time
    • Teach that intelligence comes from trying hard, rather than innate talents: this is the whole thing about praising kids for working hard and not for “being smart”, just like the Punished by Rewards book says, or the whole “growth mindset” thing.
  • Brown summarizes things by saying the three things she really tries to do:
    • Get rid of a lot of toys that are stereotypical. She especially calls out Barbie and similar dolls, and shirts with depressingly stereotypical sayings on them (like “I love shopping”)
    • Alter the language you use with your kids – don’t say “pretty girls” or “big girls”, just say “kids” or “big kids”. When you’re talking about someone else, try to pick a descriptive label that doesn’t involve gender instead of just calling them a man or woman, unless it’s particularly relevant for some reason.
    • Stop kids from using their own stereotypes and correct them. Even when other adults say something stereotypical, tell kids that the stereotype is wrong in private after the fact.
  • Brown also lists three assumptions to try to avoid:
    • Don’t assume toys and movies/TV are just for fun. They influence kids!
    • Don’t assume you don’t have any influence on your kids and surrender to the media and culture – you still have an impact!
    • Don’t assume anything about your child solely on the basis of gender.
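The effect-size arithmetic in the list above (an effect size of 0.21 meaning about 42 percent of girls are more active than the average boy) can be checked with a few lines of Python. This sketch assumes both groups are normally distributed with the same spread, which is the usual reading of this kind of effect size but is my assumption here, not something spelled out in the review.

```python
from statistics import NormalDist


def share_above_other_groups_average(effect_size):
    """Fraction of the lower-scoring group that exceeds the higher-scoring
    group's average, assuming two normal distributions with equal spread."""
    return 1 - NormalDist().cdf(effect_size)


share = share_above_other_groups_average(0.21)
print(f"{share:.0%} of girls would be more active than the average boy")
```

With an effect size of 0 the answer would be exactly 50 percent, so 0.21 moves it only a little.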


Does the length of the top of the first inning affect the number of runs scored in the bottom of the first? (somewhat!)

(this is a followup of “Why are so many runs scored in the bottom of the first inning?” – you may want to read that first!)

After hearing the excellent suggestion that the unusually high number of runs in the bottom of the first inning might be caused by the fact that the visiting pitcher gets to warm up and then has to wait to throw their first “real” pitches, I decided to dig more into this. One way to check this is to see whether more runs are scored in the bottom of the first when the top of the first inning goes longer.

I liked this because in theory it’s entirely independent – the quality of the home batters and visiting pitcher shouldn’t affect the quality of the home pitcher and visiting batters at all. So hopefully the data will be pretty clean.

One complication is that the Retrosheet data doesn’t include the time length of innings, so we’ll have to look at a reasonable proxy. I decided to look at number of batters faced in the top of the first as well as number of pitches thrown, since both of these seem like they should work decently.

First I wanted to figure out what we would expect the number of runs scored in the bottom of the first to be. We already know it’s higher than the average number of runs across all innings, because the first inning is higher-scoring for both teams (presumably because the top of the lineup is at bat), and also because the home team scores a little bit more than the visiting team because of the home field advantage.

Since the visiting team scores 0.506 runs in the top of the first, and on average the home team scores 8.6% more than the visiting team in innings 2-8, I would expect the home team to score 0.549 runs in the bottom of the first independent of whatever’s going on. So that’s our baseline here.
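The baseline arithmetic is just one multiplication, shown here as a tiny Python sketch (the variable names are mine). Because the 8.6% figure is itself rounded, the result only matches the 0.549 quoted above to within rounding.

```python
visiting_runs_top_first = 0.506  # average runs scored in the top of the first
home_advantage = 0.086           # home team scores 8.6% more in innings 2-8

expected_home_runs_bottom_first = visiting_runs_top_first * (1 + home_advantage)
print(f"{expected_home_runs_bottom_first:.4f}")
```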

As usual, I wrote some more Rust code to calculate this, and here’s what I got for the number of batters:

This is…not very convincing! If you look closely there does seem to be a general upward trend, but it’s pretty noisy. Let’s look at number of pitches thrown instead:

Now we’re talking! There’s a clear upward trend here, from 0.588 runs when 0-7 pitches are thrown up to 0.633 runs when 40-47 pitches are thrown.
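For anyone curious how bucketing like 0-7 and 40-47 works, here is a rough Python sketch of the idea (the real analysis was written in Rust, and this is not that code). The input format, function name, and parameter names are assumptions. The default bucket width of 8 and the cutoff of 1000 games match what this post describes.

```python
from collections import defaultdict


def runs_by_pitch_bucket(games, bucket_size=8, min_games=1000):
    """games: iterable of (pitches_in_top_first, runs_in_bottom_first).
    Returns {(bucket_start, bucket_end): average_runs}, skipping buckets
    with fewer than min_games games.
    """
    total_runs = defaultdict(int)
    game_count = defaultdict(int)
    for pitches, runs in games:
        start = (pitches // bucket_size) * bucket_size
        total_runs[start] += runs
        game_count[start] += 1
    return {
        (start, start + bucket_size - 1): total_runs[start] / game_count[start]
        for start in sorted(game_count)
        if game_count[start] >= min_games
    }


# Toy data, invented for this example only; note the tiny min_games.
toy = [(3, 0), (7, 1), (8, 2), (15, 0), (40, 1)]
result = runs_by_pitch_bucket(toy, min_games=2)
print(result)
```

Changing `bucket_size` is an easy way to see how sensitive a trend is to the choice of buckets.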

So, I’m pretty convinced that the length of the top of the first inning does have an effect on the runs scored in the bottom of the first. Pretty neat! But this clearly isn’t the whole explanation – even in the shortest case the runs in the bottom of the first are significantly higher than I would expect.

One possibility is that there is some nonlinearity here and maybe the home field advantage and having the top of the lineup up combine in a different way than my rough calculation above. Another possibility is that even a very short inning is still long enough to affect the visiting pitcher.

Odds and ends:

  • To avoid small sample size issues I didn’t include batters/pitches buckets where fewer than 1000 games were played.
  • Because of the way the Retrosheet games are stored, the number of batters can be overcounted if there is a wild pitch, etc.
  • Full disclosure: the way you bucket the number of pitches together does seem to make a difference – bucketing by 8 works pretty well, but bucketing by 6 or by 10 doesn’t work quite as well.
  • When I was running the script I noticed a pretty big outlier where the top of the 1st had 18 batters and 88 pitches thrown! Behold, the June 21, 1994 game between the Red Sox and the Blue Jays. Let’s check in on what the Red Sox did in the first inning:
    • walk
    • walk
    • double
    • walk
    • double
    • strikeout
    • double
    • (pitching change)
    • (passed ball)
    • (wild pitch)
    • walk
    • single
    • walk
    • double
    • double
    • groundout
    • intentional walk
    • walk (scoring a run)
    • (Red Sox decide to give Mo Vaughn the rest of the day off)
    • groundout
  • Here’s the source code for the new StatsRunExpectancyForBottomFirstInningByNumberBattersReport, and here are the output analysis files.


I’m Waiting for You and Other Stories review

I’m Waiting for You and Other Stories by Kim Bo-young

My rating: 4 of 5 stars


I really really liked the first and last stories. I just could not get into the second and third ones, although even so I was impressed by the world-building. Part of the problem is that I was expecting shorter stories, but I wouldn’t call four stories in 300+ pages “short”. (and it’s maybe not a coincidence that the first and last stories are shorter than average!)

(and if you read this, definitely read the reader’s notes for the first and last stories to see how they came into being – very sweet!)

It was neat to read “foreign” sci-fi, especially after I couldn’t bring myself to read “The Three Body Problem” for ethical reasons (since the author supports the Xinjiang re-education camps). Will definitely seek out more stories by Kim Bo-young in the future, but Ted Chiang is still my favorite sci-fi short story author 🙂


