Why are so many runs scored in the bottom of the first inning?

After starting to look at some inning-by-inning data from my baseball win expectancy finder for another project, I stumbled across something weird that I can’t explain. Here’s a graph of expected runs scored per inning:

Check out how high the bottom of the first inning is – on average 0.6 runs are scored compared with 0.5 runs in the top of the first. That’s a huge difference! Here’s a graph of the difference:

Holy outlier, Batman! So what’s going on? Here are some ideas:

  • Teams score more in the first inning because the top of the lineup is at bat – this is true! You can see in the top graph that the expected runs scored in the first inning is the highest for both the home and visiting teams. (see this Beyond the Box Score article that discusses this) But that doesn’t come close to explaining why the home team does so much better than the visiting team!
  • Starting pitchers are more likely to have a terrible first inning – This might be true, but I can’t think of any reason why this would affect visiting starting pitchers more than home starting pitchers. I also made a graph of the home advantage for each number of runs scored for the first and third inning (I picked the third inning because that’s the second-greatest difference between home and visitor):

To me, these look almost exactly the same shape, so it’s not like the first inning has way more 6-run innings or anything.

  • This is just random chance – I guess that’s possible, but the effect seems large given that the data covers more than 130,000 games.
  • There’s a bug in my code – I’ve been writing code for 20 years, and let me tell you: this is certainly possible! In fact, I found a bug in handling walkoff innings in the existing runs per inning code after seeing some weird results in this investigation. But it would be weird to have a bug that just affects the bottom of the 1st inning, since it isn’t at the start or end of the game. I also implemented it in both Rust and Python, and the results match. But feel free to check – the Rust version is StatsRunExpectancyPerInningByInningReport in reports.rs, and the Python version is StatsRunExpectancyPerInningByInningReport in parseretrosheet.py.
  • This is different between baseball eras – I don’t know why this would be true, but it was easy enough to test out, and the difference is pretty consistent. (see the raw data)
  • The fact that home teams are usually better in the playoffs biases this – I think this is a tiny bit true, but I reran the numbers with only regular season games (where the better team has no correlation with whether it’s the home or visiting team) and the difference looks almost exactly the same.

So, in conclusion, I don’t know! If anyone has any ideas, I’d love to hear them on this post or on Twitter.

Edit: Ryan Pai suggested on Facebook that the visiting pitcher has to wait a while between warming up and pitching in the bottom of the 1st, which is an intriguing theory!

Odds and ends:

  • That top “expected runs per inning” graph has some other neat properties – for example, you can see that the 2nd inning is the lowest scoring inning, presumably because someone near the bottom of the lineup is usually up.
  • Another thing you can see is how robust the home field advantage is. In every inning the home team scores, on average, a little more than the visiting team!
  • The graph only shows 8 innings because in the 9th inning things get complicated. For one thing, the bottom of the 9th inning only happens if the home team is behind or tied, which biases the sample somewhat. Also, if the game is tied and the home team hits a leadoff home run, they win the game but lose the opportunity to score any more runs.
  • You can also notice the strangeness of the bottom of the 1st inning another way. If you look at the chance that the home team will win when the game is tied, their chances are better at the beginning of the bottom of the 9th than the bottom of the 8th, because they have an extra chance to bat. That advantage gets lower the earlier in the game you go, with one exception: in the bottom of the 1st, the home team has a ~59% chance to win, but in the bottom of the 2nd that goes down to ~58%! Apparently, if the home team misses their chance to score runs in the bottom of the 1st, they’ve missed a big opportunity!
  • The raw report data is here in the GitHub repo.

State Election Map – fixing the “votes to change the election” calculation

As I mentioned at the end of my “Lucky” book review, the book said that Trump in 2020 was only 43K votes away from winning the election. In my 2020 updates to the State Election Map, I had also done this calculation but come up with a different number. (76K) I already knew there was one problem I hadn’t addressed – breaking the vote down by congressional district for Maine and Nebraska – so I decided to bite the bullet and fix it up.

Along the way I discovered there was another issue causing wrong results: the difference between an electoral vote tie and a win can be substantial in some cases, and I was only calculating for a win. (in the case of a tie the vote goes to the House of Representatives, where each state delegation gets one vote, so it’s complicated – but in 2020 Trump probably would have won)
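To make the tie-versus-win distinction concrete, here’s a toy sketch – my own simplified model, not the actual code in the repo. Out of 538 electoral votes, 270 wins outright while 269 only forces a tie, so the minimum number of flipped votes depends on which threshold you target. Flipping a state lost by `margin` votes takes `margin / 2 + 1` voters switching sides.

```rust
// Hypothetical sketch: cheapest set of states to flip so the trailing
// candidate reaches a target electoral-vote threshold. Each entry in
// `states` is (electoral_votes, margin_in_votes) for a state they lost.
fn votes_to_reach(states: &[(u32, u32)], ev_needed: u32) -> Option<u64> {
    // brute force over subsets – fine for illustration, not for 50 states
    let n = states.len();
    let mut best: Option<u64> = None;
    for mask in 0..(1u32 << n) {
        let (mut ev, mut cost) = (0u32, 0u64);
        for i in 0..n {
            if mask & (1 << i) != 0 {
                ev += states[i].0;
                // switching one voter changes the margin by two
                cost += (states[i].1 / 2 + 1) as u64;
            }
        }
        if ev >= ev_needed {
            best = Some(best.map_or(cost, |b| b.min(cost)));
        }
    }
    best
}

fn main() {
    // toy numbers, not real 2020 margins
    let lost = [(16, 20_000), (11, 40_000), (10, 12_000)];
    // reaching a higher threshold (win vs. tie) can require a
    // different, more expensive set of states
    println!("{:?}", votes_to_reach(&lost, 20));
    println!("{:?}", votes_to_reach(&lost, 27));
}
```

With real data you’d also need to treat the Maine and Nebraska congressional districts as separate “states,” which is exactly the complication mentioned above.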

Anyway, I used Dave Leip’s Atlas of U.S. Presidential Elections to gather the data by congressional district, improved the Rust code to do both of those calculations, and made the JavaScript code show both calculations when they differ. And lo and behold, Trump really was only 43K votes away from winning in 2020!

Here’s the commit for these changes. It was a lot more work than I expected, and I’m actually in the middle of another project that I put on pause because I was so irritated these numbers were wrong 🙂

As an aside, the Rust code to calculate this got a bit more complicated, and the more that happens, the more I sprinkle in .clone() calls just to make things work. That’s fine for correctness but not great for performance. In this case it doesn’t matter – the script still runs in ~1 second – and it’s nice to have markers for where I can try to optimize later if necessary!
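Here’s the kind of trade-off I mean, as a made-up toy function (not the actual election code): the clone() keeps the borrow checker happy at the cost of an allocation, and the call site is an obvious marker to revisit if performance ever matters.

```rust
// Toy example: clone() makes ownership easy at the cost of allocations.
fn longest_name(names: &[String]) -> String {
    let mut best = String::new();
    for name in names {
        if name.len() > best.len() {
            best = name.clone(); // easy marker to optimize later
        }
    }
    best
}

fn main() {
    let names = vec!["Aaron".to_string(), "Clemente".to_string()];
    println!("{}", longest_name(&names));
    // a clone-free version could return a &str borrowed from `names`,
    // at the cost of threading a lifetime through the signature
    println!("{}", names.len());
}
```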

Adding 2020 baseball games to the win expectancy finder

You can see the results on the win expectancy finder. (although adding one year of data doesn’t change much, it’s the principle of the thing!) Updated apps will be coming soon.

Usually adding a year’s worth of games is a pretty quick task: run the scripts, and update a few things on the web page. (thankfully I made a list of what to do a few years back) But this year was different because of the rule changes in 2020. Not only that, but now that I have two versions of the parsing script (the faster one in Rust and the original in Python), I wanted to keep both scripts up to date. It ended up being quite a journey!

The rule changes in 2020 were:

  • in extra inning games, a runner starts on second base
  • for doubleheaders, the game only went 7 innings instead of 9

This didn’t sound too hard, but it meant I had to add a set of rules by which to parse each game. The first one was pretty easy (since I just had to know whether the game was in 2020 or not), but I was worried about figuring out whether a game was part of a doubleheader. Luckily Retrosheet added the number of innings to their event file format, and in fact also added whether a runner starts on second base in extra innings, which I didn’t discover until later and should probably go back and use!
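As a sketch of what “a set of rules” could look like – the names and structure here are hypothetical, not the actual parser’s:

```rust
// Hypothetical per-game parsing rules (my names, not the parser's).
#[derive(Debug, Clone, Copy)]
struct GameRules {
    scheduled_innings: u8,            // 7 for 2020 doubleheaders, else 9
    runner_on_second_in_extras: bool, // 2020 regular season only
}

impl GameRules {
    fn for_game(year: u16, is_doubleheader: bool, is_postseason: bool) -> GameRules {
        GameRules {
            scheduled_innings: if year == 2020 && is_doubleheader { 7 } else { 9 },
            // the 2020 playoffs did not use the extra-innings runner
            runner_on_second_in_extras: year == 2020 && !is_postseason,
        }
    }
}

fn main() {
    println!("{:?}", GameRules::for_game(2020, true, false));
    println!("{:?}", GameRules::for_game(2019, false, false));
}
```

Deriving the rules from the fields Retrosheet now includes in the event files would be more robust than inferring them from the year, which is why I should go back and use them.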

Once I got all the games parsing in Rust, then the fun began:

  • I took a quick look at the resulting statistics and noticed that the situation at the start of a game (top of the 1st, bases empty, etc.) had around 700 new games, which sounded reasonable, and fewer than 100 of them had the visiting team winning, which did not! After some thought (and coming back to it the next day), I found and fixed the bug, which came down to confusing the final game situation with the last actual game situation; you can see the fix in this commit.
  • So then I made similar changes in Python, and after running the script the results were off by exactly one game. (just like last time; what are the odds?) Anyway, by looking at the differences in the stats file I could see what situations the mystery game went through, and added a special Report type to find the game that went through those situations. It turns out only one playoff game in 2020 went into extra innings, and the Python script was handling that wrong, and it was pretty easy to fix!
  • That led me to discover that the Python script wasn’t throwing an exception by default if it failed to parse a game, which is bad, so I fixed that in this commit. (notice my Rust style of not putting parentheses around if conditions is starting to slip into my Python style…)
  • Running the Python script showed a major difference in the runsperinningstats file – in fact, the Rust script had never been updating it! The fix was a simple copy/paste error, and I made a later change to use “Self” instead of explicit type names to avoid some of these problems in the future.
    • So how did I never notice this before? The way I validated my Rust script when I was developing it was to run it and see if the results differed from what was in git. This has the now-obvious consequence that if the script didn’t do anything, it would seem like it was working! I guess the lesson I took away from this is, don’t do that 🙂
  • Both scripts print out the number of games parsed at the end, and I noticed when I was debugging some of these problems that the numbers were slightly different between Python and Rust. There are 7 games that the scripts can’t parse correctly and I list them explicitly in both scripts (the event files seem wrong to me) so we can skip them, and the Rust script was correctly not counting them while the Python script was counting them.
  • The way to actually update everything is getting ridiculous – as I mentioned above I have it written down, but there are 16 steps to run! I really need to make this easier…

parsing baseball files in Rust instead of Python for an 8x speedup!

Linked to by This Week in Rust! Also see the Hacker News discussion here.

Since I’ve been quickly becoming a Rustacean (Rust enthusiast) – see previous posts here – I decided to take a crack at the parser for my baseball win expectancy finder. It’s written in Python, and when I made it parse files in parallel (previous writeup) it sped up processing time by ~4x. I thought doing it in Rust would run even faster, although a lot of the work is in regular expressions which I didn’t think would show a big difference between the two languages.

Here’s the PR for the change, and I guess less time is spent in regular expressions than I thought, because here’s a table of the time it takes to parse ~130,000 baseball games (all MLB games from 1957-2019):

Implementation                        Time (in seconds)
Python single core                    165
Python multicore                      35
Rust single core (commit e76de817)    20
Rust multicore                        4 (!)

Time to parse ~130,000 baseball games on my desktop i5-8600K CPU (which has 6 cores)

So that’s about an 8x speedup in both single core and multicore code! And the final product runs in 4-5 seconds, which is just unbelievably fast.

As always, I learned a lot about Rust along the way:

  • Rust makes me want to write more performant code. This shouldn’t be a surprise, because performance is one of Rust’s raisons d’être (it’s first on the “Why Rust?” list at rust-lang.org!), but every time I have to call collect() or clone(), I think about whether I really need to. It’s much different than when I’m writing code in Python!
  • That being said, I didn’t do anything major algorithmically to speed things up – the biggest thing was probably keeping the mapping of which runners end up at which base (RunnerDests in the Rust code) in an array instead of a HashMap. But I’m sure that I avoided a ton of copies/allocations along the way.
  • One of the nicest little ways to speed things up is the entry() method on HashMap. It does the lookup once and gives you a handle to the key’s slot: if the key is present you can get at the existing value, and otherwise you can call something like or_insert() to insert one. Either way, you only have to do the lookup one time!
  • Still a huge fan of the ? operator which makes it very simple to propagate errors up. This time I used the anyhow crate to make it easy to return String errors, although I didn’t really take advantage of its ability to attach context to errors as I got a bit lazy. Maybe next time!
  • The Rust single core implementation in debug mode took 673 seconds, which is 33x slower than in release mode. This is my usual reminder to never benchmark anything in debug mode!
  • The nice thing about this project is that, other than writing tests for parsing tricky plays, you can just run the parser on all the games and see if the resulting stats files match. After implementing the report for win expectancy including the balls/strikes count, I was dismayed to see the stats file was wrong – there was one game somewhere (out of 130,000!) that wasn’t using the balls/strikes count correctly. Of course, I could have narrowed it down by only running the parser on certain years and going from there, but luckily after looking at the Python code more closely I realized that it handled cases where the pitches were lowercase, and the Rust code did not, which was easy to fix. I guess there’s one plate appearance somewhere that has lowercase pitches!
  • I did try some optimizations by using SmallVec (see commit dd6ed7a) to store small vectors on the stack instead of using heap allocations. It did seem to help a little bit – the single core runtime went from 20 seconds to 19 seconds, although I’m not 100% sure that’s significant. I also used smol_str (see commit 48b47dfe) to do the same thing for strings after verifying that most of the strings in files were 22 characters or less, although again it didn’t show much/any improvement.
  • I also went ahead and rewrote the script that the web app calls to look up the data in Rust. I’m still clearly slower at writing Rust code than Python code – it took me a little over an hour when I already had a working Python script to look at. I assume it also runs faster than the Python one but they’re both fast enough so I didn’t bother to benchmark it.
  • Like with the clue solver and population centers projects, I used Rayon for the multicore implementation, which worked pretty well. One complaint I have is that I had to create a new copy of each report for each file we process and then merge them all together. Ideally I would just create one copy per thread since each thread can safely update its own copy, and that would reduce the overhead of merging so many reports together. But I couldn’t find a way of doing this with Rayon, and I guess I can’t complain since it ended up so fast anyway!
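The entry() point above can be sketched with a toy counter – my own example, not code from the parser:

```rust
use std::collections::HashMap;

// entry() does the lookup once: it returns a handle to the key's slot,
// and or_insert(0) fills a vacant slot, handing back &mut u32 either way.
fn count_plays<'a>(plays: &[&'a str]) -> HashMap<&'a str, u32> {
    let mut counts = HashMap::new();
    for &play in plays {
        *counts.entry(play).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_plays(&["single", "out", "single", "walk"]);
    println!("{:?}", counts.get("single")); // Some(2)
}
```

The naive alternative (a contains_key() check followed by a separate insert or get_mut) hashes the key twice, which adds up when you do it once per play across 130,000 games.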

One area of problems I ran into this time was with traits. For some background, the Python code has a Report class which acts as an interface – a list of Reports is passed into the code that parses a file, and after each game a method is called so the report can accumulate whatever statistics it wants. And there’s a subclass of that called StatsReport which assumes that you’re writing the data out in a certain format to a file, so it’s even easier to write new reports.

Rust doesn’t have inheritance, but it does have traits which are kinda similar, so I optimistically made a Report trait and a StatsReport trait, and made StatsReport have a supertrait of Report, so anything that implements StatsReport also has to implement Report. It’s kinda the same thing! But unlike with real inheritance, StatsReport can’t provide implementations for methods on Report, which is kind of annoying. Not hard to work around, since you can just make the methods on the concrete struct call helper methods on StatsReport, but it does mean there’s more boilerplate needed for concrete structs.
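Here’s a stripped-down sketch of that workaround (simplified names, not the real Report/StatsReport definitions): the concrete struct’s Report impl just forwards to a default method on StatsReport.

```rust
trait Report {
    fn process_game(&mut self, runs: u32);
    fn total(&self) -> u32;
}

// StatsReport can hold default method implementations, but it can't
// implement Report's methods on behalf of implementors.
trait StatsReport: Report {
    fn process_game_stats(&mut self, runs: u32) {
        self.add_runs(runs);
    }
    fn add_runs(&mut self, runs: u32);
}

struct RunsReport {
    total: u32,
}

impl Report for RunsReport {
    fn process_game(&mut self, runs: u32) {
        // boilerplate: forward to the StatsReport default method
        self.process_game_stats(runs);
    }
    fn total(&self) -> u32 {
        self.total
    }
}

impl StatsReport for RunsReport {
    fn add_runs(&mut self, runs: u32) {
        self.total += runs;
    }
}

fn main() {
    let mut report = RunsReport { total: 0 };
    report.process_game(3);
    report.process_game(2);
    println!("{}", report.total()); // 5
}
```

With real inheritance, the forwarding impl of process_game wouldn’t be needed at all; in Rust, each concrete struct has to repeat it.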

Another problem I ran into is that writing the types for the merge_into() method on Report is hard, since ideally it would take a parameter of the same type as the concrete type. To be fair, this is tricky in a lot of languages. (although Python types are optional, so it’s easy there!) What I ended up doing was having the method take something of type Any, adding a method to every concrete implementation that did

    fn as_any_mut(&mut self) -> &mut dyn Any { self }

to convert a Report to something of type Any (??), then adding a line to the top of merge_into() like

    let other = other.downcast_mut::<Self>().unwrap();

which seems like more than should be necessary, but obviously I don’t fully understand what’s going on. (thanks Stack Overflow as usual!) I had some other problems with making Report require the Clone trait, so I gave up and added a constructor method to the Report trait.
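Putting those two snippets together, here’s a self-contained toy version of the pattern (heavily simplified – the real reports accumulate much more than a run total):

```rust
use std::any::Any;

trait Report {
    // escape hatch: lets callers turn a &mut dyn Report into &mut dyn Any
    fn as_any_mut(&mut self) -> &mut dyn Any;
    fn merge_into(&self, other: &mut dyn Any);
}

struct RunsReport {
    runs: u64,
}

impl Report for RunsReport {
    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }
    fn merge_into(&self, other: &mut dyn Any) {
        // panics if `other` isn't actually a RunsReport
        let other = other.downcast_mut::<Self>().unwrap();
        other.runs += self.runs;
    }
}

fn main() {
    let worker = RunsReport { runs: 7 };
    let mut combined = RunsReport { runs: 3 };
    worker.merge_into(combined.as_any_mut());
    println!("{}", combined.runs); // 10
}
```

The downcast turns what would be a compile-time type guarantee in a language with generics-over-self into a runtime check, which is why it feels like more ceremony than should be necessary.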

I’m thinking about trying out Rust and WebAssembly next when I have more spare time!