parsing baseball files in Rust instead of Python for an 8x speedup!

Linked to by This Week in Rust! Also see the Hacker News discussion here.

Since I’ve been quickly becoming a Rustacean (Rust enthusiast) – see previous posts here – I decided to take a crack at rewriting the parser for my baseball win expectancy finder. It’s written in Python, and when I made it parse files in parallel (previous writeup) that sped up processing time by ~4x. I thought a Rust version would run even faster, although a lot of the work is in regular expressions, which I didn’t think would show a big difference between the two languages.

Here’s the PR for the change, and I guess less time is spent in regular expressions than I thought, because here’s a table of the time it takes to parse ~130,000 baseball games (all MLB games from 1957-2019):

| Implementation | Time (seconds) |
| --- | --- |
| Python, single core | 165 |
| Python, multicore | 35 |
| Rust, single core (commit e76de817) | 20 |
| Rust, multicore | 4 (!) |

Time to parse 130,000 baseball games on my desktop i5-8600K CPU (which has 6 cores)

So that’s about an 8x speedup in both the single core and multicore code! And the final product runs in 4-5 seconds, which is just unbelievably fast.

As always, I learned a lot about Rust along the way:

  • Rust makes me want to write more performant code. This shouldn’t be a surprise, because performance is one of Rust’s raisons d’être (it’s first on the “Why Rust?” list at rust-lang.org!), but every time I have to call collect() or clone(), I think about whether I really need to do it. It’s much different than when I’m writing code in Python!
  • That being said, I didn’t do anything major algorithmically to speed things up – the biggest thing was probably keeping the mapping of which runners end up at which base (RunnerDests in the Rust code) in an array instead of a HashMap. But I’m sure that I avoided a ton of copies/allocations along the way.
  • One of the nicest little ways to speed things up is the entry() method on HashMap. It gives you an Entry for the key whether or not it’s already in the map, and then you can call something like or_insert() to fill in a default value and get a mutable reference to it. This is nice because you only have to do the hash lookup once! (There’s a small sketch of this after the list.)
  • Still a huge fan of the ? operator, which makes it very simple to propagate errors up. This time I used the anyhow crate to make it easy to return string-based errors, although I didn’t really take advantage of its ability to attach context to errors as I got a bit lazy. Maybe next time! (There’s a quick example after the list, too.)
  • The Rust single core implementation in debug mode took 673 seconds, which is 33x slower than in release mode. This is my usual reminder to never benchmark anything in debug mode!
  • The nice thing about this project is that, other than writing tests for parsing tricky plays, you can just run the parser on all the games and see if the resulting stats files match. After implementing the report for win expectancy including the balls/strikes count, I was dismayed to see that one of the stats files was wrong – there was one game somewhere (out of 130,000!) that wasn’t using the balls/strikes count correctly. Of course, I could have narrowed it down by only running the parser on certain years and going from there, but luckily, after looking at the Python code more closely, I realized that it handled cases where the pitches were lowercase and the Rust code did not, which was easy to fix. I guess there’s one plate appearance somewhere that has lowercase pitches!
  • I did try some optimizations by using SmallVec (see commit dd6ed7a) to store small vectors on the stack instead of using heap allocations. It did seem to help a little bit – the single core runtime went from 20 seconds to 19 seconds, although I’m not 100% sure that’s significant. I also used smol_str (see commit 48b47dfe) to do the same thing for strings after verifying that most of the strings in the files were 22 characters or less, although again it didn’t show much of an improvement, if any. (There’s a snippet showing both after the list.)
  • I also went ahead and rewrote the script that the web app calls to look up the data in Rust. I’m still clearly slower at writing Rust code than Python code – it took me a little over an hour when I already had a working Python script to look at. I assume it also runs faster than the Python one but they’re both fast enough so I didn’t bother to benchmark it.
  • Like with the clue solver and population centers projects, I used Rayon for the multicore implementation, which worked pretty well. One complaint I have is that I had to create a new copy of each report for each file we process and then merge them all together at the end. Ideally I would create just one copy per thread, since each thread can safely update its own copy, and that would reduce the overhead of merging so many reports together. But I couldn’t find a way of doing this with Rayon, and I guess I can’t complain since it ended up so fast anyway! (There’s a sketch of the map-and-merge shape after this list.)
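
Here’s a tiny sketch of the entry() pattern (the play-counting map here is made up for illustration, not code from the actual reports):

    use std::collections::HashMap;

    // Count how many times each play type shows up, doing only one hash
    // lookup per play.
    fn count_plays(plays: &[&str]) -> HashMap<String, u32> {
        let mut counts: HashMap<String, u32> = HashMap::new();
        for play in plays {
            // entry() returns an Entry for the key; or_insert(0) fills in 0 if
            // it's missing and returns a mutable reference either way.
            *counts.entry(play.to_string()).or_insert(0) += 1;
        }
        counts
    }

    fn main() {
        let counts = count_plays(&["single", "strikeout", "single"]);
        assert_eq!(counts["single"], 2);
    }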
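
And a quick example of ? with anyhow (parse_inning is a made-up helper, not one from the parser):

    use anyhow::{anyhow, Result};

    // Hypothetical helper showing how ? and anyhow make it easy to bubble up
    // string-y errors.
    fn parse_inning(field: &str) -> Result<u8> {
        let inning: u8 = field
            .parse()
            .map_err(|_| anyhow!("couldn't parse inning from '{}'", field))?;
        if inning == 0 {
            return Err(anyhow!("inning must be at least 1"));
        }
        Ok(inning)
    }

    fn main() -> Result<()> {
        let inning = parse_inning("7")?;
        println!("inning {}", inning);
        Ok(())
    }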
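
The SmallVec and smol_str usage looks roughly like this (the types and the play string are just examples, not the actual fields from the parser):

    use smallvec::SmallVec;
    use smol_str::SmolStr;

    fn main() {
        // A SmallVec that keeps up to 4 elements inline on the stack and only
        // heap-allocates if it grows past that.
        let mut runners: SmallVec<[u8; 4]> = SmallVec::new();
        runners.push(1);
        runners.push(3);
        assert!(!runners.spilled()); // still on the stack

        // SmolStr stores strings of 22 bytes or less inline and is cheap to clone.
        let token = SmolStr::new("S8/G.1-2");
        assert_eq!(token.len(), 8);
    }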
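
Finally, the multicore shape is basically map-and-merge. Here’s a rough sketch with a made-up Stats struct and placeholder file names standing in for the real reports and game files:

    use rayon::prelude::*;

    // Stand-in for the real report structs.
    #[derive(Default)]
    struct Stats {
        games: u64,
    }

    impl Stats {
        fn merge(mut self, other: Stats) -> Stats {
            self.games += other.games;
            self
        }
    }

    fn parse_file(_path: &str) -> Stats {
        // The real code parses every game in the file here.
        Stats { games: 1 }
    }

    fn main() {
        let files = vec!["file1", "file2", "file3"]; // placeholder names
        // One Stats per file, merged together at the end.
        let totals = files
            .par_iter()
            .map(|&f| parse_file(f))
            .reduce(Stats::default, Stats::merge);
        println!("parsed {} games", totals.games);
    }

Rayon’s fold()/reduce() pair can also cut the number of intermediate accumulators down to roughly one per work chunk, which gets closer to (though not exactly) one per thread.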

One area where I ran into problems this time was traits. For some background, the Python code has a Report class which acts as an interface – a list of Reports is passed into the code that parses a file, and after each game a method is called so each report can accumulate whatever statistics it wants. There’s also a subclass of that called StatsReport which assumes that you’re writing the data out in a certain format to a file, so it’s even easier to write new reports.

Rust doesn’t have inheritance, but it does have traits which are kinda similar, so I optimistically made a Report trait and a StatsReport trait, and made StatsReport have a supertrait of Report, so anything that implements StatsReport also has to implement Report. It’s kinda the same thing! But unlike with real inheritance, StatsReport can’t provide implementations for methods on Report, which is kind of annoying. Not hard to work around, since you can just make the methods on the concrete struct call helper methods on StatsReport, but it does mean there’s more boilerplate needed for concrete structs.
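
To make the boilerplate concrete, here’s a stripped-down sketch of the shape; the method names and the WinExpectancyReport struct are invented for illustration, not the ones in the repo:

    // Sketch only: method names are made up.
    trait Report {
        fn process_game(&mut self, game_id: &str);
        fn done(&self);
    }

    // StatsReport can add its own required and default methods, but it can't
    // supply bodies for Report's methods.
    trait StatsReport: Report {
        fn output_file_name(&self) -> String;
        fn write_stats(&self) {
            println!("writing {}", self.output_file_name());
        }
    }

    struct WinExpectancyReport;

    impl StatsReport for WinExpectancyReport {
        fn output_file_name(&self) -> String {
            "winexpectancy.txt".to_string() // made-up file name
        }
    }

    // The boilerplate: every concrete struct still has to implement Report
    // itself, typically by forwarding to helpers on StatsReport.
    impl Report for WinExpectancyReport {
        fn process_game(&mut self, _game_id: &str) { /* accumulate stats */ }
        fn done(&self) {
            self.write_stats();
        }
    }

    fn main() {
        let mut report = WinExpectancyReport;
        report.process_game("hypothetical-game-id");
        report.done();
    }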

Another problem I ran into is that writing the types for the merge_into() method on Report is hard, since ideally it would take a parameter of the same type as the concrete type. To be fair, this is tricky in a lot of languages (although Python’s type hints are optional, so it’s easy there!). What I ended up doing was having the method take something of type Any, adding a method to every concrete implementation that did

    fn as_any_mut(&mut self) -> &mut dyn Any { self }

to convert a Report to something of type Any (??), then adding a line to the top of merge_into() like

        let other = other.downcast_mut::<Self>().unwrap();

which seems like more than should be necessary, but obviously I don’t fully understand what’s going on. (thanks Stack Overflow as usual!) I had some other problems with making Report require the Clone trait, so I gave up and added a constructor method to the Report trait.
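
Putting those pieces together, a self-contained version of the pattern looks roughly like this; CountReport and its games counter are invented for the example, since the real reports track a lot more state:

    use std::any::Any;

    trait Report {
        // Lets callers turn a &mut dyn Report into a &mut dyn Any.
        fn as_any_mut(&mut self) -> &mut dyn Any;
        // Takes dyn Any because a plain `other: &mut Self` parameter would make
        // the trait unusable as a trait object.
        fn merge_into(&self, other: &mut dyn Any);
    }

    struct CountReport {
        games: u64,
    }

    impl Report for CountReport {
        fn as_any_mut(&mut self) -> &mut dyn Any {
            self
        }

        fn merge_into(&self, other: &mut dyn Any) {
            // Downcast back to the concrete type; this panics if someone passes
            // a different kind of report.
            let other = other.downcast_mut::<Self>().unwrap();
            other.games += self.games;
        }
    }

    fn main() {
        let a = CountReport { games: 3 };
        let mut b = CountReport { games: 4 };
        a.merge_into(b.as_any_mut());
        assert_eq!(b.games, 7);
    }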

I’m thinking about trying out Rust and WebAssembly next when I have more spare time!

13 thoughts on “parsing baseball files in Rust instead of Python for an 8x speedup!”

  1. One way to solve the `Report` and `StatReport` problem is to reverse it: instead of having `StatReport` demand `Report`, it implements it. So then you have:

    ```rust
    trait Report {
        fn different_for_each_type(..);
        fn same_for_many_types(..);
    }

    trait StatReport {
        fn different_for_each_type_impl(..);
        fn another_method(..);
    }

    impl<T: StatReport> Report for T {
        fn different_for_each_type(..) {
            self.different_for_each_type_impl(..)
        }

        fn same_for_many_types(..) { .. }
    }
    ```

    1. Oh, that’s clever! I guess the downside is that the StatsReport trait then needs another version of every method that’s different for each type, but maybe that’s just because I’m used to thinking of things from an object-oriented language perspective. I’m going to try this when I have a chance, thanks!

      (and if this is the Amos who is @fasterthanlime, I love your writing; it’s one of the things that got me excited about Rust!)

      1. About the repeated methods: you can also have a third trait containing those methods, which is a supertrait of the other two. Then it would look like this:


        trait ReportCore {
            fn different_for_each_type(..);
        }

        trait Report: ReportCore {
            fn same_for_many_types(..);
        }

        trait StatReport: ReportCore {
            fn another_method(..);
        }

        impl<T: StatReport> Report for T {
            fn same_for_many_types(..) { .. }
        }

        I’m not sure if it’s worth the extra complexity, but it does have less repetition.

      2. Hah, I guess the Rust community is big enough for two Amos’s now 🙂

        This is interesting, I’ll have to try it. Thanks – I’m learning a lot!

  2. Apparently WP doesn’t support markdown 😔 (hopefully it supports emojis), and swallows > and < (“html tags”).
    The impl line should be
    impl<T: StatReport> Report for T

  3. Regarding the `merge_into` and `Clone` issues, they both stem from the fact that you’re doing dynamic dispatch. This limits your methods to object-safe ones, which means that `Self` can’t appear except as the receiver – which isn’t true for `clone` or for your “ideal” `merge_into` (which would look something like `fn merge_into(&self, other: &mut Self)` ). The question is, do you really need dynamic dispatch? Looking at your main, it seems you’re choosing a single report for most options, and two for one of them. If you only choose single options, then you can extract most of your main to a function which is generic over the specific report type, avoiding both dynamic dispatch problems. If you really need the double report, you can do something like


    struct DoubleReport<R1, R2>(R1, R2);

    impl<R1: Report, R2: Report> Report for DoubleReport<R1, R2> {
        fn report_method(&self, ..) {
            self.0.report_method(..);
            self.1.report_method(..);
        }
    }

    This isn’t very scalable to multiple reports, but might be the right solution here.

  4. Just a small heads up: in the Cargo.toml file in the GitHub repository, optimizations aren’t explicitly set. You might be able to speed up the code by adding the following:

    [profile.release]
    opt-level = 3
