Airport Guides 2.0 is now available on Android!

Airport Guides 2.0 is now available on Android – it’s free, check it out!

When I stop and think about it, I’m not sure why I keep this project up. I published the first Xamarin.Forms Android version 5 years ago, and since then the Windows development experience for “modern” apps has gone through a lot of changes. Instead of trying to patch things forward I decided to make a fresh start with the UI code. (although things are still similar enough that it wasn’t nearly as hard as rewriting it)

And I think that was the right decision, but it’s irritating that the Portable Class Library I used isn’t really a thing anymore, and I already know what we’re using now is going to change when MAUI gets released. The Android emulator I’m using is slow (although not nearly as bad as it used to be, seems around 3x slower than a phone) and some things on it just don’t work. XAML Hot Reload is very nice, but sometimes the XAML changes don’t get recompiled or something, so every time I try something and it doesn’t work I have to guess whether I think I wrote the wrong code, or whether I wrote the right code and I should just rebuild the app. And I didn’t remember this from before, but you can get some pretty cryptic errors for common things:

  • If you use the wrong x:DataType on an element you get an InvalidCastException at runtime (fine) with no information about what you were trying to cast from and to! (not fine!)
  • Similarly a NullReferenceException I caused got converted to a TargetInvocationException with no information (because it was during navigation, I guess?)
  • I wasted a solid half hour on a cryptic XAML error “Content is set more than once” before desperately Binging and realizing that’s what you get when you have an extra “>” at the end of a XAML tag. *angry face*

It also has the irritating property that I need to periodically go through and update the maps for all 90 airports, which also has the “every day this app gets a little more out of date” feeling. And it’s not a big moneymaker, not that any of my apps are; we’re talking less than $50 a year.


And yet I poured a bunch of time over the last few weeks into this partial rewrite. I think there are a few reasons I’m drawn to the app:

  • Despite my complaints above, being able to write C# code (that I could conceivably port to iOS one day) for an Android app is pretty neat!
  • I really do like traveling, and airports, and maps, and looking at airport maps makes me happy.
  • I feel a sense of pride having apps that I wrote (I use my Baseball Odds app all the time!), and I’m looking forward to using this again when we travel.
  • I have a vague sense that having apps that are publicly available is good for my…brand? image? I dunno.

Anyway, it’s done now so I guess I should stop dithering. Give it a shot if you have an Android phone!

parsing baseball files in Rust instead of Python for an 8x speedup!

linked to by This Week in Rust! also see Hacker News discussion here.

Since I’ve been quickly becoming a Rustacean (Rust enthusiast) – see previous posts here – I decided to take a crack at the parser for my baseball win expectancy finder. It’s written in Python, and when I made it parse files in parallel (previous writeup) it sped up processing time by ~4x. I thought doing it in Rust would run even faster, although a lot of the work is in regular expressions which I didn’t think would show a big difference between the two languages.

Here’s the PR for the change, and I guess less time is spent in regular expression than I thought, because here’s a table of the time it takes to parse ~130,000 baseball games (all MLB games from 1957-2019):

Time (in seconds)
Python single core implementation165
Python multicore implementation35
Rust single core implementation (commit e76de817)20
Rust multicore implementation4 (!)
Time to parse 130,000 baseball games on my desktop i5-8600K CPU (which has 6 cores)

So that’s about an 8x speedup in both single core and multicore code! And the final product runs in between 4-5 seconds, which is just unbelievably fast.

As always, I learned a lot about Rust along the way:

  • Rust makes me want to write more performant code. This shouldn’t be a surprise, because performance is one of Rust’s raison d’รชtre’s (it’s first on the “Why Rust?” list at rust-lang.org!), but every time I have to call collect() or clone(), I think about if I really need to do it. It’s much different than when I’m writing code in Python!
  • That being said, I didn’t do anything major algorithmically to speed things up – the biggest thing was probably keeping the mapping of which runners end up at which base (RunnerDests in the Rust code) in an array instead of a HashMap. But I’m sure that I avoided a ton of copies/allocations along the way.
  • One of the nicest little ways to speed things up is the entry() method on HashMap. If the passed-in key is present in the map it will return a reference to the value, otherwise you can call something like or_insert() to insert a value. This is nice because you only have to do the lookup one time!
  • Still a huge fan of the ? operator which makes it very simple to propagate errors up. This time I used the anyhow crate to make it easy to return String errors, although I didn’t really take advantage of its ability to attach context to errors as I got a bit lazy. Maybe next time!
  • The Rust single core implementation in debug mode took 673 seconds, which is 33x slower than in release mode. This is my usual reminder to never benchmark anything in debug mode!
  • The nice thing about this project is that, other than writing tests for parsing tricky plays, you can just run the parser on all the games and see if the resulting stats files match. After implementing the report for win expectancy including the balls/strikes count, I was dismayed to see the stats files was wrong – there was one game somewhere (out of 130,000!) that wasn’t using the balls/strikes count correctly. Of course, I could have narrowed it down by only running the parser on certain years and going from there, but luckily after looking at the Python code more closely I realized that it handled cases where the pitches were lowercase, and the Rust code did not, which was easy to fix. I guess there’s one plate appearance somewhere that has lowercase pitches!
  • I did try some optimizations by using SmallVec (see commit dd6ed7a) to store small vectors on the stack instead of using heap allocations. It did seem to help a little bit – the single core runtime went from 20 seconds to 19 seconds, although I’m not 100% sure that’s significant. I also used smol_str (see commit 48b47dfe) to do the same thing for strings after verifying that most of the strings in files were 22 characters or less, although again it didn’t show much/any improvement.
  • I also went ahead and rewrote the script that the web app calls to look up the data in Rust. I’m still clearly slower at writing Rust code than Python code – it took me a little over an hour when I already had a working Python script to look at. I assume it also runs faster than the Python one but they’re both fast enough so I didn’t bother to benchmark it.
  • Like with the clue solver and population centers projects, I used Rayon for the multicore implementation, which worked pretty well. One complaint I have is that I had to create a new copy of each report for each file we process and then merge them all together. Ideally I would just create one copy per thread since each thread can safely update its own copy, and that would reduce the overhead of merging so many reports together. But I couldn’t find a way of doing this with Rayon, and I guess I can’t complain since it ended up so fast anyway!

One area of problems I ran into this time was with traits. For some background, the Python code has a Report class which acts as an interface – a list of Reports is passed into the code that parses a file, and after each game a method is called so the report can accumulate whatever statistics it wants. And there’s a subclass of that called StatsReport which assumes that you’re writing the data out in a certain format to a file, so it’s even easier to write new reports.

Rust doesn’t have inheritance, but it does have traits which are kinda similar, so I optimistically made a Report trait and a StatsReport trait, and made StatsReport have a supertrait of Report, so anything that implements StatsReport also has to implement Report. It’s kinda the same thing! But unlike with real inheritance, StatsReport can’t provide implementations for methods on Report, which is kind of annoying. Not hard to work around, since you can just make the methods on the concrete struct call helper methods on StatsReport, but it does mean there’s more boilerplate needed for concrete structs.

Another problem I ran into is that writing the types for the merge_into() method on Report is hard, since ideally it would take a parameter of the same type as the concrete type. To be fair, this is tricky in a lot of languages. (although Python types are optional, so it’s easy there!) What I ended up doing was having the method take something of type Any, adding a method to every concrete implementation that did

    fn as_any_mut(&mut self) -> &mut dyn Any { self }

to convert a Report to something of type Any (??), then adding a line to the top of merge_into() like

        let other = other.downcast_mut::<Self>().unwrap();

which seems like more than should be necessary, but obviously I don’t fully understand what’s going on. (thanks Stack Overflow as usual!) I had some other problems with making Report require the Clone trait, so I gave up and added a constructor method to the Report trait.

I’m thinking about trying out Rust and WebAssembly next when I have more spare time!

clue solver now in Rust with more accurate simulations!

linked to by This Week in Rust!

Here’s a new version of my clue solver with the engine rewritten in Rust and more accurate simulations!

I wrote the original Python engine in 2007 and haven’t really touched it since then (other than to upgrade it to Python 3), so rewriting it in Rust was an interesting experience. Here are some categorized thoughts about Rust, software development, and simulations/math:

Rust

Rust is my new favorite language. (you can read my initial thoughts about it here, and all my Rust posts here) Rewriting the engine in Rust wasn’t particularly tricky, with a few exceptions I’ll talk about below, and it sped up simulations by over 4x without any real effort! (on my desktop machine a simulation went from taking ~2.2 seconds to ~0.5 seconds, although I made some changes after that I’ll talk about in the Simulations section)

  • I’m really digging the feel of Rust; it’s kind of a low-level language (you have very fine control of where copies happen) with the convenience of lots of helpful libraries and nice language features.
  • Having to declare which variables/references are mutable versus not is not only helpful for performance/safety, it also makes for nice documentation on methods. I noticed this because more than one place in the Python code I had written comments saying I had to make a copy to avoid modifying the original. This is similar to const-correctness in C++ which is also nice but less enforced by the language.
  • I declared an enum for all the possible cards. (which is safer than the strings I used in Python; I’m eternally afraid of typos!) But in Rust, an enum is more powerful than a C/C#/Java enum; it can have arbitrary data associated with it. This is a handy feature, but it means that the language is careful about the difference between an enum value and a reference to an enum value when in my case it doesn’t matter, since they’re the same size. I was constantly using card when I really wanted &card or *card. Luckily the compiler error messages are incredibly helpful so it was always clear what I needed to use, but it was still annoying.
  • The ? operator makes handling errors so easy! Part of this project was also rewriting the CGI script in Rust, which meant handling all sorts of errors if the right query parameters aren’t specified and whatnot. But ? makes easy to early return if a suboperation fails and made the CGI code much nicer!
  • Using Visual Studio Code for Rust development using the rust-analyzer extension was pretty nice – it annotates variables with types and method parameters with their names! The only downside was that sometimes when I pasted some code in it would freeze for a few seconds, although this didn’t seem to happen all the time.
  • One note – while porting the simulation algorithm to Rust increased its speed by over 4x, this is for the release version. The debug version in Rust was actually 3x slower than the Python version! So just a reminder to always benchmark in release mode ๐Ÿ™‚
  • Rust also does backwards type inference, which is pretty cool. So in a method you can declare a variable with let mut some_variable; then do some stuff with it, then return it, and Rust will figure out it must be the same type as the return type of the method. (and with rust-analyzer it will show the type in the IDE!)
  • I did run into some Rust-specific hurdles when I was declaring my data structures. In Python the ClueEngine class has a list of CluePlayer‘s, and each player has a reference back to the engine. In Rust this is tricky to do because of lifetime management and…stuff. I read some articles about how graphs are usually declared in Rust (a similar problem) and tried to copy that but it never quite worked. At one point my Rust ClueEngine had a RefCell<Vec<Rc<RefCell<PlayerData>>>> which I’m guessing is wrong ๐Ÿ™‚ Anyway, even if it had worked using Rc and RefCell means that you lose the safety of compile-time checking and get run-time checking instead, which is scary. It turned out I just had to restructure things a little bit to avoid needing a reference from CluePlayer to ClueEngine and it was well worth it.
  • After I fixed the simulation problem I was able to parallelize it using Rayon (like I did for the population centers project) pretty easily! It resulted in about a 5x speedup from the non-parallel version.

General

One big difference I noticed from the Python code I wrote in 2007 was that I was much more eager to add unit-type tests. I had an OK set of integration tests covering some bugs I had had to fix in Python, but I expanded that a bunch. Test counts aren’t everything, but I had 20 tests in Python and I have 65 engine tests in Rust, plus 40 tests for the CGI server in Rust. (I didn’t have any CGI tests before) These tests gave me a lot of confidence to make performance improvements in Rust, as well as tweak the simulation algorithm.

Simulations

I had suspicions that simulations were wrong before but wasn’t sure what exactly was wrong. Let me walk through a simple example of what the old algorithm said and how I fixed it.

Let’s say you’re at the very beginning of a 6-player game; you’re Player 2. You don’t have any suspect cards, so as far as you’re concerned each suspect has a 16% chance of being the solution. Player 1 makes a suggestion (ProfessorPlum, Knife, Hall) and it goes by all the way to Player 6, who shows a card, but you don’t see what it is.

The key question here is: according to you, has the chance that ProfessorPlum is the solution gone up or stayed the same?

This sounds a lot like the Monty Hall Problem, but it’s not the same because there’s no host manipulating things. The answer that after talking to David, we’re pretty convinced about is that the chance that Player 6 has ProfessorPlum goes up significantly, and the chance that ProfessorPlum is the solution goes up but not as much.

The old algorithm did not show the solution chance going up at all. Here’s roughly how the old algorithm worked:

  • Look at what cards might still be the solution (so omit cards that we know a player has)
  • For each solution combination (Suspect, Weapon, Room):
    • Update the engine’s state to say this combination is the solution.
    • Run N simulations. For each simulation:
      • For each player, randomly assign her cards from the cards we don’t know who has, but skip ones we know she doesn’t have.
      • After this is all done, see if the engine is consistent. If so, increment the count for the current configuration.

The bold part was wrong. I added that to try to reduce the number of inconsistent simulations, but it biases the results. In the situation we constructed above, it means that every simulation will be consistent, and since we weight all solutions equally, the solutions will all have equal probability. Perhaps there’s some way to do a Markov Chain Monte Carlo-like thing where we could ensure the solution is valid but weight it appropriately, but that would take more research.

The fix was pretty simple – just take the bold part out and assign the cards blindly. And that’s pretty much it! Because we’re throwing out more inconsistent simulations now, I increased the number we run from 2,000 to 20,000, but then got most of that performance back by parallelizing the algorithm.

There was another small fix when I realized that we have to extend the “don’t try to be too clever and avoid inconsistent states” even back to when we assign the solution – otherwise if we know either Player 6 or the Solution has ProfessorPlum, once we pick a non-ProfessorPlum solution we will know Player 6 has ProfessorPlum, and we shouldn’t start with that knowledge when assigning cards.

I am a little worried that even 20,000 simulations won’t be enough for more complicated game states, so I might play around with that some more.

Anyway, probability is hard!

Name my color – find friendly names for a given color!

Check it out! You can enter a color by RGB value or use a color picker, and it will show you a list of names of similar colors.

This was inspired on a whim by a work thing. There’s honestly not a lot interesting about the project, other than it kinda drove home what a magical time we live in (programmingwise, anyway…) that I could do a quick search and npm install libraries to convert from RGB to CIELab, compute the distance between two CIELab colors, and even a nice color picker. And using them was very straightforward!

I even wrote a small TypeScript declaration file for one of the libraries that didn’t have types. All in all, a good quick project!