# Calculating the probability a model is broken from one bad prediction

Let’s say you have a model that gives you the probability that events will happen. All you know about the model is that it says a certain event has a one in a million chance of happening, and then that event does happen. What are the chances that the model is broken?

I had a discussion with baseball stats guru Tom Tango about this on Twitter (see the ensuing thread).

Tom’s point, which I agree with, is that models are not going to be “right” all the time. There’s only a ~3% (1 in 36) chance of rolling double 6’s on a pair of dice, but if you pick up a pair of dice and roll double 6’s, you probably don’t think the dice are unfair! And just ask Nate Silver about his 2016 election model’s prediction that Trump had only a 30% chance of winning; just because that happened doesn’t mean the prediction was wrong.
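As a quick check on the dice figure, here's a tiny Python sketch (assuming two fair six-sided dice) that enumerates all 36 equally likely outcomes and counts the double sixes:

```python
from fractions import Fraction
from itertools import product

# Every (first die, second die) pair for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
double_sixes = sum(1 for a, b in outcomes if a == b == 6)

p_double_six = Fraction(double_sixes, len(outcomes))
print(p_double_six, f"{float(p_double_six):.1%}")  # 1/36 2.8%
```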

But if the only thing you know about a model is that it gave an event a one in a million chance of happening, and then it happened, I think you have to conclude that the model is more likely wrong than right. Let’s try to do some math to figure this out.

I think the best way to do this is with Bayesian inference. Let’s break it down. Say

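• P(M) is the probability that the model is correct (M is the event “the model is correct”)
• P(E) is the probability that, when the model predicted an event had a one in a million chance of happening, that event happened (E is the event “the one in a million thing happened”)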

So what we want to figure out is P(M|E), the probability of M given that E happened. Bayes’ theorem tells us that

`P(M|E) = P(E|M)*P(M)/P(E)`

and

• P(E|M) = the probability that E happened given that the model was correct, which is one in a million.
• P(M) = this is the prior probability that the model is correct. This depends on what you know about the model, but let’s say you wrote the model yourself and you’re 99% sure there are no bugs 🙂
• P(E) = ummm…this seems hard to evaluate. Let’s break it down: either the model is correct or it isn’t, so by the law of total probability

`P(E) = P(E|M)*P(M) + P(E|~M)*P(~M)`

We already know P(E|M) and P(M) from above, and P(~M) = 1 - P(M) = 0.01. But what is P(E|~M)? If the model is wrong, what’s the probability that the “one in a million” thing happened? This seems to require knowing how likely the event really is to happen, and if we knew that, we wouldn’t need a model! I guess we can use an extremely naive estimate – either the event happens or it doesn’t, so P(E|~M) = 0.5. (Edit: on Facebook, Gary pointed out that one way to handle this is to define what’s “sufficiently wrong”, since if the real probability is 1/999,999 we probably wouldn’t call the model incorrect. Then you can use that probability, for example 1/500,000 here, which makes a lot of sense to me!) I am skeptical that this is the right way to do it, but using P(E|~M) = 0.5 we get

`P(E) = 0.000001 * 0.99 + 0.5 * 0.01 = 0.00500099`

and

`P(M|E) = 0.00000099/0.00500099 = 0.000198`

or about 0.02%, so there’s very little chance the model is right.
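Here's a minimal Python sketch of the calculation above, so you can play with the inputs. The helper name `posterior_model_correct` is just my own illustrative choice; it plugs in the naive P(E|~M) = 0.5 and, for comparison, Gary's “sufficiently wrong” value of 1/500,000 from the edit above.

```python
def posterior_model_correct(p_e_given_m, prior_m, p_e_given_not_m):
    """Return P(M|E) using Bayes' theorem, with P(E) expanded by the
    law of total probability over "model correct" / "model wrong"."""
    p_not_m = 1 - prior_m
    # P(E) = P(E|M)*P(M) + P(E|~M)*P(~M)
    p_e = p_e_given_m * prior_m + p_e_given_not_m * p_not_m
    # P(M|E) = P(E|M)*P(M)/P(E)
    return p_e_given_m * prior_m / p_e


ONE_IN_A_MILLION = 1e-6
PRIOR = 0.99  # 99% sure the model has no bugs

# Naive assumption: if the model is wrong, the event is a coin flip.
naive = posterior_model_correct(ONE_IN_A_MILLION, PRIOR, 0.5)
print(f"{naive:.6f}")  # 0.000198

# Gary's "sufficiently wrong" alternative: the real probability is 1/500,000.
sufficiently_wrong = posterior_model_correct(ONE_IN_A_MILLION, PRIOR, 1 / 500_000)
print(f"{sufficiently_wrong:.3f}")  # 0.980
```

Swapping in 1/500,000 moves the posterior from about 0.02% to about 98% (0.99/1.01), so the answer depends heavily on what you assume P(E|~M) is.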

A few odds and ends:

• I tried reading more about Bayesian inference to figure out what to do about P(E) (really, about P(E|~M)), but didn’t find anything helpful. If anyone knows, please comment below!
• I think the general lesson is that you want your model to make lots of predictions to see if it’s calibrated well. If your model predicts things that are more than 50% likely to happen and is right, you can do the same sort of calculation as here to get more confident it’s correct, and build up a buffer against very wrong predictions like this.
• But probably the best way to do this is to do what 538 does: make lots and lots of predictions and analyze them to see whether they’re well-calibrated. Of course, to do this for events with probabilities like one in a million, you’d need at least a million predictions just to expect to see one of those events happen, which is tough.
• I think this also drives home that a one in a million thing happening is very very very rare, and we shouldn’t underestimate that. Just as a random reference, perfect games in baseball are very rare, and they seem to have about a 1 in 10,000 chance of happening – 100 times more likely than one in a million!
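The “build up a buffer” idea can be sketched in Python by running the same Bayes update once per prediction. This rests on my own assumptions, not anything established above: predictions are independent given whether the model is correct, and a wrong model’s events are coin flips (the same naive P(E|~M) = 0.5). The track record of 20 correct 90% predictions is made up for illustration, and `update` is a hypothetical helper name.

```python
def update(prior_m, p_if_correct, p_if_wrong, happened):
    """Return P(M | outcome) after one prediction, via Bayes' theorem.

    p_if_correct: probability the model gave the event, i.e. P(E|M)
    p_if_wrong:   assumed probability of the event if the model is wrong, P(E|~M)
    """
    if happened:
        like_m, like_not_m = p_if_correct, p_if_wrong
    else:
        # The complementary outcome was observed.
        like_m, like_not_m = 1 - p_if_correct, 1 - p_if_wrong
    numerator = like_m * prior_m
    return numerator / (numerator + like_not_m * (1 - prior_m))


NAIVE_P_IF_WRONG = 0.5  # assumption: a broken model is no better than a coin flip

# Hypothetical track record: 20 predictions at 90%, all of which came true.
p_m = 0.99
for _ in range(20):
    p_m = update(p_m, 0.9, NAIVE_P_IF_WRONG, happened=True)

# Then the "one in a million" event happens.
buffered = update(p_m, 1e-6, NAIVE_P_IF_WRONG, happened=True)
unbuffered = update(0.99, 1e-6, NAIVE_P_IF_WRONG, happened=True)
print(f"with track record: {buffered:.3f}, without: {unbuffered:.6f}")
```

Each correct 90% call multiplies the odds that the model is right by 0.9/0.5 = 1.8, so under these assumptions twenty of them build enough of a cushion that the one in a million hit leaves the posterior around 96% instead of about 0.02%.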
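And here is a minimal sketch of a calibration check in the spirit of the 538 bullet. It is a generic binning approach, not 538’s actual method, and the forecasts are simulated (hypothetical data) from a forecaster that is calibrated by construction, so the observed frequency in each bin should track the average forecast.

```python
import random
from collections import defaultdict


def calibration_table(predictions, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins and compare the
    average forecast in each bin with how often the events happened."""
    bins = defaultdict(lambda: [0.0, 0, 0])  # bin -> [sum of forecasts, hits, count]
    for p, happened in zip(predictions, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # a forecast of exactly 1.0 goes in the top bin
        bins[b][0] += p
        bins[b][1] += happened
        bins[b][2] += 1
    return {
        b: (total_p / n, hits / n, n)
        for b, (total_p, hits, n) in sorted(bins.items())
    }


# Simulated forecaster: each event happens with exactly the forecast probability.
rng = random.Random(538)
preds = [rng.random() for _ in range(100_000)]
results = [rng.random() < p for p in preds]

for b, (mean_p, freq, n) in calibration_table(preds, results).items():
    print(f"bin {b}: forecast {mean_p:.2f}, observed {freq:.2f}, n={n}")
```

With one in a million forecasts, a bin would need millions of entries before its observed frequency meant much, which is exactly the problem described in that bullet.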