A math magician appears before you on the street in a puff of smoke, and he will make you his apprentice if you can but answer one question. You fantasize how you could make your investment portfolio grow if only you knew magic, or how you could star in your own series of blockbuster films–move aside Harry Potter! But before you get carried away, the magician posits his question to you: how many people off of the street do I need to invite to my magic show to have a 50% chance that any two of them celebrate their birthday on the same day? You sweat a little, think a little, then sweat a little more before answering: 183.
You won’t be attending Hogwarts anytime soon, my friend. You’ve been hit by, you’ve been struck by, a Birthday Paradox!
Not What It Seems
Ever have one of those kid’s magician sets, with the black top hat, black magic wand with white caps on the ends, and cape? It was never as easy to perform magic tricks for your friends as you had thought it would be looking at the box it came in, was it?
If you sample any individual off of the street, and ask them what’s their birthday (discounting the rare instances of February 29th) you would expect that any particular day of the year is equally likely. The probability of a randomly-selected person’s birthday being March 2nd should be 1/365, right? You might then expect that if you asked 365 people for their birthdays, you should certainly (close to 100%) find at least one coincidence of two people who celebrate the same birthday. So reasonably, you might guess, you’d have a 50% chance of encountering two people with a sample of only half that size. You’d be making the same mistake as the Chevalier de Mere.
What did the Chevalier de Mere Do?
The Chevalier de Mere (1607-1684) was a 17th century French writer, amateur mathematician and erstwhile fan of dice games, who is famously believed (although the story may only be apocryphal) to have made a similar mistake initially analyzing the probabilities of certain dice throws, only to have it straightened out for him by his more pious, probability-minded colleague, Blaise Pascal (1623-1663). At least you are in esteemed company.
When Intuition Fails, Do The Math
As I have mentioned in my recent series on when averages attack, often times our intuition fails us and in those times our recourse must be to look at the situation mathematically. We will rely on a few definitions from basic probability theory:
The universe under discussion, typically written as the set Ω, is a finite set of everyone we might randomly sample. We assume here that each individual has an equal chance of being randomly selected, and asked when is their birthday?
Our second set of interest is the random sample itself, containing some natural number, n, of individuals who are the elements of the set we have asked for their birthday.
An event of interest, typically written with a capital letter A (other events of interest can be written B, C, and so forth). In this case, you are interested in there being more than one person in our sample celebrating on the same birthday.
Sampling with Replacement
Suppose we have 365 people in Ω. When we randomly choose a person off of the street to ask their birthday, the probability P(Mar 2nd) that the person responds that their birthday is March 2nd is 1/365. In sampling with replacement, we return the individual to Ω and they can be chosen again (with the same chance, 1/365). How likely we are to re-select somebody we had chosen already depends upon n.
Suppose n=3, and we choose Avery (birthday October 10), Brooke (birthday March 2) and Brooke again (birthday March 2). Let’s represent this as an ordered tuple, (a,b,b). You might at first think we were extraordinarily lucky, but this isn’t the only selection that might have chosen Brooke twice. We could also have selected Brooke, Avery, and Brooke again for an ordered tuple of (b,a,b). Another possible arrangement of these 3 selections of 2 individuals is (b,b,a). We learned to count the number of possible arrangements in combinatorics, and we must compute here the number of different arrangements an outcome satisfying event A can occur rather than just the combination we think of.
Sampling without Replacement
Now suppose we have the same universe of 365 people with different birthdays. If we sample somebody from this set, then there are 364 people remaining from which to sample from next time. If we had a 1/365 probability of selecting Avery first, then the probability of selecting Brooke next is 1/364 (not 1/365 as before). If we do select Brooke, then the third person we select, Chuck, had a probability of being selected of 1/363.
By point of comparison, when sampling with replacement the probability of getting the ordered tuple (a,b,c) was 1/365 * 1/365 * 1/365 or the cardinality of our population raised to the power n, the number of samples we choose. If we sample without replacement, then the probability of getting the ordered tuple (a,b,c) is slightly greater, at 1/365 * 1/364 * 1/363. We can represent this expression generally as the product of 365 * (365 – 1) * (365 – 2 ) * … * ((365 – n) + 1) for however large n may be (up to the cardinality of Ω of course).
How Many Outcomes Satisfy Event A-Complement?
What we want to know is the probability that at least two people have the same birthday. That will be satisfied by there being 2, 3, 4, and so forth people who celebrate their birthday on the same days, and that is a more complex computation to perform. In these situations, it is better to turn the question inside-out, and ask for the complement of our desired event A. What is the probability that no one shares the same birthday? If we subtract this probability, P(A‘), which is easier to compute, from 1, then we will find the probability P(A) for which we are looking.
P(A') = 365 * (365 - 1) * (365 - 2) * ... * ((365 - n) + 1) --------------------------------------------------- 365^n P(A) = 1 - P(A')
How Many People Do We Need To Ask?
This calculation can easily be carried out with a computer, and the threshold that will give you at least a 50/50 chance of having multiple people sharing a birthday (everything from everybody you invite having the same birthday, to only 2 people out of the sampling sharing the same birthday) occurs at 23. With fewer people than you might find in a school classroom, you can expect at least half of the time to encounter at least 2 people who celebrate their birthday on the same day.
Beyond exercising the mathematical muscles in our minds to compensate for when our intuition fails us, the birthday paradox has many ramifications for our modern technology. Many cryptographic algorithms are based on perfect information hiding (that any particular byte in the output can appear with uniform probability, thereby betraying no information about the encrypted message). Cryptographic hash functions often cite the astronomical odds that any brute force attack attempting to produce a colliding digest value would ever be achieved in practice. You may hear defenses that if today’s fastest computer were attempting to find two messages with the same hash digest, they would need longer than the estimated lifetime of our universe to have a non-negligible chance of success. However, when you have large numbers of messages to work with, the probability of finding a colliding hash digest (like the probability of coincidental birthdays) goes up dramatically.
The birthday paradox haunts us all to this day, and makes events we believe are extremely rare less scarce than we might expect.