Very recently, I got caught up in a YouTube vortex: a long series of videos on a related topic. I couldn’t sleep, and suddenly a riddle video was recommended to me. It was one of those riddles that you might expect in a software engineering interview at Google or Amazon. Seeing as I had nothing better to do, I watched a bunch of these and got inspired to write an article on them. While these toy riddles don’t really have anything to do with biostatistics, they bring to mind a valuable skill that I feel is under-discussed: simulation.
The word simulation brings up images of virtual reality and that one episode of Rick and Morty.
The reality is much more mundane than that. A simulation in a statistical sense is really just the use of programming to see what would happen in a particular scenario. Back when I wrote about the Central Limit Theorem, I drafted some code to produce a GIF showing how the normal distribution arises purely from taking samples and calculating their averages.
In this case, the simulation really boils down to the following:
1. Generate a dataset with some specific, known statistical qualities. With the GIF above, I know what the underlying population is and therefore what the underlying population mean is.
2. Perform a statistical procedure on the data. This can range from simple (like taking an average) to more complex (like running some type of regression).
3. Repeat 1 and 2 many, many times and look at the results.
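To make the three steps concrete, here is a minimal sketch in R. It is not the code behind the GIF; the exponential population, the sample size and the number of repetitions are all assumptions I picked for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n_reps <- 5000       # how many times steps 1 and 2 are repeated (assumed)
sample_size <- 50    # observations per simulated dataset (assumed)

# Steps 1 and 2: draw one dataset from a known population (an exponential
# distribution with mean 1, chosen only for illustration) and average it.
one_sample_mean <- function() {
  dataset <- rexp(sample_size, rate = 1)
  mean(dataset)
}

# Step 3: repeat many times and look at the results.
sample_means <- replicate(n_reps, one_sample_mean())

mean(sample_means)   # should land close to the population mean of 1
hist(sample_means)   # roughly bell-shaped, even though the population is skewed
```

Because I chose the population myself, I know the answer the simulation should give, which is what makes it a useful gut check.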
I like simulations because they’re a really great gut check to all of the theory that is learned in class. It’s one thing to be taught that the Central Limit Theorem will give us a normal distribution as we get infinite amounts of data, but it’s completely different to program a mundane calculation and see the result happen. Lots of statistical results rely on this idea of having infinite amounts of data, but infinity is a difficult concept to really handle by itself. Simulations give us a way to “approximate” infinity by letting us substitute a large number and seeing what happens.
This issue is dedicated to another simulation. This time, our simulation will act as a solution to one of the riddles that I saw in my video marathon. I love solving riddles, but I’m not very good at it, so simulations help me a lot.
The riddle
Imagine there are 100 people in a large circle. They are playing what I call the Game of Slaps. Each person has to decide whether they will slap the person to their right or the person to their left; they cannot slap themselves. There’s no real reason for a person to choose one direction over the other, so each direction is equally likely. At the same time, everyone slaps in their chosen direction.
Now the question is: what is the average number of people who will not be slapped? We say average here because the choice of direction is random, so this number will not be fixed if we repeat the game.
A simulation, a solution and an explanation
Now, there’s certainly a probability-based approach that you can take, and it will definitely give you the right answer. That’s boring though, and it doesn’t give us a chance to simulate what might happen. We’ll consult the general outline I laid out above:
1. Generate a dataset with some specific, known statistical qualities. With the GIF above, I know what the underlying population is and therefore what the underlying population mean is.
2. Perform a statistical procedure on the data. This can range from simple (like taking an average) to more complex (like running some type of regression).
3. Repeat 1 and 2 many, many times and look at the results.
Generating the dataset
For this, I need to come up with a dataset that can accurately represent a circle of 100 people. On top of that, I need to program a way for each of these people to “choose” a direction to slap.
I’ll do this with a vector, which is essentially just a long list of numbers, to represent the circle. To emulate a “random choice”, I can use a random number generator that is readily available in R. If I want to program a single flip of a fair coin, I can do the following:
rbinom(1, size = 1, prob = 0.5)
> 1
A coin flip only has two equally probable faces: heads or tails. This “equally probable” aspect is captured by prob = 0.5. We’re only flipping a single coin, so this is what size = 1 is doing. The first 1 represents how many single coin flips I want to do.
What I’m doing here is essentially using these “coin flips” to decide what direction a person will slap. Every time you have something akin to a “yes/no” decision, you can model it as a coin flip. 100 people in the riddle means 100 coin flips in code.
circle = rbinom(100, size = 1, prob = 0.5)
> 0 0 1 1 1 0 1 0 0 1 …
Now that I have a bunch of coin flips, I just need to code them as either “left” or “right”. This is mostly arbitrary, but I need to keep track of my choice once I start counting the number of people who don’t get slapped.
library(dplyr)  # if_else() comes from dplyr; base R's ifelse() would also work
circle = if_else(circle == 1, "R", "L")
> "L" "L" "R" "R" "R" "L" "R" "L" "L" "R"
Do the statistics
To understand how many people will not get slapped, it’s important to understand what happens with a single person! I won’t show code here, but consider a situation where you’re just looking at a single person between two other people. What are the different situations in which they would get slapped?
To cut to the chase, a person doesn’t get slapped if 2 things happen at the same time:
1. The person to their left chooses to slap left
2. The person to their right chooses to slap right
As we go through the vector of L’s and R’s, we can examine the choices of the people before and after each person. With some programming logic, we can count how many times the above situation happens.
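One way this counting logic might look is sketched below. The function name count_unslapped is hypothetical, and I’m assuming that the person at position i - 1 in the vector stands to the left of the person at position i, with the last position wrapping around to the first to close the circle.

```r
# Hypothetical helper: counts the people nobody slaps in one game.
# Assumes circle is a character vector of "L"/"R" with at least 3 people,
# and that position i - 1 sits to the left of position i (ends are joined).
count_unslapped <- function(circle) {
  n <- length(circle)
  left_neighbor  <- circle[c(n, 1:(n - 1))]  # choice of the person on each person's left
  right_neighbor <- circle[c(2:n, 1)]        # choice of the person on each person's right
  # Unslapped: the left neighbor slaps left AND the right neighbor slaps right
  sum(left_neighbor == "L" & right_neighbor == "R")
}

# Tiny 4-person circle: persons 2 and 3 escape, persons 1 and 4 are slapped twice
count_unslapped(c("L", "L", "R", "R"))  # returns 2
```

Shifting the whole vector by one position in each direction avoids writing a loop, and the wrap-around indices handle the fact that the first and last people are neighbors.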
After doing this once, we will have simulated a single Game of Slaps! To understand the average number of people who don’t get slapped, we just need to simulate multiple games! Thanks to the power of programming, we can just run our code multiple times instead of having to ask real people to slap each other.
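Repeating the game many times can be sketched as follows. The helper play_game is hypothetical and bundles the coin flips and the counting into one step; here I keep the raw flips as TRUE/FALSE (TRUE meaning “slap right”) rather than converting them to letters, and I assume position i - 1 is to the left of position i.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# Hypothetical helper: plays one Game of Slaps with n_people players
# and returns how many of them are not slapped.
play_game <- function(n_people = 100) {
  # One fair coin flip per person: TRUE = slap right, FALSE = slap left
  slaps_right <- rbinom(n_people, size = 1, prob = 0.5) == 1
  left_neighbor_slaps_right  <- slaps_right[c(n_people, 1:(n_people - 1))]
  right_neighbor_slaps_right <- slaps_right[c(2:n_people, 1)]
  # Unslapped: left neighbor slapped left and right neighbor slapped right
  sum(!left_neighbor_slaps_right & right_neighbor_slaps_right)
}

n_games <- 1000
unslapped <- replicate(n_games, play_game())

mean(unslapped)  # average number of unslapped people per game
hist(unslapped)  # how that number varies from game to game
```

The exact average changes with the seed and the number of games, so treat any single run as an estimate rather than the answer.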
Central Limit Theorem strikes again
Below, I show the results for 1000 simulated games of Game of Slaps, where there are 100 people in each game:
Based on 1000 simulations, the average number of people who don’t get slapped is about 25. As we will see soon, this squares with what we would expect in theory. I’d just like to point out that even though we’re just talking about a dumb game from a video I saw online, we can still see that the number of people not slapped starts to take on a bell shape.
This is the Central Limit Theorem in action! Because the Game of Slaps is a random game, we know theoretically that an entire game has some sort of probability distribution associated with it, one that assigns a probability to each possible number of people not being slapped. We could try to work out this distribution exactly, but in practice that is tedious. The number of unslapped people in a game is a sum of many small per-person outcomes, which is the kind of quantity that tends to look bell-shaped. And if we want to know the average of this distribution, the Law of Large Numbers and the Central Limit Theorem together tell us that we can just simulate a lot of games: the average across games settles close to the true average.
The power of the Central Limit Theorem is that it largely does not matter what the underlying population distribution is (as long as its variance is finite). As long as we’re looking at a sample average from a large enough sample, we will get an approximately normal distribution in the end, and the population average will be at the very center of this bell.
To simulate this game, I just had to convert the rules into code. I didn’t have to understand the deeper probability rules of the game. But for the doubters out there…
The deeper probability of the game
Astute eyes will notice that 25 is just a quarter of 100 people. Is this value special in any way? It comes as a consequence of the possible results for a single person.
Ultimately, there are only 4 different scenarios that can happen for the two neighbors of any one person, and all 4 are equally likely. In 3 of these scenarios, the middle person gets slapped at least once, which instantly disqualifies them.
Therefore, if we were to assemble a large number of people, we would expect a quarter of them to not be slapped. This is exactly what we see with the simulations.
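The four scenarios can also be listed out in R as a quick check. The column names below are my own labels, and the final step relies on the fact that an expected count is the sum of the per-person probabilities, which holds even though neighboring people’s outcomes are not independent of each other.

```r
# Every combination of choices by the two neighbors of one person
scenarios <- expand.grid(
  left_neighbor  = c("L", "R"),
  right_neighbor = c("L", "R"),
  stringsAsFactors = FALSE
)

# The middle person escapes only when both neighbors slap away from them
scenarios$unslapped <- scenarios$left_neighbor == "L" &
  scenarios$right_neighbor == "R"

p_unslapped <- mean(scenarios$unslapped)  # 1 of 4 equally likely scenarios: 0.25
expected_unslapped <- 100 * p_unslapped   # expected count for 100 people: 25
```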
Keep in mind that the simulations don’t know this at all. They only keep track of what happens to a single person. It’s in the aggregate that we start seeing the results of these simulations starting to tell us about population-level information.
Last thoughts
Admittedly, after writing the code and the article itself, I felt that the riddle was a little too easy to really merit any discussion. However, I still feel it’s at least a little valuable as some insight into a skill that many statisticians must master.
Theoretically, with some ingenuity and a lot of time, any game can be properly simulated with some code. It’s been done with Monopoly. With more complex games like this, it’s much, much, much more difficult to just go for the jugular and try to figure out things purely from a probability perspective. In cases like these, simulations reign supreme and will undoubtedly help you answer any questions you might have.
Biostatisticians in particular often use simulations to figure out how many people should be included in a clinical trial so that money, time and manpower are saved.
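As a rough sketch of what that might look like, the code below simulates many two-arm trials at a few candidate sample sizes and records how often a t-test detects the treatment effect. Everything here is an assumption for illustration: the helper name estimate_power, the effect size of 0.5, the standard deviation of 1, the 5% significance level and the normally distributed outcome.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Hypothetical helper: estimates power for a two-arm trial by simulation.
# All defaults (effect, sd, alpha, n_trials) are illustrative assumptions.
estimate_power <- function(n_per_arm, effect = 0.5, sd = 1,
                           alpha = 0.05, n_trials = 1000) {
  significant <- replicate(n_trials, {
    control   <- rnorm(n_per_arm, mean = 0, sd = sd)
    treatment <- rnorm(n_per_arm, mean = effect, sd = sd)
    t.test(treatment, control)$p.value < alpha
  })
  mean(significant)  # share of simulated trials that detect the effect
}

sample_sizes <- c(20, 40, 60, 80)
power_by_n <- sapply(sample_sizes, estimate_power)
data.frame(n_per_arm = sample_sizes, power = power_by_n)
```

It follows the same three-step recipe as the Game of Slaps: generate data with known properties, run a procedure on it, and repeat.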
See you next week!