This post is part of the Statistical 52, a weekly series where I write about questions and concepts that I think aspiring statisticians should be comfortable with.
Question
You have finished recruiting for a randomized controlled trial (RCT). Even though you planned for a 50-50 split, dropout has left 60% of the sample in the treatment group. How much less efficient is your trial than one with a 50-50 split?
Discussion
TL;DR: Deviating from a 50-50 split increases the variance of our estimator. You can measure the loss of efficiency by taking the ratio of the variance you have to the ideal variance.
Sometimes experiments don’t go as planned.
Even though a 50-50 split is optimal for a two-sample experiment like a randomized clinical trial, we might end up with slightly imbalanced group sizes. With this question, I wanted to make sure my readers understood the downstream effects of this imbalance.
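Here, I’m assuming that we’re using the difference in sample means to estimate the difference in population means. Because the treatment was randomized, an estimate that is discernibly different from zero can be attributed to a causal effect of the treatment. A huge factor in this discernibility is the variance of this statistic:

$$\operatorname{Var}(\bar{X}_T - \bar{X}_C) = \frac{\sigma^2}{n_T} + \frac{\sigma^2}{n_C}$$

where n_T and n_C are the sizes of the treatment and control groups, and σ² is the outcome variance, assumed to be the same in both groups.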
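To see that variance formula in action, here is a small Monte Carlo sketch that uses only Python's standard library. It assumes normally distributed outcomes with the same standard deviation `sigma` in both arms and no true treatment effect (the mean does not affect the variance). The names `simulate_var_diff`, `theoretical_var_diff`, and `reps` are hypothetical, chosen for illustration. The sketch simulates many trials, computes the difference in sample means for each one, and compares the empirical variance of those differences with σ²/n_T + σ²/n_C.

```python
import random
import statistics


def simulate_var_diff(n_t, n_c, sigma=1.0, reps=5000, seed=1):
    """Empirical variance of (mean_T - mean_C) across simulated trials.

    Assumption: both arms are normal with standard deviation `sigma` and
    the same mean, so the only source of variation is sampling noise.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        treat = [rng.gauss(0.0, sigma) for _ in range(n_t)]
        control = [rng.gauss(0.0, sigma) for _ in range(n_c)]
        diffs.append(statistics.fmean(treat) - statistics.fmean(control))
    return statistics.variance(diffs)


def theoretical_var_diff(n_t, n_c, sigma=1.0):
    """Var(mean_T - mean_C) = sigma^2 / n_T + sigma^2 / n_C for independent arms."""
    return sigma**2 / n_t + sigma**2 / n_c


if __name__ == "__main__":
    for n_t, n_c in [(50, 50), (60, 40)]:
        sim = simulate_var_diff(n_t, n_c)
        theory = theoretical_var_diff(n_t, n_c)
        print(f"{n_t}-{n_c}: simulated {sim:.5f}, theoretical {theory:.5f}")
```

With 5,000 replications the simulated and theoretical values should agree to within a few percent; the exact digits depend on the seed.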
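Assuming a fixed total sample size n, this variance is minimized when both groups are the same size. To see why, let p be the proportion of the sample in the treatment group, so that n_T = np and n_C = n(1 − p). The variance then becomes:

$$\operatorname{Var}(\bar{X}_T - \bar{X}_C) = \frac{\sigma^2}{np} + \frac{\sigma^2}{n(1-p)} = \frac{\sigma^2}{n\,p(1-p)}$$

The product p(1 − p) is largest at p = 0.5, so the variance is smallest under a 50-50 split.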
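One quick way to check the minimization claim is to evaluate σ²/(n p(1 − p)) over a grid of treatment proportions and see where it bottoms out. This is a minimal sketch; the helper `var_diff`, the total sample size `n = 200`, and `sigma = 1` are hypothetical choices for illustration, and only the location of the minimum matters.

```python
def var_diff(p, n, sigma=1.0):
    """Variance of the difference in sample means when a fraction p of n is treated.

    Assumes a common outcome standard deviation `sigma` in both groups.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return sigma**2 / (n * p * (1.0 - p))


if __name__ == "__main__":
    n = 200  # hypothetical total sample size
    grid = [i / 100 for i in range(10, 91)]  # p = 0.10, 0.11, ..., 0.90
    best_p = min(grid, key=lambda p: var_diff(p, n))
    print(best_p)  # 0.5
```

Because σ² and n only scale the curve, the minimizing proportion is 0.5 for any choice of them.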
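When the split is 50-50 (p = 0.5), we get the following value for the variance of the estimator:

$$\operatorname{Var}(\bar{X}_T - \bar{X}_C) = \frac{\sigma^2}{n(0.5)(0.5)} = \frac{4\sigma^2}{n}$$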
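But when the split is 60-40 (p = 0.6), we get this value instead:

$$\operatorname{Var}(\bar{X}_T - \bar{X}_C) = \frac{\sigma^2}{n(0.6)(0.4)} = \frac{\sigma^2}{0.24\,n} \approx \frac{4.17\,\sigma^2}{n}$$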
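The arithmetic for both splits is easy to reproduce. This sketch plugs p into σ²/(n p(1 − p)) and reports the result in units of σ²/n, so neither σ nor n needs to be chosen; `var_in_units` is a hypothetical helper name.

```python
def var_in_units(p):
    """Variance of the difference in means in units of sigma^2 / n, i.e. 1 / (p (1 - p))."""
    return 1.0 / (p * (1.0 - p))


if __name__ == "__main__":
    for p in (0.5, 0.6):
        print(f"p = {p}: variance = {var_in_units(p):.4f} * sigma^2 / n")
```

It prints 4.0000 for the balanced split and 4.1667 for the 60-40 split.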
You can see that this imbalance increases the variance of the estimator only a little. It may look negligible, but a larger variance means wider confidence intervals and less power, which could mean the difference between a published paper and having nothing to show for your data.
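One way to make the cost concrete is to look at power. The sketch below rests on illustrative assumptions that are not part of the original question: a two-sided z-test at α = 0.05 with known common σ, a hypothetical total of n = 200, and an effect size chosen so that the balanced design has exactly 80% power. `approx_power` is a hypothetical helper; it uses the standard normal approximation and ignores the negligible chance of rejecting in the wrong direction.

```python
from statistics import NormalDist


def approx_power(delta, sigma, n, p, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means.

    Uses SE = sigma / sqrt(n p (1 - p)), with a fraction p of n treated.
    """
    z = NormalDist()
    se = sigma / (n * p * (1.0 - p)) ** 0.5
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(abs(delta) / se - z_crit)


if __name__ == "__main__":
    n, sigma = 200, 1.0  # hypothetical trial size and outcome SD
    z = NormalDist()
    # Effect size that gives exactly 80% power under a 50-50 split.
    se_balanced = sigma / (n * 0.25) ** 0.5
    delta = (z.inv_cdf(0.975) + z.inv_cdf(0.80)) * se_balanced
    for p in (0.5, 0.6):
        print(f"p = {p}: power = {approx_power(delta, sigma, n, p):.3f}")
```

Under these assumptions, power drops from 80% to roughly 78%: a small loss, but one that matters most for studies that were only just powered to begin with.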
When we talk about efficiency in statistics, we are usually referring to relative variances. We want the lowest variance we can get, and in this case we can derive a simple expression for how far we fall short of it. To compare the two variances, we take their ratio (σ² and n cancel):

$$\frac{\sigma^2 / (0.24\,n)}{\sigma^2 / (0.25\,n)} = \frac{0.25}{0.24} \approx 1.042$$
The variance under a 60-40 split is about 104% of the variance under a 50-50 split; equivalently, the relative efficiency is 0.24/0.25 = 0.96. Thus, there is about a 4% loss in efficiency due to this imbalance. We can either accept this loss or recruit about 4% more people (a factor of 0.25/0.24 ≈ 1.042) to match the variance of the ideal case.
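The efficiency calculation and the extra recruitment it implies fit in a few lines. This is a sketch under the same equal-variance assumption; `relative_efficiency`, `matching_sample_size`, and the planned total of 200 participants are hypothetical.

```python
import math


def relative_efficiency(p, p_ideal=0.5):
    """Var(ideal split) / Var(actual split); sigma^2 and n cancel out."""
    return (p * (1.0 - p)) / (p_ideal * (1.0 - p_ideal))


def matching_sample_size(n, p, p_ideal=0.5):
    """Smallest total size at split p whose variance is no larger than an n-person trial at p_ideal."""
    return math.ceil(n / relative_efficiency(p, p_ideal))


if __name__ == "__main__":
    eff = relative_efficiency(0.6)
    print(f"variance ratio: {1 / eff:.4f}")     # 1.0417
    print(f"relative efficiency: {eff:.2f}")    # 0.96
    print(f"participants needed instead of 200: {matching_sample_size(200, 0.6)}")  # 209
```

This ignores the fact that group sizes must be whole numbers, so treat the recruitment figure as approximate.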
Were you able to answer this question correctly? Were there concepts you didn’t know about before? Let me know in the comments!
See you in the next question.
📦 Check out my other stuff!
Read through my Statistical Garden on the Very Normal website! This digital garden houses all the knowledge I gained as a biostatistics graduate student. It’ll grow as I learn more, and it’s free for you to look through.
You can support me on Ko-fi! YouTube and Substack are the best and easiest ways to support me, but if you feel like going the extra mile, this would be the place. Always appreciated!