Question 2: Losing Efficiency

2 / 52

Aug 09, 2025

This post is a part of the Statistical 52, a weekly series where I write about questions and concepts that I think aspiring statisticians should be comfortable with.

Question

You have finished recruiting for a randomized controlled trial (RCT). Even though you planned for a 50-50 split, dropout has left 60% of the sample in the treatment group. How much less efficient is your trial compared to a trial of the same total size with a 50-50 split?

Discussion

TLDR: Deviating from a 50-50 split increases the variance of our estimator. You can measure the loss of efficiency by taking the ratio of the variance you have to the ideal variance.

Sometimes experiments don’t go as planned.

Even though a 50-50 split is optimal for a two-sample experiment like a randomized controlled trial, we might end up with slightly imbalanced group sizes. With this question, I wanted to make sure my readers understood the downstream effects of this imbalance.

Here, I’m assuming that we’re using the difference in sample means to estimate the difference in population means. If the estimate is discernibly different from zero, we take that as evidence that the treatment has a causal effect on the outcome. A huge factor in this discernibility is the variance of this statistic:

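\(\begin{aligned} \text{Var}(\bar{Y}_A - \bar{Y}_B) &= \frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B} \\ &= \sigma^2 \left(\frac{1}{n_A} + \frac{1}{n_B}\right) \end{aligned}\)

The first line assumes the two groups are independent. The second assumes they share a common variance, \(\sigma^2_A = \sigma^2_B = \sigma^2\).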

Assuming that you have a fixed total sample size \(n = n_A + n_B\) to work with, this variance is minimized when the two groups are the same size. We can equivalently express the above variance in terms of \(\pi\), the proportion of the sample that’s in the treatment group (group A here), so that \(n_A = \pi n\) and \(n_B = (1 - \pi)n\).

\(\begin{aligned} \text{Var}(\bar{Y}_A - \bar{Y}_B) &= \sigma^2 \left(\frac{1}{n_A} + \frac{1}{n_B}\right) \\ &= \sigma^2 \left(\frac{1}{\pi n} + \frac{1}{(1-\pi)n}\right) \\ &= \frac{\sigma^2}{n} \left(\frac{1}{\pi} + \frac{1}{(1-\pi)}\right) \\ &= \frac{\sigma^2}{n} \left(\frac{1}{\pi(1 - \pi)} \right) \\ \end{aligned}\)
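As a quick numerical check of that last line, here is a minimal Python sketch. The function name `variance_factor` and the grid of splits are my own choices for illustration. It evaluates \(1 / (\pi(1 - \pi))\), the multiplier on \(\sigma^2 / n\), and looks for the split with the smallest multiplier on the grid.

```python
def variance_factor(pi: float) -> float:
    """Multiplier on sigma^2 / n when a proportion pi of the sample is treated."""
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must be strictly between 0 and 1")
    return 1.0 / (pi * (1.0 - pi))


# Candidate splits from 10-90 up to 90-10, in steps of 5 percentage points.
grid = [k / 100 for k in range(10, 91, 5)]

# The split on the grid that gives the smallest variance multiplier.
best_pi = min(grid, key=variance_factor)

for pi in (0.5, 0.6, 0.7):
    print(f"pi = {pi:.1f}: variance = {variance_factor(pi):.3f} * sigma^2 / n")
print("split with the smallest multiplier on the grid:", best_pi)
```

The multiplier is symmetric around 0.5, so a 40-60 split costs exactly as much as a 60-40 split.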

When the split is 50-50, we get the following value for the variance of the estimator:

\(\begin{aligned} \text{Var}(\bar{Y}_A - \bar{Y}_B)_{\pi = 0.5} &= \frac{\sigma^2}{n} \left(\frac{1}{0.5(1 - 0.5)} \right) \\ &= \frac{4\sigma^2}{n} \end{aligned}\)

But when the split is 60-40, we get this value instead:

\(\begin{aligned} \text{Var}(\bar{Y}_A - \bar{Y}_B)_{\pi = 0.6} &= \frac{\sigma^2}{n} \left(\frac{1}{0.6(1 - 0.6)} \right) \\ &= \left(\frac{25}{6}\right) \frac{\sigma^2}{n} \\ &\approx \left(4.17\right) \frac{\sigma^2}{n} \\ \end{aligned}\)

You can see that this slight imbalance in the treatment groups leads to a slight increase in the variance of the estimator. It may look small, but it could mean the difference between a published paper and having nothing to show for your data.
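If you would rather see this than take the algebra on faith, a small Monte Carlo check works too. This is a sketch under assumptions of my own: normally distributed outcomes with \(\sigma = 1\), no treatment effect (the variance of the difference does not depend on the group means), a total sample of \(n = 100\), and 5,000 simulated trials per split. The function name `simulate_variance` is hypothetical.

```python
import random
from statistics import variance


def simulate_variance(n_a, n_b, reps=5000, sigma=1.0, seed=0):
    """Empirical variance of (mean of group A - mean of group B) over many simulated trials."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Both groups are drawn with mean 0, i.e. no treatment effect.
        mean_a = sum(rng.gauss(0.0, sigma) for _ in range(n_a)) / n_a
        mean_b = sum(rng.gauss(0.0, sigma) for _ in range(n_b)) / n_b
        diffs.append(mean_a - mean_b)
    return variance(diffs)


n = 100  # assumed total sample size
results = {}
for n_a in (50, 60):
    n_b = n - n_a
    results[n_a] = simulate_variance(n_a, n_b)
    theory = 1 / n_a + 1 / n_b  # sigma^2 * (1/n_A + 1/n_B) with sigma = 1
    print(f"{n_a}-{n_b} split: simulated {results[n_a]:.4f}, theoretical {theory:.4f}")
```

With only 5,000 replicates the simulated values are noisy, so expect them to land near the theoretical ones rather than on them. The gap between the two splits is small enough that you need many replicates to see it clearly.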

When we talk about efficiency in statistics, we are usually referring to relative variances. We want the lowest variance that we can get, and in this case, we can derive an easy expression for what that looks like. To compare variances, we take their ratio:

\(\begin{aligned} \text{Relative Efficiency} &= \frac{\text{Var}(\bar{Y}_A - \bar{Y}_B)_{\pi = 0.6} }{\text{Var}(\bar{Y}_A - \bar{Y}_B)_{\pi = 0.5} } \\ &= \frac { \left(\frac{25}{6}\right) \frac{\sigma^2}{n} } { \frac{4\sigma^2}{n}} \\ &= \frac{25}{24}\approx 104\% \end{aligned}\)
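The same ratio can be checked with exact arithmetic. In the sketch below, the names are mine and `Fraction` from the Python standard library keeps everything as exact rationals. Note that \(\sigma^2 / n\) cancels in the ratio, so for a general split the ratio is \(1 / (4\pi(1 - \pi))\).

```python
from fractions import Fraction


def variance_factor(pi: Fraction) -> Fraction:
    """Multiplier on sigma^2 / n for treatment proportion pi, as an exact fraction."""
    return 1 / (pi * (1 - pi))


balanced = variance_factor(Fraction(1, 2))    # 50-50 split
imbalanced = variance_factor(Fraction(3, 5))  # 60-40 split

# sigma^2 / n appears in both variances, so it cancels in the ratio.
variance_ratio = imbalanced / balanced

print("balanced multiplier:  ", balanced)
print("imbalanced multiplier:", imbalanced)
print("variance ratio:       ", variance_ratio, "=", float(variance_ratio))
```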

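The variance under a 60-40 split is about 104% of the variance under a 50-50 split. Equivalently, the imbalanced trial is \(24/25 = 96\%\) as efficient as the balanced one, so there is about a 4% loss in efficiency due to this imbalance. We can either eat this efficiency loss or recruit more people to reach parity with the ideal case, which takes a total sample about 4% larger (a factor of \(25/24\)) at the same 60-40 split.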
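To put a number on the "recruit more people" option: setting the imbalanced variance with total size \(n'\) equal to the balanced variance with total size \(n\) gives \(n' = n / (4\pi(1 - \pi))\). The sketch below applies this to a made-up trial of 240 participants. The function name `parity_sample_size` and the trial size are assumptions for illustration, and it assumes the extra recruits arrive at the same imbalanced split.

```python
from fractions import Fraction
from math import ceil


def parity_sample_size(n_balanced: int, pi: Fraction) -> int:
    """Total sample size at split pi that matches the variance of a 50-50 trial of size n_balanced."""
    inflation = 1 / (4 * pi * (1 - pi))  # equals 25/24 when pi = 3/5
    # Round up, since we cannot recruit a fraction of a participant.
    return ceil(n_balanced * inflation)


# Hypothetical trial: 240 participants planned at 50-50, realized split 60-40.
print(parity_sample_size(240, Fraction(3, 5)))  # 250
```

Using `Fraction` here avoids floating-point rounding pushing an exact answer like 250 up to 251.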

Were you able to answer this question correctly? Were there concepts you didn’t know about before? Let me know in the comments!

See you in the next question.

📦 Check out my other stuff!

Read through my Statistical Garden on the Very Normal website! This digital garden houses all the knowledge I gained as a biostatistics graduate student. It’ll grow as I learn more, and it’s free for you to look through.
You can support me on Ko-fi! YouTube and Substack are the best and easiest ways to support me, but if you feel like going the extra mile, this would be the place. Always appreciated!

Mason Dowsett

Aug 16

If you decide to solve this by recruiting (/getting rid of) more people then your sample size will change too, is it useful to look at relative efficiency in this case? (I.e. the efficiency loss will be greater than 4% for a 50/50 trial where you recruit more people but will be less than 4% for a 50/50 trial where you get rid of some of the overrepresented population)
