This post is a part of the Statistical 52, a weekly series where I write about questions and concepts that I think aspiring statisticians should be comfortable with.
Question
What is maximum likelihood estimation, and what benefits do we get from using it?
Discussion
TLDR: Maximum likelihood estimators are consistent, have a convenient asymptotic distribution, and are asymptotically efficient. These properties translate into tight confidence intervals and reliable p-values.
Data is random. When we collect data, we often assume it comes from a mathematical approximation called a statistical model. More specifically, we often make assumptions on the probability distribution that is responsible for the randomness in the data.
Continuous data are often modeled with the Normal distribution. Binary data are often modeled with the Bernoulli or Binomial distribution.
The randomness in distributions like these is governed by just a few values called parameters. Different choices of parameters change what data we are likely and unlikely to see, and the parameters themselves often have intuitive interpretations.
When we collect data, we try to infer the parameter value that makes the data we saw most likely. We convert the data into an estimate, our guess for the parameter value, using a rule called an estimator.
Maximum likelihood estimation is a procedure for producing parameter estimates for statistical models. And it produces pretty good estimates, too.
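To make this concrete, here's a minimal sketch in Python of maximum likelihood for a coin-flip (Bernoulli) model. The sample size, true probability, and the use of scipy are just illustrative choices on my part, not part of the question itself.

```python
# A minimal sketch of maximum likelihood for a coin-flip (Bernoulli) model.
# The true probability p_true and sample size are made up for illustration.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
p_true = 0.3
data = rng.binomial(1, p_true, size=200)  # 200 simulated coin flips

def neg_log_likelihood(p):
    # Bernoulli log-likelihood: log(p) for each 1, log(1 - p) for each 0
    return -np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

# Maximize the likelihood (minimize its negative) over p in (0, 1)
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")

print("Numerical MLE:", result.x)
print("Closed-form MLE (sample mean):", data.mean())
```

For this model the MLE has a closed form (the sample proportion), so the numerical answer should land right on top of it.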
We take advantage of the benefits of these estimators all the time, but it’s easy to forget what these benefits are. This was a literal interview question I had one time, and I’m lucky that I had brushed up on the topic beforehand.
The benefits of maximum likelihood estimation can be summarized in three points.
Maximum likelihood estimators (MLEs) are consistent. This means that, with large amounts of data, the value of the MLE will be very close to the theoretical value of the parameter that generated the data. Assuming your model is right (a huge assumption) and you have enough data, your estimate will be a good guess for the parameter and for any quantity of interest derived from it.
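Here's a rough simulation sketch of consistency. The true mean, standard deviation, and sample sizes are arbitrary; the point is that the MLE of a Normal mean (the sample mean) drifts toward the true value as the sample grows.

```python
# Sketch: the MLE of a Normal mean (the sample mean) gets closer to the
# true value as the sample size grows. True mean/sd are made up.
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_sd = 5.0, 2.0

for n in [10, 100, 1_000, 10_000, 100_000]:
    sample = rng.normal(true_mean, true_sd, size=n)
    mle = sample.mean()  # MLE of the mean under a Normal model
    print(f"n = {n:>6}: MLE = {mle:.4f}, error = {abs(mle - true_mean):.4f}")
```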
Maximum likelihood estimators have an asymptotic Normal distribution. Distributions for statistics like the MLE are important for hypothesis testing. Generally, they’re hard to figure out, given how diverse data is. In this case, the process of maximum likelihood actually gives us a really convenient distribution for the MLE: a Normal one. This makes it easy to calculate p-values and confidence intervals with standard statistical programming.
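Here's a sketch of how that asymptotic Normal distribution gets used in practice: a Wald confidence interval and p-value for a Bernoulli proportion. The simulated data, the null value of 0.5, and the 1.96 critical value for a 95% interval are just illustrative defaults.

```python
# Sketch: using the asymptotic Normal distribution of the MLE to get a
# Wald confidence interval and p-value. Data and the null value are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.binomial(1, 0.55, size=400)

p_hat = data.mean()                            # MLE of the proportion p
se = np.sqrt(p_hat * (1 - p_hat) / len(data))  # asymptotic standard error

# 95% Wald confidence interval: estimate +/- 1.96 * standard error
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Two-sided p-value for the null hypothesis p = 0.5
z = (p_hat - 0.5) / se
p_value = 2 * stats.norm.sf(abs(z))

print(f"MLE = {p_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), p = {p_value:.4f}")
```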
Maximum likelihood estimators are asymptotically efficient. With large enough samples, MLEs achieve the smallest variance possible for an unbiased estimator (the Cramér–Rao lower bound). Smallest possible variance means that the confidence intervals we create will be as narrow as possible, which means we're more likely to get statistically significant results.
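One classic way to see efficiency, sketched below with made-up simulation settings: for Normal data, the sample mean (the MLE) has noticeably smaller variance than another unbiased estimator like the sample median.

```python
# Sketch: for Normal data, the MLE of the mean (the sample mean) has smaller
# variance than a competing unbiased estimator, the sample median.
# Simulation settings are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
n, n_sims = 100, 5_000
samples = rng.normal(0.0, 1.0, size=(n_sims, n))

mle_estimates = samples.mean(axis=1)           # MLE of the mean
median_estimates = np.median(samples, axis=1)  # alternative unbiased estimator

print("Variance of MLE (sample mean):", mle_estimates.var())
print("Variance of sample median:    ", median_estimates.var())
# The median's variance is roughly pi/2 (about 1.57) times larger, so the
# MLE gives narrower confidence intervals from the same amount of data.
```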
Commonly used models like GLMs (e.g., logistic regression) and mixed-effects models are all fit with maximum likelihood, which gives us the benefits above.
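For example, here's a rough sketch using statsmodels (my choice for illustration, with simulated data and made-up coefficients): a logistic regression fit by maximum likelihood, whose confidence intervals and p-values come from the same asymptotic theory described above.

```python
# Sketch: logistic regression in statsmodels is fit by maximum likelihood,
# so its output comes with standard errors, confidence intervals, and
# p-values. The simulated data and coefficients are made up.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
# "True" model for the simulation: log-odds = -0.5 + 1.2 * x
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))
y = rng.binomial(1, p)

X = sm.add_constant(x)   # add an intercept column
model = sm.Logit(y, X)   # logistic regression model
fit = model.fit(disp=0)  # fit by maximum likelihood

print(fit.params)        # MLEs of the coefficients
print(fit.conf_int())    # confidence intervals from asymptotic normality
print(fit.pvalues)       # p-values from the same asymptotic theory
```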
Important caveat: maximum likelihood estimators are only the best among unbiased estimators. There are biased estimators that can achieve lower variance (and sometimes lower overall error) than the MLE.
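A quick sketch of that trade-off, with made-up numbers: an estimator that shrinks the sample proportion toward 0.5 is biased, but it has lower variance and, in this particular setting, lower overall error than the MLE.

```python
# Sketch: a biased "shrinkage" estimator can beat the MLE on mean squared
# error. The sample proportion (the MLE) is compared with an estimator that
# pulls it toward 0.5. The shrinkage weight and true p are made up.
import numpy as np

rng = np.random.default_rng(4)
true_p, n, n_sims = 0.45, 20, 20_000
mle = rng.binomial(n, true_p, size=n_sims) / n  # MLE in each simulated study

shrunk = 0.8 * mle + 0.2 * 0.5                  # biased: pulled toward 0.5

print("Variance of MLE:      ", mle.var())
print("Variance of shrinkage:", shrunk.var())
print("MSE of MLE:           ", np.mean((mle - true_p) ** 2))
print("MSE of shrinkage:     ", np.mean((shrunk - true_p) ** 2))
```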
Were you able to answer it correctly? Was there anything that surprised you? Let me know in the comments! See you in the next question.
📦 Check out my other stuff!
Read through my Statistical Garden on the Very Normal website! This digital garden houses all the knowledge I gained as a biostatistics graduate student. It’ll grow as I learn more, and it’s free for you to look through.
You can support me on Ko-fi! YouTube and Substack are the best and easiest ways to support me, but if you feel like going the extra mile, this would be the place. Always appreciated!