Maximum Likelihood Estimation is Probably for the Best (Fit).
Imagine we observe a hundred coin flips and get heads twenty times and tails eighty times. If the coin has not been altered in some way, a natural assumption might be that the probability of getting heads is still 0.5.
This may not be wise, because if that were the case we would expect roughly fifty heads and fifty tails. That is not what happened. A more reasonable assumption would be a probability of 0.2 for heads (and thus 0.8 for tails).
Given only the data we have, the principle of maximum likelihood says that we can formulate a model and adjust its parameters to maximize the probability (likelihood) of having observed what we actually observed. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. In effect, you build a model that explains the observed data, and in the process you describe the distribution those data follow. Modern software makes this rather easy.
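As a minimal sketch of this idea, here is one way the coin-flip example might be worked out in Python. The data (20 heads out of 100 flips) come from the example above; the grid search over candidate values of p is just an illustration, not the only or best way to maximize a likelihood.

```python
import numpy as np

heads, flips = 20, 100  # observed data from the coin-flip example

# Candidate values for p, the probability of heads
p_grid = np.linspace(0.01, 0.99, 981)

# Binomial log-likelihood; the combinatorial constant is dropped because
# it does not depend on p and so does not change where the maximum lies
log_lik = heads * np.log(p_grid) + (flips - heads) * np.log(1 - p_grid)

# The maximum likelihood estimate is the p that maximizes the log-likelihood
p_mle = p_grid[np.argmax(log_lik)]
print(f"Maximum likelihood estimate of p: {p_mle:.2f}")  # should print 0.20
```

The estimate lands at 0.2, matching the intuition above: the proportion of heads we actually saw is the value of p that makes the observed data most probable.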
An added bonus is that maximum likelihood lets us calculate the standard error (a measure of precision) of each estimated coefficient of a model with relative ease. These standard errors are obtained from the curvature of the log-likelihood with respect to each parameter, which is found by taking the second order derivatives of the…
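To make the curvature idea concrete, here is a rough sketch for the same coin-flip model. The negative second derivative of the log-likelihood at the MLE (the observed information) is inverted and square-rooted to give a standard error; the specific numbers again assume 20 heads in 100 flips.

```python
import numpy as np

heads, flips = 20, 100
p_mle = heads / flips  # MLE for a binomial proportion

# Second derivative of the binomial log-likelihood with respect to p,
# evaluated at the MLE: d^2/dp^2 [ h*log(p) + (n - h)*log(1 - p) ]
second_deriv = -heads / p_mle**2 - (flips - heads) / (1 - p_mle)**2

# Observed information is the negative curvature; its inverse approximates
# the variance of the estimate, so the square root gives the standard error
observed_information = -second_deriv
standard_error = np.sqrt(1.0 / observed_information)
print(f"Standard error of the estimate: {standard_error:.3f}")  # about 0.04
```

A sharply curved log-likelihood (large second derivative) means the data pin down the parameter tightly, giving a small standard error; a flat one leaves the estimate imprecise.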