Measures of central tendency can be thought of as a toolbox. The mean is our first, best guess at what a typical value of a variable looks like: you sum the observations and divide by the number of observations. However, a few very high or very low values can drag the mean toward an extreme. The median ignores those pesky outliers and gives a sense of what a typical observation actually looks like. You simply sort your observations and pick the one in the middle (for an odd count) or the average of the two middle values (for an even count). Its disadvantage is that it may be too rigid: on its own it says little about what kind of distribution you are dealing with, unless it equals or comes close to the mean, which suggests a roughly symmetric distribution. The mode tells us which value occurs most often, revealing any consistent patterns. In the end, however, frequencies and simple counts are not very satisfying or informative. Can we do better? Is it possible to blend these to make yet another useful tool?
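A quick sketch in R shows the three tools side by side on a small, made-up vector with one extreme value. Base R has no built-in statistical mode function (mode() reports an object's storage type instead), so the stat_mode() helper below is a hypothetical illustration, not part of R.

```r
# Illustrative data: four ordinary values and one extreme value
x <- c(2, 3, 3, 5, 100)

mean(x)               # 22.6: sum(x) / length(x), dragged upward by the 100
median(x)             # 3: the middle value once x is sorted
median(c(2, 3, 5, 100))  # 4: even count, so the average of the middle two (3 and 5)

# Hypothetical helper: the most frequent value (ties go to the smallest value)
stat_mode <- function(v) {
  counts <- table(v)                          # frequency of each distinct value
  as.numeric(names(counts)[which.max(counts)])
}
stat_mode(x)          # 3
```

Notice how a single value of 100 pulls the mean far away from where most of the data sit, while the median and mode stay put.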
The answer is “yes,” and it comes in the form of the trimmed mean. The trimmed mean is a technique of averaging that removes the same small percentage of values from each end of the data, the largest and the smallest, before calculating the mean.
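To make the definition concrete, here is a minimal sketch of a trimmed mean computed by hand. The helper name trimmed_mean is hypothetical. It drops floor(n * trim) values from each end of the sorted data, which mirrors how base R's mean() handles its trim argument, and the last line checks the two against each other on made-up data.

```r
# Hypothetical helper: a hand-rolled trimmed mean
trimmed_mean <- function(x, trim = 0.1) {
  stopifnot(is.numeric(x), length(x) > 0, !anyNA(x),
            trim >= 0, trim < 0.5)
  x <- sort(x)
  n <- length(x)
  k <- floor(n * trim)          # number of values removed from EACH tail
  mean(x[(k + 1):(n - k)])      # average of the remaining middle portion
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
mean(x)                 # 14.5: the 100 drags the ordinary mean upward
trimmed_mean(x, 0.1)    # 5.5: drops 1 and 100, then averages 2 through 9
mean(x, trim = 0.1)     # 5.5: base R gives the same answer
```

Sorting first is what makes "the largest and smallest values" well defined; trimming an unsorted vector would simply drop whatever happened to come first and last.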
Data trimming is applied to data sets when outliers are an obvious issue or when a normal distribution cannot be assumed. Outliers, simply explained, are extreme values that lie far from the bulk of a data set and distort its distribution. If I want a typical house price in a neighborhood, I could use the mean. However, a single very high or very low price can shift that mean so far that it no longer describes a typical home. Real estate companies such as Zillow very often use medians instead, but a trimmed mean can serve the same purpose. Incidentally, cutting outliers, or even just very unusual observations, helps the mean but does little for the median. The median already sits smack dab in the middle of your data set and doesn’t really suffer from a pull toward one tail or the other. In fact, it is the limiting case of a trimmed mean: trim nearly half of the values from each end and only the middle value (or the average of the middle two) remains. It is certainly acceptable in many situations.
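The house-price scenario can be sketched in R with made-up sale prices (not real market data), where one sale is far above the rest. The last line relies on documented base R behavior: when trim is 0.5, mean() returns the median, which is the limiting case described above.

```r
# Illustrative (made-up) sale prices in one neighborhood; the last is a mansion
prices <- c(310000, 325000, 340000, 355000, 360000,
            375000, 390000, 405000, 420000, 2500000)

mean(prices)               # 578000: higher than nine of the ten sales
median(prices)             # 367500: average of the two middle prices
mean(prices, trim = 0.1)   # 371250: drops the cheapest and the priciest sale
mean(prices, trim = 0.5)   # 367500: trimming half from each end gives the median
```

The 10 percent trimmed mean lands close to the median while still averaging eight of the ten sales, which is why it can stand in for the median when you want a typical price.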
How to Find a Trimmed Mean in R
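You can easily compute a trimmed mean in R, because a “trim” argument is built right into the base mean() function. It takes the fraction of observations, from 0 to 0.5, to remove from each end of the data. A ten percent trimmed mean (trim = 0.1) is the mean of a dataset after the smallest ten percent and the largest ten percent of values have been removed.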
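Here is a short sketch of the trim argument in use, with made-up exam scores that include one missing value. Two details of base R's mean() are worth knowing: with na.rm = TRUE, missing values are removed before any trimming happens, and R drops floor(n * trim) values from each end, so a fraction that does not land on a whole number of observations is rounded down.

```r
# Illustrative (made-up) exam scores with one missing value
scores <- c(55, 61, 64, 68, 70, 72, 75, 79, 83, 99, NA)

mean(scores)                             # NA: a missing value propagates
mean(scores, na.rm = TRUE)               # 72.6: ordinary mean of the 10 scores
mean(scores, trim = 0.1, na.rm = TRUE)   # 71.5: drops 55 and 99, averages the other 8
mean(scores, trim = 0.15, na.rm = TRUE)  # 71.5: floor(10 * 0.15) = 1, still one per end
```

Because of that rounding, on small samples several different trim values can produce exactly the same result, so it helps to check how many observations are actually being removed.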