The Law of Averages: AVERAGE, MEAN, MEDIAN, and MODE

When you want to make sense of a lot of numbers, one of the ways to do it is to get an average (also called the mean). To calculate an average, add all the numbers and then divide the total by the number of numbers. If you added five numbers, you divide the sum by 5. You needn’t do it by hand; Excel provides the AVERAGE function.

The average would give you an idea of where the centre of the numbers is. Of course, it says nothing about how widely the numbers spread about the mean, nor their distribution. For those, there are other functions.

Another way to look at the centre of the data is to find the number located in the middle of the sorted data.  This we call the MEDIAN.  The median has an equal quantity of numbers smaller, and larger, than itself.  Count the numbers in the set of data.  If the count is odd, the median is the middle number in the sorted data.  If the count is even, the median is the average of the two middle numbers in the sorted data, not one of the numbers in the set of numbers:  The median of 1, 2, 2, 3, 7, and 9 is 2.5.

Yet another way to examine repetitive data is to see which number occurs most often. This is the MODE. The most frequent number on its own may not be enough –we might want to find which number occurs second most often, third most, and so on.

For a symmetrical distribution of a group of numbers, these three measures of central tendency tend to be the same. For a skewed distribution of numbers, they are different.

A simple average (arithmetic mean) may be good enough for some data.  But for other data sets, a more specialised mean such as a geometric mean or the harmonic mean may be relevant. We will look at them, too, in this article.

AVERAGE: Returns the average (arithmetic mean) of its arguments

Syntax: AVERAGE(number1, [number2], …)

The AVERAGE function syntax has the following arguments:

  • Number1 (required). The first number, cell reference, or range for which you want the average.
  • Number2, etc. (optional). More numbers, cell references or ranges for which you want the average, up to a maximum of 255.

If you want to include logical values and text representations of numbers in a reference as part of the calculation, use the AVERAGEA function.

If you want to calculate the average of only the values that meet certain criteria, use the AVERAGEIF or the AVERAGEIFS functions.

Tip: When you average cells, keep in mind the difference between empty cells and those containing the value zero.  Empty cells are not included, but zero values are.  You may have difficulty seeing the difference if you have set cells with zeros to appear blank.  This is in File > Excel Options > Advanced > Display options for this worksheet, where you clear the checkbox “Show a zero in cells that have a zero value“.

Examples for AVERAGE and AVERAGEA

As demonstrated above, C5, the average of numbers only (SUM/COUNT) gives the same result as AVERAGE (C7).

Cell C6, the average of a range containing numbers and text (SUM/COUNTA) gives the same result as AVERAGEA (in C8).  Text –even numeric text– has a value of zero, but COUNTA and AVERAGEA. count it as an item.  We then divide the total by 6 instead of 5, giving a smaller mean.

But if you include the text in the arguments (as in C9), it is used –if the text evaluates to a number!

The average (arithmetic mean) is always larger than the geometric mean (below) which is larger than the harmonic mean (below that).

GEOMEAN The geometric mean

Syntax: GEOMEAN(number1, [number2], …)

The GEOMEAN arguments are identical to those for the AVERAGE function above.

Excel will use logical values and text representations of numbers that appear in the list of arguments.

If an array or reference argument contains text, logical values, or empty cells, those values are ignored.  However, cells with the value zero are included.

Arguments that are error values or text that cannot be translated into numbers cause errors.

If any data point ≤ 0, GEOMEAN returns the #NUM! error value.

The equation for the geometric mean is: GM = (a1 * a2 * a3 * a4 … an) ^ (1/n).

It is typically used for a set of numbers whose values are meant to be multiplied together or are exponential in nature.  Examples are growth figures, population values, and interest rates over time.

HARMEAN The harmonic mean

The harmonic mean is the reciprocal of the arithmetic mean of reciprocals.

Syntax: HARMEAN(number1, [number2], …)

The HARMEAN arguments are identical to those for the AVERAGE function above.

The harmonic mean is always less than the geometric mean, which is always less than the arithmetic mean.

Arguments can either be numbers or names, arrays, or references that contain numbers.

Excel uses logical values and text representations of numbers that appear in the list of arguments.

If an array or reference argument contains text, logical values, or empty cells, those values are ignored.  However, cells with the value zero are included.

Arguments that are error values or text that cannot be translated into numbers cause errors.

The harmonic mean has the lowest value of the three means.

It cannot be used on a data set containing negative numbers or zero.  If any data point ≤ 0, HARMEAN returns the #NUM! error value.

The equation for the harmonic mean is:

It is usually used to find the average of variables that are expressed as a ratio of different measuring units, such as miles/gallon or km/hour.

TRIMMEAN The mean of the inside part of a data set

TRIMMEAN calculates the mean (average) taken by excluding a percentage of data points from the top and bottom tails of a data set.  Use this function to exclude outlying data from your analysis.

Syntax: TRIMMEAN(array, percent)

Arguments:

  • Array (required): The array or range of values to trim and average.
  • Percent (required): The fractional number of data points to exclude from the calculation. For example, if percent = 0.2, 4 points are trimmed from a data set of 20 points (20 x 0.2): 2 from the top and 2 from the bottom of the set.

If percent < 0 or percent > 1, TRIMMEAN returns the #NUM! error value.

TRIMMEAN rounds the number of excluded data points down to the nearest multiple of 2. If percent = 0.1, 10 percent of 30 data points equals 3 points. For symmetry, TRIMMEAN excludes a single value from the top and bottom of the data set.

MEDIAN The median of the given numbers

The median is the number in the middle of a set of numbers.

Syntax: MEDIAN(number1, [number2], …)

Arguments:

  • Number1, number2, …    Number1 is required, subsequent numbers are optional. 1 to 255 numbers for which you want the median.

Remarks

If there is an even number of numbers in the set, then MEDIAN calculates the average of the two numbers in the middle. See the second formula in the example.

Arguments can either be numbers or names, arrays, or references that contain numbers.

Logical values and text representations of numbers that are in the list of arguments are counted.  If an array or reference argument contains text, logical values, or empty cells, those values are ignored.  However, cells with the value zero are included.

Arguments that are error values or text that cannot be translated into numbers cause errors.

MEDIAN Examples using Numbers and Text:

MODE The most common value in a data set

MODE returns the most frequently occurring, or repetitive, value in an array or range of data.

This function has been supplemented with two new functions.  They are MODE.MULT (for an array of MODE values) and MODE.SNGL (identical to plain MODE, hence redundant).

Syntax: MODE(number1, [number2],…)

Arguments:

  • Number1 (required): The first number argument for which you want to calculate the mode.
  • Number2,(optional): Number arguments 2 to 255 for which you want to calculate the mode. You can also use a single array or a reference to an array instead of arguments separated by commas.

Arguments can either be numbers or names, arrays, or references that contain numbers.

If an array or reference argument contains text, logical values, or empty cells, those values are ignored.  However, cells with the value zero are included.

Arguments that are error values or text that cannot be translated into numbers cause errors.

If the data set contains no duplicate data points, MODE returns the #N/A error value.

Mode is the most frequently occurring number in a group of numbers. For example, the mode of 2, 3, 3, 5, 7, and 10 is 3.

The newer functions are:

MODE.SNGL(number1, [number2],…) Returns the most common value in a data set (identical to MODE)

MODE.MULT(number1, [number2],…) Returns a vertical array of the most frequently occurring, or repetitive values in an array or range of data.

You enter this as an array function, thus:

  1. Start by selecting the number of cells corresponding to the number of most commonly-occurring duplicate values.  This might be a guess.  Select too many rather than too few.
  2. Type the MODE.MULT function with arguments.
  3. Press Shift+Ctrl+Enter to enter an array function into the selected cells.
  4. #N/A in the lower cell(s) indicates that you selected more cells than you have MODEs.  You could delete the entire array and redo the process with fewer cells.  Or live with the #N/A.

MODE Examples using Numbers:

Given the range A2:A12 containing the numbers 1, 2, 2, 3, 3, 3, 4, 5, 7, 7, and 8, in cell C2 the MODE =3, the one that repeats most frequently.

In cell C3, we have added 2 (A13) to the range.  2 and 3 both occur three times: If there is a tie, MODE takes the lowest one, in this case, 2.

In cells C4 and C5, MODE.SNGL gives the same result as for plain MODE in cells C2 and C3.

In cells C6:C8 we entered MODE.MULT(A2:A15) as an array function (with Shift+Ctrl+Enter).  This is useful for multiple MODEs, as there is a tie (both 2 and 3 occur three times).  Because there are only two ties, the third array cell contains #N/A.

MODE Examples using Text:

Although it would be useful if it did, MODE does not work with text:

In cell C15 we ask for MODE(A15:A23).  The range A15:A23 contains text only: “Dingbat”, “Dingbat”, “Gizmo”, “Gizmo”, “Gizmo”, “Widget”, “Widget”, “Widget”, and “Wotsit”.  Clearly, there are duplicates.  However, it does not work: MODE of an all-text range gives a #N/A error.

In C16: Adding a single number (1 in cell A24) doesn’t fix it.

In C17 we’ve added a second duplicate number.  This works, because MODE needs more than one duplicate number

In C19:C21, =MODE.MULT(A15:A23) gives the same result as for plain MODE

In C23:C25, =MODE.MULT(A24:A25) gives a more surprising result of 1, 1, 1.  We would have expected 1, #N/A, #N/A.

In cell C27, literal text in the formula =MODE(“A”, “B”, “A”, “C”) does not work either.  Despite the quotes, Excel seems to think the arguments are range names or cell addresses.

Similar Posts