median, mean, mode as minimizers

2024年5月9日

Median, mean, and mode are the most common measures of central tendency.

Norms

In mathematics, a norm measures the “distance” from a point to another. A norm is denoted by double vertical lines: ||x||.

Absolute-value norm is a norm on a single value, see (1).

(1)||x||=|x|

Euclidean norm is a norm on a list, see (2).

(2)||x||=x12++xn2

p-norm is a generalization of Euclidean norm. It is a norm on a list.

Let p1 be a real number. The p-norm of list x=(x1,,xn) is

||x||p:=(i=1n|xi|p)1/p.

When p=1, p-norm reduces to ||x||1:=i=1n|xi|. It takes absolute values, but it is not the absolute-value norm due to the symbol.

When p=2, we get Euclidean norm.

As p approaches , the p-norm approaches the infinity norm or maximum norm: ||x||:=maxi|xi| .

When p=0, 0-norm is not defined.

Statistical dispersions

In statistics, dispersion is the extent to which a distribution is stretched or squeezed.

For a given (finite) data set x, the dispersion about a point c is the “distance” from xi to the c in the p-norm:

fp(c)=xcp=(1ni=1n|xic|p)1/p

p dispersion central tendency
0 variation ratio mode
1 average absolute deviation median
2 standard deviation mean
maximum deviation mid-range

Average Absolute Deviation has the identical form of Mean Absolute Error (MAE) which is from the viewpoint of prediction.

The 2-norm is Mean Squared Error (MSE) in addition with the power of 12. MSE is 1ni=1n(xic)2.

The 0-norm counts the number of unequal points. Mode minimizes this 0-norm.

For p= the largest number dominates, and thus mid-range M=maxx+minx2 minimizes -norm.

See also

Minimum Cost to Make Array Equalindromic