My favourite inequalities


Imbalances are ubiquitous in nature. Their very existence gives rise to bizarre phenomena that behave in wild and seemingly unpredictable ways. But mathematicians, the smartest and most inquisitive of all humans, devised a way to reify these imbalances into formal statements that always hold true. Several abstract and difficult problems reduce to trivial solutions through the chains of logical consequences that arise from inequalities. And, ironically, we need these inequalities to prove equalities of all kinds.

Please bear in mind that over the course of this article, I will not go over the proofs of these inequalities, to avoid turning this into a verbose mathematical discourse! I plan on keeping things extremely comprehensible and intuitive, with a greater focus on practicality, but I certainly do not mind expanding on examples to help clear the picture for you.

The Mean Triplet

I consider the Mean Triplet, more commonly known as the Arithmetic Mean (AM) - Geometric Mean (GM) - Harmonic Mean (HM) inequality, the father of all inequalities. It is very fundamental and finds tremendous applications in everyday scenarios.

It states that for any set of positive real numbers, their AM is greater than or equal to their GM, which is in turn greater than or equal to their HM, with equality only when all the values are equal.

Mathematically,

\frac{a_1 + a_2 + \dots + a_n}{n} \geq \sqrt[n]{a_1 \times a_2 \times \dots \times a_n} \geq \frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \dots + \frac{1}{a_n}}

The AM-GM relationship is often used to maximize or minimize functions. The AM is simple and applies equal weight to every value when calculating the mean. The Geometric mean, on the other hand, is used to capture the averaging of compounding effects.

To illustrate this, let us consider the annual returns of an investment for 3 years to be 5%, 8%, and 6% respectively. The formula for the average growth rate for the period is given by

\left(\prod_{i=1}^{n}(1 + r_i)\right)^{\frac{1}{n}} - 1

where r_i represents the return for the i-th period and n is the total number of periods. We need not really bother ourselves with the intricacies of this equation to understand the Geometric Mean but, unfortunately, I love explaining.

The (1 + r_i) term is called the growth factor. We can arrive at this quite easily when we look at the formula for fractional (or percent) change.

\frac{\text{final} - \text{initial}}{\text{initial}} = \text{change}

\frac{\text{final}}{\text{initial}} = 1 + \text{change}

Evidently, the growth factor is simply the ratio of final and initial values, or 1 plus the fractional change. The significance of this lies in the fact that adding 1 accounts for the initial value as well as the growth rate. Here’s an example. Say an investment has grown by 12% (growth rate) in the past year. This means that the fractional increase is 0.12 times the initial investment. The 1 in 1 + 0.12 includes the original investment. If we multiply 1.12 with the initial value, we get the final value. We can also say that the final value is 112% of the initial value. Similarly, if we incur a negative growth rate (loss) of say 5%, the growth factor would be 1 + (-0.05) = 0.95, meaning the final value is 95% of the original.

Now, multiplying all the growth factors gives us the total growth over all periods combined, and taking its geometric mean averages that total growth over the entire period. This is important because, unlike the AM, the GM successfully accounts for the compounding of each return in successive years. We then subtract 1 from this average to convert it back to a growth rate. But how does that work? If you noticed, the geometric mean of the growth factors is itself a growth factor. In our previous examples, a growth factor of 1.12 gives us a growth rate of 1.12 - 1 = 0.12, and a growth factor of 0.95 gives us a growth rate of 0.95 - 1 = -0.05. This shows that subtracting 1 from the growth factor gives us the fractional increase (or decrease) in our investment. With all of that out of the way, the final average growth rate in our case would be

((1+0.05)(1+0.08)(1+0.06))131=1.06291=0.0629=6.29% ((1 + 0.05)(1 + 0.08)(1 + 0.06))^\frac{1}{3} - 1 = 1.0629 - 1 = 0.0629 = 6.29\%

Taking the Arithmetic Mean would simply result in

\frac{5\% + 8\% + 6\%}{3} \approx 6.333\%

which isn’t quite accurate. Note that the AM is always at least the GM. The GM is more accurate in averaging rates whose relationships are multiplicative or proportional in nature, as in our case.
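The comparison above is easy to verify numerically. Here is a minimal Python sketch (the variable names are my own) that recomputes both means for the example returns:

```python
import math

# Returns from the example: 5%, 8% and 6% over three years.
returns = [0.05, 0.08, 0.06]

# Arithmetic mean: equal weight to every value.
am = sum(returns) / len(returns)

# Geometric mean of the growth factors, minus 1 to recover a growth rate.
gm = math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

print(f"AM = {am:.4%}")  # 6.3333%
print(f"GM = {gm:.4%}")  # slightly smaller, as AM-GM promises
assert am >= gm
```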

Additionally, we can include the Quadratic Mean (QM, also known as the Root Mean Square), which applies a greater weight to the larger values and is often used in averaging varying quantities like AC current and voltage, or in measuring errors in statistics.

\sqrt{\frac{a_1^2 + a_2^2 + \dots + a_n^2}{n}} \geq \text{AM}

Like the GM, the Harmonic Mean is also useful when averaging rates. It is used in finance to calculate analysis metrics like the Price-to-Earnings ratio, or in electrical engineering to calculate the equivalent resistance of resistors connected in parallel. Contrary to the Quadratic Mean, which assigns a greater weight to the larger values in the set, the HM applies a greater weight to the lower values. This means that the greater the extremes in a series of values, the lower the Harmonic Mean. This is because the HM relies on the reciprocals of each value: the smaller the value, the larger its reciprocal, and consequently, the greater its influence in shifting the mean towards itself. But it obviously cannot be calculated if any of the values is zero, since the reciprocal of zero is undefined.

The simplest example of the Harmonic Mean can be found while averaging speeds. Consider a car moving at 20 km/h for 10 km and 40 km/h for another 10 km. If we take the AM of the speeds, we’d get 30 km/h. But we know that average speed is equal to \frac{\text{total distance}}{\text{total time}}. If we now calculate the average speed, we get about 26.67 km/h, which is equal to the HM of the speeds and not the AM.

\frac{20}{\frac{10}{20} + \frac{10}{40}} = \frac{2}{\frac{1}{20} + \frac{1}{40}} \approx 26.67

Why does this work? It is because the harmonic mean of speeds is weighted by time and not by distance. Speed is inversely related to time, so the time we spend at a particular speed must affect the overall average. The slower the speed, the greater the time spent at that speed, and the greater its impact on the average. The AM assumes that each speed is equally weighted, but what matters is not just the speed itself but the time spent at each speed. The HM is correct in applying greater weight to the lower speeds.
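A quick Python sketch of the trip described above confirms that total distance over total time matches the harmonic mean of the two speeds:

```python
# Car travels 10 km at 20 km/h, then 10 km at 40 km/h.
legs = [(10, 20), (10, 40)]  # (distance in km, speed in km/h)

total_distance = sum(d for d, _ in legs)
total_time = sum(d / v for d, v in legs)  # time = distance / speed
average_speed = total_distance / total_time

# Harmonic mean of the speeds.
speeds = [v for _, v in legs]
hm = len(speeds) / sum(1 / v for v in speeds)

print(average_speed)  # 26.666..., not the arithmetic mean 30
assert abs(average_speed - hm) < 1e-9
```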

The Cauchy-Schwarz Inequality

The Cauchy-Schwarz is perhaps the most brilliant inequality with far-reaching implications in all of mathematics.

It states that for any two sets of real numbers A = \{a_1, a_2, \dots, a_n\} and B = \{b_1, b_2, \dots, b_n\}, the product of the sums of the squares of the numbers in each set is greater than or equal to the square of the sum of the products of corresponding values.

(a12+a22++an2)(b12+b22++bn2)(a1b1+a2b2++anbn)2 (a_1^2 + a_2^2 + \dots + a_n^2)(b_1^2 + b_2^2 + \dots + b_n^2) \geq (a_1b_1 + a_2b_2 + \dots + a_nb_n)^2

with the equality holding iff

a1b1=a2b2==anbn \frac{a_1}{b_1} = \frac{a_2}{b_2} = \dots = \frac{a_n}{b_n}

A corollary of this inequality can be found in Vector Calculus, where it provides an upper bound for the dot product of any two vectors \vec{v} and \vec{u} \in \mathbb{R}^n

\|\vec{v}\| \|\vec{u}\| \geq |\vec{v} \cdot \vec{u}|

where \|\vec{v}\| and \|\vec{u}\| are the norms of the vectors and |\vec{v} \cdot \vec{u}| is the absolute value of their dot product, with equality holding iff the vectors are linearly dependent or one of the vectors is a null vector. A set of vectors is linearly dependent when at least one of the vectors can be written as a linear combination of the others (there exists a scalar a which satisfies \vec{v} = a\vec{u}).

The inequality provides an upper bound for the absolute value of the dot product of two vectors. Specifically, it states that the absolute value of the dot product cannot exceed the product of the magnitudes of the two vectors. Norm is just a fancy word used to describe the magnitude or length of a vector (in Euclidean space). By expanding the modulus of the inequality, we can determine the range of the dot product.

- \|\vec{v}\| \|\vec{u}\| \leq \vec{v} \cdot \vec{u} \leq \|\vec{v}\| \|\vec{u}\|
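The bound is straightforward to check numerically. A small Python sketch with two arbitrary vectors (my own choice of values):

```python
import math

def dot(v, u):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(v, u))

def norm(v):
    """Euclidean norm (magnitude) of a vector."""
    return math.sqrt(dot(v, v))

v = [1.0, 2.0, 3.0]
u = [-4.0, 0.5, 2.0]
assert abs(dot(v, u)) <= norm(v) * norm(u)  # Cauchy-Schwarz

# Equality holds when the vectors are linearly dependent (w = 2v here).
w = [2.0, 4.0, 6.0]
assert math.isclose(abs(dot(v, w)), norm(v) * norm(w))
```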

For the sake of simplicity, I am considering the vectors to be in real space \mathbb{R}^n. But the inequality can be generalized, and holds true in non-Euclidean vector spaces with complex inner products. The dot product is just a special case of the inner product in \mathbb{R}^n.

The applications of Cauchy-Schwarz can be seen when we take the cosine of the angle between two vectors

\cos\theta = \frac{\vec{v} \cdot \vec{u}}{\|\vec{v}\| \|\vec{u}\|}

or in the covariance inequality in probability theory

Var(X)Var(Y)Cov(X,Y)2 Var(X)Var(Y) \geq Cov(X, Y)^2

where X and Y are two random variables, and Var and Cov denote their variances and covariance. It also gives rise to an inequality known as Titu’s Lemma, which is of the form

a12b1+a22b2++an2bn(a1+a2++an)2b1+b2++bn \frac{a_1^2}{b_1} + \frac{a_2^2}{b_2} + \dots + \frac{a_n^2}{b_n} \geq \frac{(a_1 + a_2 + \dots + a_n)^2}{b_1 + b_2 + \dots + b_n}

This is often used as a shortcut for solving inequalities involving fractions whose numerators are perfect squares.
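Titu’s Lemma is just as easy to sanity-check. A short Python sketch with arbitrary positive values (my own):

```python
# Titu's Lemma: sum(a_i^2 / b_i) >= (sum a_i)^2 / (sum b_i) for positive b_i.
a = [1, 2, 3]
b = [4, 5, 6]

lhs = sum(x * x / y for x, y in zip(a, b))  # 1/4 + 4/5 + 9/6
rhs = sum(a) ** 2 / sum(b)                  # 36 / 15

print(lhs, rhs)  # 2.55 2.4
assert lhs >= rhs
```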

The Rearrangement Inequality

If you’re familiar with greedy algorithms in computer science, this inequality captures exactly that idea.

It states that for any two monotonically increasing sequences of real numbers A = \{a_1 \leq a_2 \leq \dots \leq a_n\} and B = \{b_1 \leq b_2 \leq \dots \leq b_n\}, the value of a_1'b_1' + a_2'b_2' + \dots + a_n'b_n', where a_i' and b_i' denote the i-th values in any permutation of the two sequences, is bounded as follows:

a_nb_n + a_{n-1}b_{n-1} + \dots + a_1b_1 \geq a_1'b_1' + a_2'b_2' + \dots + a_n'b_n' \geq a_nb_1 + a_{n-1}b_2 + \dots + a_1b_n

with equality when the values of one or both sets are equal.

If the above notation looks a little cumbersome, here’s a breakdown of the inequality through an example. Consider three bags of 10 rupee, 20 rupee and 50 rupee notes, each containing 3 notes. If we are asked to get the maximum amount by picking 5 notes in total, we would obviously pick 3 notes from the 50 rupee bag, 2 from the 20, and none from the 10 rupee bag. This conclusion is simply common sense. Similarly, if we are asked to get the minimum, we’d pick three 10 rupee and two 20 rupee notes. The rearrangement inequality illustrates this exact “greedy” logic. Any other permutation (or grouping) of the notes, with the given constraint of 5 notes, would result in a value that lies between the two.
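We can also brute-force the inequality over every permutation to watch the greedy pairing win. A Python sketch with an arbitrary pair of sorted sequences (my own values):

```python
from itertools import permutations

a = [1, 3, 5]    # sorted ascending
b = [2, 4, 10]   # sorted ascending

# Sum of products for every possible pairing of a with a permutation of b.
pairings = [sum(x * y for x, y in zip(a, p)) for p in permutations(b)]

same_order = sum(x * y for x, y in zip(a, b))          # largest with largest
opposite = sum(x * y for x, y in zip(a, reversed(b)))  # largest with smallest

assert max(pairings) == same_order
assert min(pairings) == opposite
```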

This consequently leads us to another famous inequality known as Chebyshev’s Sum Inequality. When both sequences are sorted in the same order,

n(a1b1+a2b2++anbn)(a1+a2++an)(b1+b2++bn) n(a_1b_1 + a_2b_2 + \dots + a_nb_n) \geq (a_1 + a_2 + \dots + a_n)(b_1 + b_2 + \dots + b_n)

but when the sequences are oppositely oriented, say B = \{b_1 \geq b_2 \geq \dots \geq b_n\},

n(a1b1+a2b2++anbn)(a1+a2++an)(b1+b2++bn) n(a_1b_1 + a_2b_2 + \dots + a_nb_n) \leq (a_1 + a_2 + \dots + a_n)(b_1 + b_2 + \dots + b_n)

In both the cases, equality is achieved when the sequences are constant. Chebyshev’s name is also attached to a separate, equally famous result in probability theory: Chebyshev’s Inequality provides an upper bound on the probability that a random variable (say X) deviates from its mean by more than a given number of standard deviations.

P(Xμnσ)1n2 P(|X - \mu| \geq n\sigma) \leq \frac{1}{n^2}

where \mu is the mean, \sigma is the standard deviation and n is the number of standard deviations. This bound is not tight but only a conservative estimate which works well in most cases, regardless of the type of distribution. Due to this, it is popular in many statistical inferences. Let’s take an example.

The mean (\mu) of the subject score for a group of students is 60 and the standard deviation is 15. If we randomly pick a student, what is the probability that the student has a score which is greater than 30 but less than 90?

There are other ways to arrive at the right answer, but we’ll stick to using Chebyshev. Let’s first calculate n (also the z-score), which gives us the number of standard deviations above or below the mean. In this case, \frac{60 - 30}{15} = \frac{90 - 60}{15} = 2. If the two values of n were not equal, we would settle for the greater z-score (we are only concerned with the magnitude and not the sign). We now need P(|X - \mu| < 2\sigma), which is equal to 1 - P(|X - \mu| \geq 2\sigma). Therefore

P(|X - \mu| < 2\sigma) \geq 1 - \frac{1}{n^2}

P(|X - \mu| < 2\sigma) \geq 1 - \frac{1}{4} = 1 - 0.25 = 0.75

This shows that there is at least a 75% chance that a randomly chosen student has a score between 30 and 90.
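A quick simulation illustrates how conservative the bound is. The sketch below draws normally distributed scores (an assumption of mine; Chebyshev itself needs no particular distribution) with the example’s mean and standard deviation:

```python
import random

random.seed(0)
mu, sigma, k = 60, 15, 2  # mean, SD, number of SDs

samples = [random.gauss(mu, sigma) for _ in range(100_000)]
within = sum(abs(x - mu) < k * sigma for x in samples) / len(samples)

print(within)  # around 0.95 for a normal distribution
assert within >= 1 - 1 / k**2  # Chebyshev only promises at least 0.75
```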

Jensen’s Inequality

This is an extremely powerful inequality (and probably my favourite of all). To understand Jensen’s inequality, let us first explore convex and concave functions.

A function is convex (also called concave up) on an interval if its second derivative satisfies f''(x) \geq 0 there, and concave (concave down) if f''(x) \leq 0. If f''(x) = 0 at a point and the concavity changes there, that point is an inflection point. Further, the sign of the first derivative tells us whether the function is increasing or decreasing on the interval.

[Figure: convex and concave functions]

Now, the inequality states that if f(x)f(x) is convex in the interval [a,b][a, b], then

f(a1+a2++ann)f(a1)+f(a2)++f(an)n f(\frac{a_1 + a_2 + \dots + a_n}{n}) \leq \frac{f(a_1) + f(a_2) + \dots + f(a_n)}{n}

and if it is concave, the reverse inequality holds for all a_i \in [a, b].
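Here is a tiny numeric check of the statement with the convex function f(x) = x^2 (my own choice of function and values):

```python
# Jensen for the convex f(x) = x^2: f(mean) <= mean of f.
def f(x):
    return x * x

xs = [1.0, 2.0, 7.0]
mean = sum(xs) / len(xs)

print(f(mean))                          # f of the mean
print(sum(f(x) for x in xs) / len(xs))  # mean of f, at least as large
assert f(mean) <= sum(f(x) for x in xs) / len(xs)
```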

The simplest application of this inequality can be demonstrated by proving the AM-GM inequality. Let us consider the function f(x) = -\log(x), which is simply the reflection of \log(x) about the x-axis. [Figure: graph of -\log(x)]

The first derivative of this function is f'(x) = -\frac{1}{x}, and its second derivative is f''(x) = \frac{1}{x^2}. Since f''(x) > 0 for all x > 0 (the domain of the logarithm), f(x) is convex there. Thus, by using the inequality,

-\log\left(\frac{x_1 + x_2 + \dots + x_n}{n}\right) \leq \frac{-(\log(x_1) + \log(x_2) + \dots + \log(x_n))}{n}

\log\left(\frac{x_1 + x_2 + \dots + x_n}{n}\right) \geq \frac{\log(x_1) + \log(x_2) + \dots + \log(x_n)}{n}

When we multiply both sides of an inequality by -1, the inequality reverses. If you remember your properties of logarithms, you’ll likely remember these two rules.

\log_{a^n}(b^m) = \frac{m}{n}\log_a(b)

\log_a(b) = \frac{\log_x(b)}{\log_x(a)}

Our goal is to raise the argument to the power \frac{1}{n}, since we need to obtain the Geometric Mean, which is an n-th root. If we do this, we must compensate by raising the base to the same power, following rule 1. Similarly, for the denominator, we know \log_{e^{1/n}}(e) = n \times \log_e(e) = n, as \log_e(e) = 1.

LHS \geq \frac{\log_{e^{1/n}}\left((x_1 x_2 \dots x_n)^{\frac{1}{n}}\right)}{\log_{e^{1/n}}(e)}

Now, applying the second rule to this inequality, we can replace the fraction with a logarithm whose base is the argument of the denominator (e in this case) and whose argument is the numerator. Hence, we get

loge(x1+x2++xnn)loge((x1x2xn)1n) log_e(\frac{x_1 + x_2 + \dots + x_n}{n}) \geq log_e({(x_1 x_2 \dots x_n)^\frac{1}{n}})

Since the base of the logarithms on both sides is e, which is greater than 1, we can drop the logarithm and apply the inequality directly to its arguments.

x1+x2++xnn(x1x2xn)1n \frac{x_1 + x_2 + \dots + x_n}{n} \geq (x_1 x_2 \dots x_n)^\frac{1}{n}

if loga(x)>loga(y)log_a(x) > log_a(y), then x>yx > y when a>1a > 1 and x<yx < y when 0<a<10 < a < 1

This proves the AM-GM inequality. Isn’t it remarkable?

Bonus: The Chicken-McNugget Theorem

This has to be my favourite little theorem of all time, not just for its fancy name, but for good reason. It states that for any two co-prime positive integers a and b, the greatest integer that cannot be written as a linear combination ax + by with non-negative integers x and y is ab - a - b. Let’s illustrate this with an example.

Consider a game that awards 7 points and 11 points to a player on the basis of various scenarios. Since 7 and 11 are co-prime, we can apply the McNugget theorem and find that 59, which is 7 \times 11 - 7 - 11, is the largest number of points that cannot be scored by any player. Thus, scoring exactly 59 points is an impossibility for any player playing the game. After 59 points, every score is possible, as it can be represented as a linear combination of 7 and 11.

60 = 7 \times 7 + 11 \times 1

61 = 7 \times 4 + 11 \times 3

62 = 7 \times 1 + 11 \times 5

Of course, smaller impossible scores like 10, 20, etc. exist, but are of trivial concern.
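A brute-force Python check makes the theorem concrete for a = 7, b = 11:

```python
a, b = 7, 11
limit = a * b  # searching up to a*b is more than enough here

# Every score expressible as 7x + 11y with non-negative integers x, y.
representable = {a * x + b * y
                 for x in range(limit // a + 1)
                 for y in range(limit // b + 1)}
impossible = [n for n in range(limit + 1) if n not in representable]

print(max(impossible))  # 59, which is 7*11 - 7 - 11
assert max(impossible) == a * b - a - b
```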

You can read the story behind its unique name here :)

Conclusion

Mathematics was my top subject throughout school. I probably owe it to my dad, who has a master’s in Pure Mathematics, for impressing me early in my childhood. Although I barely participated in rigorous maths competitions and cannot boast of any prodigious success, my fascination for the discipline has no upper bound. You can try the inequalities section in Art of Problem Solving for an in-depth reference, or tackle the olympiad problems if you feel up for a challenge.

Always remember to embrace, not suppress the imbalances that wrestle inside you. They all exist for a reason.

Finished in my dorm room (6008), surrounded by a growing scent of rapidly burning candlesticks.