Inequalities and Identities

import sympy
from sympy import symbols
sympy.init_printing(use_unicode=True)

General-purpose Inequalities

The Arithmetic-Geometric Mean Inequality

Let \(\{ a_j \}_{j=1}^n\) and \(\{q_j\}_{j=1}^n\) be sets of nonnegative numbers with \(\sum_j q_j = 1\). Then

\[\begin{equation*} \prod_{j=1}^n a_j^{q_j} \le \sum_{j=1}^n q_j a_j. \end{equation*}\]

Equality holds iff all the \(a_j\) with \(q_j > 0\) are equal.
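A quick numerical spot-check of the weighted AM-GM inequality (a sketch using numpy; the seed and array sizes are arbitrary choices, not from the text):

import numpy as np

rng = np.random.default_rng(2024)
a = rng.uniform(0, 10, size=5)          # nonnegative a_j
q = rng.uniform(size=5)
q = q / q.sum()                         # weights q_j summing to 1
assert np.prod(a**q) <= np.sum(q * a)   # weighted GM <= weighted AM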

The Cauchy-Schwarz Inequality

Let \(\mathcal{X}\) be a real inner product space, and let \(x, y \in \mathcal{X}\). Then

\[\begin{equation*} | \langle x, y \rangle|^2 \le \langle x, x \rangle \langle y, y \rangle. \end{equation*}\]
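In \(\mathbb{R}^n\) with the usual dot product, the inequality is easy to check numerically (a sketch with random vectors):

import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=10), rng.normal(size=10)
assert (x @ y)**2 <= (x @ x) * (y @ y)   # |<x,y>|^2 <= <x,x><y,y>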

Rearrangement Inequalities

Rearrangements of sequences

Let \(x \equiv (x_j)_{j=1}^n \in {\mathbb R}^n\). The decreasing rearrangement of \(x\), denoted \(x^*\), has the same multiset of component values as \(x\), but ordered so that

\[\begin{equation*} x_1^* \ge x_2^* \ge \cdots \ge x_n^*. \end{equation*}\]

The increasing rearrangement of \(x\), denoted \(x_*\), has the same multiset of component values as \(x\), but ordered so that

\[\begin{equation*} x_{*1} \le x_{*2} \le \cdots \le x_{*n}. \end{equation*}\]

The Hardy-Littlewood Rearrangement Inequality for two vectors

Suppose \(x \in \mathbb{R}^n\) and \(y \in \mathbb{R}^n\). Let \(x \cdot y \equiv \sum_{j=1}^n x_j y_j\). Then

\[\begin{equation*} x^* \cdot y_* \le x \cdot y \le x^* \cdot y^*. \end{equation*}\]
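Both bounds can be verified numerically with numpy's sort (a sketch; the vectors are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=8), rng.normal(size=8)
x_dec = np.sort(x)[::-1]                      # decreasing rearrangement x^*
y_dec, y_inc = np.sort(y)[::-1], np.sort(y)   # y^* and y_*
assert x_dec @ y_inc <= x @ y <= x_dec @ y_dec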

The Symmetric Decreasing Rearrangement of a function

Suppose \(f: \mathbb{R}^n \rightarrow \mathbb{R}^+\) has a finite integral. The symmetric decreasing rearrangement \(f^*\) of \(f\) is a function such that:

  • \(f^*(x) = f^*(y)\) whenever \(\|x\| = \|y\|\) (symmetric)

  • \(f^*(x) \ge f^*(y)\) if \(\|x\| \le \|y\|\) (decreasing)

  • \(\int_{\mathbb{R}^n} 1_{f(x) \ge t}\, dx = \int_{\mathbb{R}^n} 1_{f^*(x) \ge t}\, dx\) for all \(t \ge 0\) (the measure of the level sets is the same)

The Hardy-Littlewood Rearrangement Inequality for two functions

Suppose \(f: \mathbb{R}^{n} \to \mathbb{R}^{+}\) and \(g: \mathbb{R}^{n} \to \mathbb{R}^{+}\). Then

\[\begin{equation*} \int_{\mathbb{R}^n} f(x) g(x) dx \le \int_{\mathbb{R}^n} f^*(x) g^*(x) dx. \end{equation*}\]
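A 1-D discrete sketch: sample two nonnegative functions on a symmetric grid, build their symmetric decreasing rearrangements by placing the largest sampled values at the grid points closest to the origin, and compare the Riemann sums. The helper sym_dec_rearrange and the test functions are illustrative choices, not from the text.

import numpy as np

def sym_dec_rearrange(vals, grid):
    # discrete analogue of f^*: the largest values go to the points nearest 0
    order = np.argsort(np.abs(grid), kind="stable")
    out = np.empty_like(vals)
    out[order] = np.sort(vals)[::-1]
    return out

x = np.linspace(-3, 3, 601)
dx = x[1] - x[0]
f = np.exp(-(x - 1)**2)             # nonnegative test functions
g = 1.0 / (1.0 + (x + 0.5)**2)
f_star, g_star = sym_dec_rearrange(f, x), sym_dec_rearrange(g, x)
assert (f * g).sum() * dx <= (f_star * g_star).sum() * dx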

The Riesz Rearrangement Inequality for three functions

Suppose \(f: \mathbb{R}^{n} \to \mathbb{R}^{+}\), \(g: \mathbb{R}^{n} \to \mathbb{R}^{+}\), and \(h: \mathbb{R}^{n} \to \mathbb{R}^{+}\).

Then

\[\begin{equation*} \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x)g(x-y)h(y)\,dx\,dy \le \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f^{*}(x)g^{*}(x-y)h^{*}(y)\,dx\,dy. \end{equation*}\]

Identities involving conditional expectation

Law of Iterated Expectation (Adam’s Law, Law of Total Expectation)

If \(X\) and \(Y\) are random variables with a joint distribution and \(\mathbb{E}|X| < \infty\),

\[\begin{equation*} \mathbb{E} X = \mathbb{E}_Y \mathbb{E}(X|Y). \end{equation*}\]
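For a small discrete joint distribution the law can be checked exactly (a sketch; the pmf below is made up for illustration):

import numpy as np

p = np.array([[0.10, 0.20, 0.05],    # joint pmf: rows = values of X,
              [0.15, 0.25, 0.25]])   #            columns = values of Y
x_vals = np.array([0.0, 1.0])
y_marg = p.sum(axis=0)                                     # P{Y = y}
e_x = (x_vals[:, None] * p).sum()                          # E X directly
e_x_given_y = (x_vals[:, None] * p).sum(axis=0) / y_marg   # E(X | Y = y)
assert np.isclose(e_x, (e_x_given_y * y_marg).sum())       # E_Y E(X|Y)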

Law of Total Variance (Eve’s Law)

If \(X\) and \(Y\) are random variables with a joint distribution and \(\mathrm{Var}(X) < \infty\),

\[\begin{equation*} \mathrm{Var}(X) = \mathbb{E}_Y \mathrm{Var}(X \mid Y) + \mathrm{Var}_Y\, \mathbb{E} (X \mid Y). \end{equation*}\]
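Continuing the discrete sketch above, both sides of Eve's Law can be computed exactly:

e_x2_given_y = (x_vals[:, None]**2 * p).sum(axis=0) / y_marg
var_x_given_y = e_x2_given_y - e_x_given_y**2
within = (var_x_given_y * y_marg).sum()               # E_Y Var(X|Y)
between = ((e_x_given_y - e_x)**2 * y_marg).sum()     # Var_Y E(X|Y)
var_x = (x_vals[:, None]**2 * p).sum() - e_x**2       # Var(X) directly
assert np.isclose(var_x, within + between)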

Probability Inequalities

This section follows Lugosi (2006) closely.

The tail-sum formula

If \(X\) is a nonnegative real-valued random variable,

\[\begin{equation*} {\mathbb E} X = \int_0^\infty {\mathbb{P}} \{X \ge x\} dx \end{equation*}\]
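For instance, for an exponential variable with rate \(\lambda\), \({\mathbb{P}}\{X \ge x\} = e^{-\lambda x}\), and the tail integral equals the mean \(1/\lambda\). A symbolic sketch using the sympy already imported above:

import sympy

x, lam = sympy.symbols('x lambda', positive=True)
survival = sympy.exp(-lam * x)                       # P{X >= x} for Exponential(lambda)
tail_integral = sympy.integrate(survival, (x, 0, sympy.oo))
assert sympy.simplify(tail_integral - 1/lam) == 0    # matches E X = 1/lambda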

Jensen’s Inequality

If \(\phi\) is a convex function from a convex set \({\mathcal X}\) to \(\mathbb{R}\) and \(X\) is a random variable taking values in \({\mathcal X}\) with \({\mathbb E} X\) finite, then \(\phi({\mathbb E} X) \le {\mathbb E} \phi(X)\).
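For example, with \(\phi = \exp\) (convex) and \(X\) standard normal, \(\phi(\mathbb{E} X) = 1\) while \(\mathbb{E} \phi(X) = e^{1/2}\). A Monte Carlo sketch:

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100_000)
assert np.exp(x.mean()) <= np.mean(np.exp(x))   # phi(E X) <= E phi(X) for phi = exp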

Chernoff bounds

If \(X\) has a moment generating function (i.e., if \({\mathbb{E}} e^{sX}\) exists for \(s \in [-a, a]\) for some \(a > 0\)), we can apply the generalized Markov Inequality with \(f(x) = e^{sx}\) for \(s \in (0, a]\):

\[\begin{equation*} {\mathbb{P}} \{ X \ge t \} = {\mathbb{P}} \{ e^{sX} \ge e^{st} \} \le \frac{{\mathbb E} e^{sX}}{e^{st}} \end{equation*}\]

for all \(s \in (0, a]\). For any particular \(X\), one can optimize this over \(s\):

\[\begin{equation*} {\mathbb{P}} \{ X \ge t \} \le \inf_{s \in (0, a]} \frac{{\mathbb E} e^{sX}}{e^{st}}. \end{equation*}\]
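For a standard normal \(X\), \({\mathbb E} e^{sX} = e^{s^2/2}\) for every \(s\), and the optimization can be carried out symbolically: the minimizer is \(s = t\), giving the familiar bound \(e^{-t^2/2}\). A sympy sketch:

import sympy

s, t = sympy.symbols('s t', positive=True)
log_bound = s**2 / 2 - s * t                    # log of e^{-st} E e^{sX} for X ~ N(0,1)
s_star = sympy.solve(sympy.diff(log_bound, s), s)[0]
assert s_star == t
assert sympy.simplify(sympy.exp(log_bound.subs(s, s_star)) - sympy.exp(-t**2 / 2)) == 0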

Hoeffding’s Inequality

Let \(\{ X_j \}_{j=1}^n\) be independent, and define \(S_n \equiv \sum_{j=1}^n X_j\). Then, applying the Chernoff bound gives

\[\begin{equation*} {\mathbb{P}} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-st}\, {\mathbb E}\, e^{s(S_n - {\mathbb E} S_n)} = e^{-st} \prod_{j=1}^n {\mathbb E}\, e^{s(X_j - {\mathbb E} X_j)}. \end{equation*}\]

Hoeffding bounds the moment generating function of a bounded, centered random variable: if \(a \le X \le b\) with probability 1, then

\[\begin{equation*} {\mathbb E}\, e^{s(X - {\mathbb E} X)} \le e^{s^2 (b-a)^2/8}, \end{equation*}\]

from which follows Hoeffding’s tail bound:

If \(\{X_j\}_{j=1}^n\) are independent and \({\mathbb{P}} \{a_j \le X_j \le b_j\} = 1\), then

\[\begin{equation*} {\mathbb{P}} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2} \end{equation*}\]

and

\[\begin{equation*} {\mathbb{P}} \{ S_n - {\mathbb E} S_n \le -t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2} \end{equation*}\]
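A simulation sketch with \(X_j \sim \mathrm{Uniform}[0,1]\), so \(a_j = 0\), \(b_j = 1\), and \(\sum_j (b_j - a_j)^2 = n\) (the parameters are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
n, t, reps = 100, 10.0, 100_000
x = rng.uniform(0, 1, size=(reps, n))
dev = x.sum(axis=1) - n * 0.5           # S_n - E S_n
bound = np.exp(-2 * t**2 / n)           # Hoeffding bound, about 0.135 here
print((dev >= t).mean(), "<=", bound)   # the empirical tail is far below the bound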

Hoeffding’s Other Inequality

Suppose \(f\) is a convex, continuous, real function and \({\mathcal X}\) is a finite set. Let \(\{X_j \}_{j=1}^n\) be a simple random sample from \({\mathcal X}\) (drawn without replacement) and let \(\{Y_j \}_{j=1}^n\) be an iid uniform random sample from \({\mathcal X}\) (drawn with replacement). Then

\[\begin{equation*} {\mathbb E} f \left ( \sum_{j=1}^n X_j \right ) \le {\mathbb E} f \left ( \sum_{j=1}^n Y_j \right ). \end{equation*}\]
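A small exact check by enumeration, with \({\mathcal X} = \{1, \ldots, 5\}\), \(n = 3\), and the convex function \(f(s) = s^2\) (all illustrative choices):

import itertools
import numpy as np

pop, n = [1, 2, 3, 4, 5], 3
f = lambda s: s**2                                                     # convex
srs = np.mean([f(sum(c)) for c in itertools.combinations(pop, n)])     # without replacement
iid = np.mean([f(sum(c)) for c in itertools.product(pop, repeat=n)])   # with replacement
assert srs <= iid   # here 84 <= 87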

Bernstein’s Inequality

Suppose \(\{X_j \}_{j=1}^n\) are independent with \({\mathbb E} X_j = 0\) for all \(j\), \({\mathbb{P}} \{ | X_j | \le c\} = 1\), \(\sigma_j^2 = {\mathbb E} X_j^2\), and \(\sigma^2 = \frac{1}{n} \sum_{j=1}^n \sigma_j^2\). Let \(S_n \equiv \sum_{j=1}^n X_j\) as above. Then for any \(\epsilon > 0\),

\[\begin{equation*} {\mathbb{P}} \{ S_n/n > \epsilon \} \le \exp \left ( - \frac{n \epsilon^2}{2(\sigma^2 + c\epsilon/3)} \right ). \end{equation*}\]
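When \(\sigma^2 \ll c^2\), Bernstein's bound is much sharper than Hoeffding's. A numerical comparison (a sketch with made-up parameters, using the Hoeffding bound with \(a_j = -c\), \(b_j = c\)):

import numpy as np

n, c, eps = 1000, 1.0, 0.05
sigma2 = 0.01                                              # variance much smaller than c^2
bernstein = np.exp(-n * eps**2 / (2 * (sigma2 + c * eps / 3)))
hoeffding = np.exp(-2 * (n * eps)**2 / (n * (2 * c)**2))   # bound on P{S_n - E S_n >= n eps}
print(bernstein, hoeffding)                                # Bernstein is dramatically smaller here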

Martingales

A sequence of random variables \(\{ X_j \}\) is a martingale if

  • \(\mathbb{E} |X_j| < \infty\) \(\forall j\), and

  • \(\mathbb{E} (X_{j+1} \mid X_1, \ldots, X_j) = X_j\).

A sequence of random variables \(\{ X_j \}\) is a martingale with respect to the sequence \(\{Y_j \}\) if

  • \(\mathbb{E} |X_j| < \infty\) \(\forall j\), and

  • \(\mathbb{E} (X_{j+1} \mid Y_1, \ldots, Y_j) = X_j\).

A nonnegative martingale is a martingale for which \(\Pr \{X_j \ge 0 \} = 1\) \(\forall j\).

Examples

  • A gambler’s fortune in a series of fair bets with the same stakes is a martingale: the expected fortune after the \((j+1)\)st bet, given the outcomes of the first \(j\) bets, is the fortune after the \(j\)th bet.

  • Likelihood ratios: Suppose \(\{Y_j \}\) are IID with density \(f\), and let \(g\) be some other density. Define \(X_j \equiv \prod_{i=1}^j g(Y_i)/f(Y_i)\). Then \(\{X_j \}\) is a martingale with respect to \(\{Y_j\}\). (More generally, likelihood ratios are nonnegative martingales with respect to the distribution in the denominator; a simulation sketch follows.)
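A simulation sketch of the likelihood-ratio example with \(f\) the \(N(0,1)\) density and \(g\) the \(N(1,1)\) density, so \(g(y)/f(y) = e^{y - 1/2}\); each \(X_j\) should have expectation 1 (the densities and parameters are illustrative choices):

import numpy as np

rng = np.random.default_rng(4)
reps, steps = 100_000, 5
y = rng.normal(size=(reps, steps))        # Y_j iid with density f = N(0,1)
x = np.cumprod(np.exp(y - 0.5), axis=1)   # X_j = prod_i g(Y_i)/f(Y_i)
print(x.mean(axis=0))                     # each column should be near 1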

Kolmogorov’s Inequality

If \(\{X_j\}\) is a nonnegative martingale, then

\[\begin{equation*} \Pr \{ \sup_j X_j \ge a \} \le \mathbb{E} X_1/a. \end{equation*}\]
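The likelihood-ratio martingale sketched above is nonnegative with \(\mathbb{E} X_1 = 1\), so the bound is \(1/a\). A simulation sketch (finite horizon standing in for the supremum):

import numpy as np

rng = np.random.default_rng(5)
reps, steps, a = 100_000, 50, 10.0
y = rng.normal(size=(reps, steps))
x = np.cumprod(np.exp(y - 0.5), axis=1)          # nonnegative martingale, E X_1 = 1
print((x.max(axis=1) >= a).mean(), "<=", 1 / a)  # empirical crossing rate vs the bound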