Confidence bounds via the Chebychev and Hoeffding Inequalities

Chebychev’s inequality

Theorem

Suppose \(X\) has finite expected value \(\mu\) and variance \(\sigma^2\). Then

\[\begin{equation*} \mathbb P_\mu (| X - \mu | \ge k \sigma ) \le 1/k^2. \end{equation*}\]
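As a quick numerical sanity check (a sketch, not part of the proof; the exponential distribution and seed are chosen arbitrarily), we can estimate the left-hand side by simulation and confirm that it does not exceed \(1/k^2\):

# Hedged sanity check of Chebychev's inequality by simulation
import numpy as np

np.random.seed(12345)                                 # arbitrary seed, for reproducibility
mu, sigma, k = 1.0, 1.0, 2.0                          # exponential(1) has mean 1 and SD 1
draws = np.random.exponential(scale=1.0, size=100000)
empirical = np.mean(np.abs(draws - mu) >= k*sigma)    # estimate of P(|X - mu| >= k sigma)
print(empirical, 1/k**2)                              # the estimate should be at most 1/k^2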

Using Chebychev’s inequality for confidence intervals

Suppose that \(\{X_j\}_{j=1}^n\) are independent random variables and \(\{\ell_j\}_{j=1}^n\) and \(\{u_j\}_{j=1}^n\) are real numbers such that

\[\begin{equation*} \mathbb P (X_j \in [\ell_j, u_j]) = 1, \forall j = 1, \ldots, n.\end{equation*}\]

Let \(\bar{X} \equiv \frac{1}{n} \sum_{j=1}^n X_j\), \(\mu = \mathbb E \bar{X}\), and \(\tau^2 \equiv \sum_{j=1}^n (u_j - \ell_j)^2\).

Let’s bound \(\mbox{Var } X_j = \mathbb E (X_j - \mathbb E X_j)^2\). Intuitively, the variance is maximized by “spreading out” the distribution as much as possible, which would mean putting mass 1/2 at \(\ell_j\) and mass 1/2 at \(u_j\). (This is directly analogous to the fact that the variance of a Bernoulli distribution is maximal when \(p = 1/2\).) The mean of the corresponding distribution is \((\ell_j + u_j)/2\), and the variance is

\[\begin{equation*} \frac{1}{2} \left ( \ell_j - (\ell_j + u_j)/2 \right )^2 + \frac{1}{2} \left ( u_j - (\ell_j + u_j)/2 \right )^2 = \frac{1}{2} \left ( \frac{\ell_j - u_j}{2} \right )^2 + \frac{1}{2} \left ( \frac{u_j - \ell_j}{2} \right )^2 \end{equation*}\]
\[\begin{equation*} = \left ( \frac{\ell_j - u_j}{2} \right )^2 = \frac{(u_j - \ell_j)^2}{4} . \end{equation*}\]

(As a sanity check, this variable is a shifted and scaled Bernoulli with \(p=1/2\), which has variance \(1/4\). The shift doesn’t affect the variance, and the scale factor is \(u_j - \ell_j\), so the new variance should be \(1/4 \times (u_j - \ell_j)^2\).)
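To illustrate numerically (a sketch, with \(\ell_j = 0\) and \(u_j = 1\) chosen for concreteness), the two-point distribution attains the bound \((u_j - \ell_j)^2/4\), while other distributions on the same interval have smaller variance:

# Variance of distributions supported on [l, u], compared with the bound (u - l)^2 / 4
import numpy as np

l, u = 0.0, 1.0                                       # illustrative bounds
two_point = np.random.choice([l, u], size=100000)     # mass 1/2 at each endpoint
uniform = np.random.uniform(l, u, size=100000)        # another distribution on [l, u]
print(np.var(two_point), np.var(uniform), (u - l)**2 / 4)   # ~0.25, ~1/12, 0.25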

Hence

\[\begin{equation*} \mbox{Var } \bar{X} \equiv \sigma^2 = \frac{1}{n^2} \sum_{j=1}^n \mbox{Var }X_j \le \frac{\tau^2}{4n^2}.\end{equation*}\]

Since \(\sigma \le \tau/(2n)\),

\[\begin{equation*} \mathbb P_\mu \left (| \bar{X} - \mu | \ge k \frac{\tau}{2 n} \right ) \le \mathbb P_\mu \left (| \bar{X} - \mu | \ge k \sigma \right ) \le 1/k^2, \end{equation*}\]

and thus

\[\begin{equation*} \mathbb P_\mu \left (| \bar{X} - \mu | \ge a \right ) \le \frac{\tau^2}{4n^2 a^2}. \end{equation*}\]

This bound is trivial (it is at least 1) until

\[\begin{equation*} \frac{\tau^2}{4n^2 a^2} < 1, \end{equation*}\]

i.e.,

\[\begin{equation*} a > \frac{\tau}{2 n}. \end{equation*}\]
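To turn Chebychev's bound into a conservative two-sided \(1-\alpha\) confidence interval, set \(\tau^2/(4n^2a^2) = \alpha\) and solve for the half-width: \(a = \tau/(2n\sqrt{\alpha})\). A minimal sketch (the function name and example values are mine, for illustration):

# Half-width of a conservative (1-alpha) Chebychev confidence interval for the mean
# of n independent variables with X_j in [l_j, u_j], where tau2 = sum_j (u_j - l_j)^2
import math

def chebychev_halfwidth(n, tau2, alpha):
    return math.sqrt(tau2 / (4.0 * n**2 * alpha))

# e.g., n = 100 observations in [0, 1] (so tau2 = 100), 95% confidence:
print(chebychev_halfwidth(100, 100.0, 0.05))          # about 0.224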

Hoeffding’s Inequality

Let $\{X_j\}_{j=1}^n$ be independent random variables and $\{\ell_j\}_{j=1}^n$ and $\{u_j\}_{j=1}^n$ be real numbers such that
\[\begin{equation*} \mathbb P (X_j \in [\ell_j, u_j]) = 1, \forall j = 1, \ldots, n,\end{equation*}\]

and define \(\bar{X} \equiv \frac{1}{n} \sum_j X_j\), \(\mu \equiv \mathbb E \bar{X}\), and \(\tau^2 \equiv \sum_{j=1}^n (u_j - \ell_j)^2\).

Then \begin{equation*} \mathbb P_\mu ( \bar{X} - \mu \ge a ) \le \exp \left (-\frac{2n^2 a^2}{\tau^2} \right ) \end{equation*} and \begin{equation*} \mathbb P_\mu ( |\bar{X} - \mu | \ge a ) \le 2 \exp \left(-\frac{2n^2 a^2}{\tau^2} \right ). \end{equation*}

This bound is useful once

\[\begin{equation*} 2 \exp \left(-\frac{2n^2 a^2}{\tau^2} \right ) < 1, \end{equation*}\]

i.e.,

\[\begin{equation*} a > \sqrt{-2\ln 1/2} \cdot \frac{\tau}{2 n}. \end{equation*}\]

Since \(\sqrt{-2 \ln 1/2} = \sqrt{2 \ln 2} \approx 1.18\), Hoeffding’s inequality requires \(a\) to be about 18% larger than Chebychev’s before it gives a nontrivial bound, but as \(n\) grows its bound is exponentially smaller.
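For concreteness (a small numeric sketch; the values of \(n\) and \(\tau\) are arbitrary), the smallest \(a\) for which each bound is nontrivial:

# Smallest a for which each bound is below 1 (n and tau chosen arbitrarily)
import math

n, tau = 100, 10.0                                    # e.g., 100 variables in [0, 1]
a_cheb = tau / (2*n)                                  # Chebychev bound < 1 beyond this
a_hoef = math.sqrt(2*math.log(2)) * tau / (2*n)       # Hoeffding bound < 1 beyond this
print(a_cheb, a_hoef, a_hoef / a_cheb)                # ratio is about 1.18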

Truncation

If we know a priori that \(\mu \in [\ell, u]\), we can intersect any confidence interval \(\mathcal I\) with \([\ell, u]\) without losing any coverage probability: if \(\mathcal I \ni \mu\), then \(\mathcal I \cap [\ell, u] \ni \mu\). Hence truncating a confidence interval to \([\ell, u]\) cannot reduce its confidence level.

For instance, suppose \(\mathbb P (X_j \in [\ell_j, u_j]) = 1, \forall j = 1, \ldots, n\). Define \(\ell \equiv \frac{1}{n} \sum_{j=1}^n \ell_j\) and \(u \equiv \frac{1}{n} \sum_{j=1}^n u_j\). Then

\[\begin{equation*} \mathbb P \{ \bar{X} \in [\ell, u] \} = 1,\end{equation*}\]

and thus \(\mu \equiv \mathbb E \bar{X} \in [\ell, u]\). Since we know with certainty that \(\mu \in [\ell, u]\), any portion of a confidence interval \(\mathcal I\) that lies outside \([\ell, u]\) cannot help it cover \(\mu\), and we might as well use \(\mathcal I \cap [\ell, u]\) if it is shorter than \(\mathcal I\): we gain precision without losing coverage.

Basing a confidence interval on Chebychev’s inequality or Hoeffding’s inequality sometimes produces an interval that can be improved by truncation, decreasing the length (and expected length) without sacrificing coverage.
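A minimal sketch of truncation (the helper name and default bounds are mine): intersect the interval with \([\ell, u]\), which can only shorten it.

# Truncate a confidence interval [lo, hi] to a priori bounds [l, u]
def truncate_interval(lo, hi, l=0.0, u=1.0):
    return max(lo, l), min(hi, u)

# e.g., an interval that spills below 0 gets clipped without losing coverage:
print(truncate_interval(-0.05, 0.23))                 # (0.0, 0.23)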

Comparing Hoeffding’s Inequality and Chebychev’s Inequality for the sample mean from finite bounded populations

Both inequalities bound \(\mathbb P_\mu (|\bar{X} - \mu | \ge a)\) in terms of the ratio \(\frac{\sum_{j=1}^n (u_j - \ell_j)^2}{n^2 a^2} = \tau^2/(n^2 a^2)\): Chebychev’s bound is proportional to that ratio, while Hoeffding’s bound decays exponentially in its reciprocal, so Hoeffding’s bound is far smaller once \(n^2 a^2/\tau^2\) is large.

Here is a comparison, normalized by taking \(\tau^2 \equiv \sum_{j=1}^n (u_j - \ell_j)^2 = n\) so that \(\tau^2/n = 1\), as would be the case if \(\ell_j = 0\) and \(u_j = 1\), \(\forall j\):

# This is the first cell with code: set up the Python environment
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
import scipy as sp
import scipy.stats
from scipy.stats import binom
import pandas as pd
from ipywidgets.widgets import interact, interactive, fixed
import ipywidgets as widgets
from IPython.display import clear_output, display, HTML
def chebychev(n, ssByn, a):
    return ssByn/(4.0 * n * np.square(a))

def hoeffding(n, ssByn, a):
    return 2*np.exp(-2.0 * n * np.square(a) / ssByn)

def chebychevHoeffdingBoundPlot(n, ssByn = 1.0, aMax = 1.0, pts = 1000):
    r''' 
       Plots the bounds on P( |\bar{X} - \mu| \ge a) from Chebychev's inequality and 
       Hoeffding's inequality.
       \bar{X} is the sample mean of n independent bounded random variables.
       ssByn is the mean sum of squared differences between the upper and lower bounds on 
       each of those n variables, i.e., \tau^2/n.
       Plots each curve using pts+1 points.
    '''
    fig, ax = plt.subplots(nrows=1, ncols=1)
    aMin = math.sqrt(-math.log(0.5)*ssByn/(2.0*n))  # smallest a for which Hoeffding bound is nontrivial
    x = np.linspace(aMin, aMax, pts+1)
    cheb, = plt.plot(x, np.fmin(1,chebychev(n, ssByn, x)), color='b', label='Chebychev')
    hoef, = plt.plot(x, hoeffding(n, ssByn, x), color='r', label='Hoeffding')
    plt.title(r'Bound on $\mathbf{P} (|\bar{X} - \mu| > a)$')
    plt.xlabel('a')
    plt.ylabel('Probability')
    plt.legend(loc = 'best')

interact(chebychevHoeffdingBoundPlot, n=widgets.IntSlider(min=5, max=300, step=1, value=30),\
         ssByn = widgets.FloatSlider(min=0.5, max=10, step=0.1, value=1.0),\
         aMax = widgets.fixed(1), pts = widgets.fixed(1000)
        )

Using Hoeffding’s inequality for confidence intervals

Suppose that \(\{X_j \}_{j=1}^n\) are iid and take values in \([0, 1]\), so that \(\ell_j = 0\) and \(u_j = 1\) for all \(j\). Then \(\tau = \sqrt{n}\).

Setting the one-sided Hoeffding bound $\exp(-2na^2)$ equal to $\alpha$ and solving for $a$ gives $a = \sqrt{\frac{-\ln \alpha}{2n}}$. It follows that \begin{equation*} A_\alpha(\theta) = \left [0, \theta + \sqrt{\frac{-\ln \alpha}{2n}} \right ] \end{equation*} is the (one-sided) acceptance region for a level-$\alpha$ test of the hypothesis $\mu = \theta$; \begin{equation*} \mathbb P_\mu \left (\bar{X} - \sqrt{\frac{-\ln \alpha}{2n}} \ge \mu \right ) \le \alpha; \end{equation*} and \begin{equation*} \left [\bar{X} - \sqrt{\frac{-\ln \alpha}{2n}}, 1 \right ]\end{equation*} is a one-sided confidence interval for $\mu$, with confidence level $1-\alpha$.

Similarly, \begin{equation*} A_\alpha(\theta) = \left [\theta - \sqrt{\frac{-\ln (\alpha/2)}{2n}}, \theta + \sqrt{\frac{-\ln (\alpha/2)}{2n}} \right ] \end{equation*} is the two-sided acceptance region for a level-$\alpha$ test of the hypothesis $\mu = \theta$; \begin{equation*} \mathbb P_\mu \left ( \left | \bar{X} - \mu \right | \ge \sqrt{\frac{-\ln (\alpha/2)}{2n}} \right ) \le \alpha; \end{equation*} and \begin{equation*} \left [\bar{X} - \sqrt{\frac{-\ln (\alpha/2)}{2n}}, \bar{X} + \sqrt{\frac{-\ln (\alpha/2)}{2n}} \right ]\end{equation*} is a two-sided confidence interval for $\mu$, with confidence level $1-\alpha$.

Let's compare Hoeffding confidence intervals and intervals based on the normal approximation by simulation, in the cases we saw previously in [Confidence intervals based on the normal approximation](normApprox.ipynb). We will use two-sided bounds, even though in applications one-sided bounds are often more interesting.

Of course, if we know a priori that $\mu \in [\ell, u]$, the intersection of the Hoeffding confidence interval with $[\ell, u]$ is still a $1-\alpha$ confidence interval.
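Here is a minimal sketch of such an interval (the function name is mine; it implements the two-sided formula above and truncates to \([0, 1]\)):

# Two-sided (1-alpha) Hoeffding confidence interval for the mean of data in [0, 1],
# truncated to [0, 1]
import math
import numpy as np

def hoeffding_ci(x, alpha=0.05):
    n = len(x)
    halfwidth = math.sqrt(-math.log(alpha/2) / (2*n))   # Hoeffding half-width
    m = np.mean(x)
    return max(m - halfwidth, 0.0), min(m + halfwidth, 1.0)

# e.g., 100 Bernoulli(0.1) observations:
print(hoeffding_ci(np.random.binomial(1, 0.1, size=100)))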

# Population of two values, {0, 1}, in various proportions.  Amounts to Binomial random variable
ns = np.array([25, 50, 100, 400])  # sample sizes
ps = np.array([.001, .01, 0.1])    # mixture fractions, proportion of 1s in the population
alpha = 0.05  # 1- (confidence level)
reps = int(1.0e4)  # just for demonstration
vals = [0, 1]

simTable = pd.DataFrame(columns=('fraction of 1s', 'sample size', 'Student-t cov',\
                                 'Hoeffding cov', 'Student-t len', 'Trunc Hoeffding len'))
for p in ps:
    popMean = p
    for n in ns:
        tCrit = sp.stats.t.ppf(q=1-alpha/2, df=n-1)  # Student-t critical value
        hCrit = np.sqrt(-math.log(alpha/2)/(2*n))    # two-sided Hoeffding half-width
        sam = sp.stats.binom.rvs(n, p, size=reps)    # number of 1s in each of reps samples of size n
        samMean = sam/float(n)                       # sample means
        samSD = np.sqrt(samMean*(1-samMean)/(n-1))   # estimated standard error of each sample mean
        coverT = (np.fabs(samMean-popMean) < tCrit*samSD).sum()  # samples whose Student-t interval covers
        lenT = 2*(tCrit*samSD).mean()                # mean length of the Student-t intervals
        lenH = (np.minimum(samMean+hCrit, 1.0)-np.maximum(samMean-hCrit, 0.0)).mean()  # mean length of truncated Hoeffding intervals
        coverH = (np.fabs(samMean-popMean) < hCrit).sum()         # samples whose Hoeffding interval covers
        simTable.loc[len(simTable)] =  p, n, str(100*float(coverT)/float(reps)) + '%', \
            str(100*float(coverH)/float(reps)) + '%',\
            str(round(lenT,4)),\
            str(round(lenH,4))
#
ansStr =  '<h3>Simulated coverage probability for Student-t and truncated Hoeffding confidence intervals for a {0, 1} population</h3>' +\
          '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
          '%</strong>.<br /><strong>Estimated from ' + str(round(reps)) + ' replications.</strong>'
display(HTML(ansStr))
display(simTable)

Simulated coverage probability for Student-t and truncated Hoeffding confidence intervals for a {0, 1} population

Nominal coverage probability 95.0%.
Estimated from 10000 replications.
| fraction of 1s | sample size | Student-t cov | Hoeffding cov | Student-t len | Trunc Hoeffding len |
|---|---|---|---|---|---|
| 0.001 | 25 | 2.23% | 100.0% | 0.0037 | 0.2725 |
| 0.001 | 50 | 4.88% | 100.0% | 0.004 | 0.1931 |
| 0.001 | 100 | 10.0% | 100.0% | 0.0041 | 0.1369 |
| 0.001 | 400 | 33.22% | 100.0% | 0.0035 | 0.0689 |
| 0.010 | 25 | 22.26% | 100.0% | 0.0384 | 0.2816 |
| 0.010 | 50 | 38.6% | 100.0% | 0.0343 | 0.2019 |
| 0.010 | 100 | 63.6% | 100.0% | 0.0308 | 0.1459 |
| 0.010 | 400 | 90.66% | 100.0% | 0.0187 | 0.0778 |
| 0.100 | 25 | 92.35% | 99.98% | 0.2314 | 0.3709 |
| 0.100 | 50 | 87.91% | 100.0% | 0.1669 | 0.2918 |
| 0.100 | 100 | 93.67% | 99.99% | 0.1181 | 0.234 |
| 0.100 | 400 | 94.91% | 100.0% | 0.0589 | 0.1358 |
# Nonstandard mixture: a pointmass at zero and a uniform[0,1]
ns = np.array([25, 50, 100, 400])  # sample sizes
ps = np.array([0.9, 0.99, 0.999])    # mixture fraction, weight of pointmass
alpha = 0.05  # 1- (confidence level)
reps = int(1.0e4)   # just for demonstration

simTable = pd.DataFrame(columns=('mass at 0', 'sample size', 'Student-t cov',\
                                 'Hoeffding cov', 'Student-t len', 'Trunc Hoeffding len')
                       )

for p in ps:
    popMean = (1-p)*0.5  #  p*0 + (1-p)*.5
    for n in ns:
        tCrit = sp.stats.t.ppf(q=1-alpha/2, df=n-1)
        hCrit = np.sqrt(-math.log(alpha/2)/(2*n))
        coverT = 0
        coverH = 0
        totT = 0.0
        totH = 0.0
        for rep in range(int(reps)):
            sam = np.random.uniform(size=n)      # draw n values from U[0, 1] ...
            ptMass = np.random.uniform(size=n)
            sam[ptMass < p] = 0.0                # ... then set each to 0 with probability p (pointmass at 0)
            samMean = np.mean(sam)
            samSD = np.std(sam, ddof=1)
            coverT += ( math.fabs(samMean-popMean) < tCrit*samSD )
            totT += 2*tCrit*samSD
            coverH += ( math.fabs(samMean-popMean) < hCrit )
            totH +=  min(samMean+hCrit, 1.0)-max(samMean-hCrit, 0.0)
        simTable.loc[len(simTable)] =  p, n,\
            str(100*float(coverT)/float(reps)) + '%',\
            str(100*float(coverH)/float(reps)) + '%',\
            str(round(totT/float(reps),4)),\
            str(round(totH/float(reps),4))
#
ansStr =  '<h3>Simulated coverage probability of Student-t and truncated Hoeffding confidence intervals for ' +\
          'mixture of U[0,1] and pointmass at 0 population</h3>' +\
          '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
          '%</strong>.<br /><strong>Estimated from ' + str(reps) + ' replications.</strong>'

display(HTML(ansStr))
display(simTable)

Simulated coverage probability of Student-t and truncated Hoeffding confidence intervals for mixture of U[0,1] and pointmass at 0 population

Nominal coverage probability 95.0%.
Estimated from 10000 replications.
| mass at 0 | sample size | Student-t cov | Hoeffding cov | Student-t len | Trunc Hoeffding len |
|---|---|---|---|---|---|
| 0.900 | 25 | 90.56% | 100.0% | 0.6459 | 0.3219 |
| 0.900 | 50 | 98.75% | 100.0% | 0.6707 | 0.2421 |
| 0.900 | 100 | 99.97% | 100.0% | 0.6816 | 0.1858 |
| 0.900 | 400 | 100.0% | 100.0% | 0.6868 | 0.1178 |
| 0.990 | 25 | 22.2% | 100.0% | 0.0988 | 0.2767 |
| 0.990 | 50 | 38.95% | 100.0% | 0.1268 | 0.1971 |
| 0.990 | 100 | 62.96% | 100.0% | 0.1616 | 0.1408 |
| 0.990 | 400 | 97.81% | 100.0% | 0.2101 | 0.0729 |
| 0.999 | 25 | 2.66% | 100.0% | 0.0112 | 0.2722 |
| 0.999 | 50 | 4.87% | 100.0% | 0.0139 | 0.1926 |
| 0.999 | 100 | 9.54% | 100.0% | 0.019 | 0.1363 |
| 0.999 | 400 | 32.32% | 100.0% | 0.0352 | 0.0684 |

Remarkably, the truncated Hoeffding intervals are sometimes shorter on average than the Student-t intervals, even though their coverage is far better.

More in this direction…

We could also apply other inequalities, such as Serfling’s Inequality (a version of Hoeffding’s inequality for sampling without replacement) or Feige’s Inequality.

For sampling without replacement, the “Empirical Bernstein-Serfling Inequality” of Bardenet and Maillard (2013) seems particularly worth exploring.