Supermartingale-based Tests

Martingales, supermartingales, submartingales

Let \((X_j)_{j \in \mathbb{N}} = X_1, X_2, \ldots\) and \((Y_j)_{j \in \mathbb{N}} = Y_1, Y_2, \ldots\) be sequences of random variables (stochastic processes). Let \(Y^{i} := (Y_1, \ldots, Y_i)\) be the first \(i\) elements of the sequence \((Y_j)_{j \in \mathbb{N}}\). Suppose \(\mathbb{E}|X_j| < \infty\) for all \(j \in \mathbb{N}\).

Then \((X_j)_{j \in \mathbb{N}}\) is a martingale with respect to \((Y_j)_{j \in \mathbb{N}}\) if

()\[\begin{equation} \mathbb{E}(X_j | Y^{j-1}) = X_{j-1}. \end{equation}\]

It is a supermartingale if

()\[\begin{equation} \mathbb{E}(X_j | Y^{j-1}) \le X_{j-1}, \end{equation}\]

and a submartingale if

()\[\begin{equation} \mathbb{E}(X_j | Y^{j-1}) \ge X_{j-1}. \end{equation}\]

A martingale, supermartingale, or submartingale is nonnegative if \(\mathbb{P} \{X_j \ge 0 \} = 1\) for all \(j \in \mathbb{N}\).

Terminology: A sequence \((\lambda_j)_{j \in \mathbb{N}}\) is predictable with respect to \((Y_j)_{j \in \mathbb{N}}\) if \(\lambda_j\) depends only on \(Y^{j-1}\), i.e., does not depend on \((Y_i)_{i \ge j}\).

Examples

Product of independent random variables with expected value 1.

Suppose \((Y_j)_{j \in \mathbb{N}}\) are independent and have expected value 1. Define \(X_j := \prod_{i=1}^j Y_i\), \(j \in \mathbb{N}\). Then

()\[\begin{equation} \mathbb{E} (X_j | Y^{j-1}) = \mathbb{E} (X_{j-1}Y_j | Y^{j-1}) = X_{j-1} \mathbb{E} (Y_j | Y^{j-1}) = X_{j-1}. \end{equation}\]

Note that if \((Z_i)\) are independent random variables with finite, nonzero expected values, then \(\prod_{i=1}^j (Z_i/\mathbb{E}Z_i)\) is a martingale.
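A quick numerical check (a sketch, not part of the original notes; the Uniform[0, 2] distribution is an illustrative choice with expected value 1): the running product should have mean close to 1 at every step.

import numpy as np

rng = np.random.default_rng(12345)
n_paths, n_steps = 100_000, 10
Z = rng.uniform(0, 2, size=(n_paths, n_steps))  # independent, E(Z_i) = 1
X = np.cumprod(Z, axis=1)                       # X_j = prod_{i <= j} Z_i
print(X.mean(axis=0))                           # each column should be close to 1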

Sum of independent random variables with expected value 0.

Suppose \((Y_j)_{j \in \mathbb{N}}\) are independent and have expected value 0. Define \(X_j := \sum_{i=1}^j Y_i\), \(j \in \mathbb{N}\). Then

()\[\begin{equation} \mathbb{E} (X_j | Y^{j-1}) = \mathbb{E} (X_{j-1}+Y_j | Y^{j-1}) = X_{j-1} + \mathbb{E} (Y_j | Y^{j-1}) = X_{j-1}. \end{equation}\]

Gambler’s fortune in repeated bets on independent fair coin tosses.

Suppose we bet on the outcomes of a sequence of independent fair coin tosses \(Y_1, Y_2, \ldots\), i.e., \(\{Y_j\}_{j \in \mathbb{N}}\) are IID \(\mbox{Bernoulli}(1/2)\). Our initial fortune is \(X_0\); our fortune after the \(j\)th wager is \(X_j\). We are not allowed to wager more than our current fortune on the next bet, so if \(X_i\) is zero then \(X_k = 0\) for all \(k > i\): we can’t bet if we go broke. The wager on the \(j\)th toss is \(b_j \in [0, X_{j-1}]\); \(b_j\) may depend on \(Y^{j-1}\), i.e., \((b_j)\) is predictable. If \(Y_j=1\), we win the \(j\)th bet and our fortune increases by \(b_j\); if \(Y_j=0\), we lose the \(j\)th bet and our fortune decreases by \(b_j\). Then \(X_j = X_{j-1}+b_jY_j - b_j(1-Y_j)\), so

()\[\begin{equation} \mathbb{E} (X_j | Y^{j-1}) = X_{j-1}+ b_j \mathbb{E} (2Y_j-1) = X_{j-1}. \end{equation}\]

Thus the fortune \((X_j)_{j \in \mathbb{N}}\) is a nonnegative martingale with respect to the Bernoulli coin tosses \((Y_j)_{j \in \mathbb{N}}\).
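A minimal simulation (assuming the illustrative strategy of betting half the current fortune, which is predictable): the fortune stays nonnegative and its average stays at \(X_0\) no matter how many bets are made.

import numpy as np

rng = np.random.default_rng(2023)
n_paths, n_bets = 100_000, 20
fortune = np.ones(n_paths)                # X_0 = 1 on every path
for j in range(n_bets):
    b = fortune / 2                       # wager half the current fortune (predictable)
    y = rng.integers(0, 2, size=n_paths)  # fair coin: 1 = win, 0 = lose
    fortune += b * y - b * (1 - y)
print(fortune.min() >= 0, fortune.mean()) # nonnegative, and mean close to 1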

Likelihood ratios

We have a family \(\mathcal{P} := \{ \mathbb{P}_\theta : \theta \in \Theta \}\) of probability distributions on a measurable space \(\mathcal{X}\), dominated by a common \(\sigma\)-finite measure \(\mu\). Let \(p_\theta(x)\), \(x \in \mathcal{X}\), denote the density of \(\mathbb{P}_\theta\) with respect to \(\mu\). For fixed \(x \in \mathcal{X}\), the function

()\[\begin{eqnarray} \mathcal{L} = \mathcal{L}_x : \Theta &\to& \Re^+ \\ \theta &\mapsto& p_\theta(x) \end{eqnarray}\]

is the likelihood. If we observe \(X=x\), then for \(\eta \in \Theta\), \(\mathcal{L}(\eta) = \mathcal{L}(\eta | X=x)\) is the likelihood of \(\eta\) given \(X=x\). The likelihood ratio of \(\eta\) to \(\nu\) given \(X=x\) is \(\mathcal{L}(\eta | X=x)/\mathcal{L}(\nu | X=x)\).

Suppose \(\mathbb{P}_1\) and \(\mathbb{P}_2\) are distributions on some outcome space and that they have densities with respect to some dominating measure \(\mu\). Let \(f\) and \(g\) denote those densities. Suppose \((Y_j)_{j \in \mathbb{N}}\) are IID with density \(f\). Define the likelihood ratio process

()\[\begin{equation} L_j := \prod_{i=1}^j g(Y_i)/f(Y_i), \;\; j \in \mathbb{N}. \end{equation}\]

Now

()\[\begin{equation} \mathbb{E} (L_j | Y^{j-1}) = L_{j-1} \mathbb{E}(g(Y_j)/f(Y_j)) = L_{j-1} \int_\Omega \frac{g(\omega)}{f(\omega)} f(\omega) d\mu(\omega) = L_{j-1} \int_\Omega g(\omega) d\mu(\omega) = L_{j-1} \end{equation}\]

since \(g\) is a probability density with respect to \(\mu\).
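A minimal sketch (with \(f\) standard normal and \(g\) normal with mean 1/2; these densities are illustrative choices, not from the original): under \(f\), the likelihood ratio process should have expected value 1 at every step.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n_paths, n_obs = 50_000, 10
Y = rng.normal(0, 1, size=(n_paths, n_obs))         # data drawn from f = N(0, 1)
ratios = norm.pdf(Y, loc=0.5) / norm.pdf(Y, loc=0)  # g(Y_i)/f(Y_i) with g = N(1/2, 1)
L = np.cumprod(ratios, axis=1)                      # likelihood ratio process
print(L.mean(axis=0))                               # each close to 1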

Prior-posterior ratio martingale

See Waudby-Smith and Ramdas (2021).

Again, we have a family of distributions: let \(\Theta\) be countable and suppose \(\{\mathbb{P}_\theta : \theta \in \Theta\}\) are dominated by a common \(\sigma\)-finite measure \(\mu\), where \(p_\theta\) is the density of \(\mathbb{P}_\theta\) with respect to \(\mu\). Let \(\pi\) (the prior) be a probability distribution on \(\Theta\) that assigns positive probability to every \(\eta \in \Theta\). We observe \((X_i)_{i \in \mathbb{N}}\) sequentially, with \(X_1 \sim p_\theta\) and \(X_{j+1} \sim p_\theta(\cdot | X^j)\) for all \(j \in \mathbb{N}\), for some fixed \(\theta \in \Theta\). The predictive distribution of \(x^j = (x_1, \ldots, x_j)\) is

()\[\begin{equation} m(x^j) := \int_\Theta p_\eta(x^j) \pi(d\eta). \end{equation}\]

The posterior distribution of \(\theta\) given \(X^j=x^j\) is

()\[\begin{equation} \pi_j(\theta) := \pi(\theta | X^j = x^j) = \frac{p_\theta(x^j) \pi(\theta)}{m(x^j)}. \end{equation}\]

Consider the sequence

()\[\begin{equation} Y_j(\theta) := \frac{\pi(\theta)}{\pi_j(\theta)}, \;\; j \in \mathbb{N}. \end{equation}\]

Then \(Y_j\) is nonnegative. Suppose the true value of the parameter is \(\theta^* \in \Theta\). Since \(\pi\) assigns positive mass to \(\theta^*\), \((Y_j(\theta^*))\) is a nonnegative martingale with expected value 1.

Proof: [To do]
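A numerical sketch of the claim (for a hypothetical finite \(\Theta\) of three Bernoulli parameters with a uniform prior; data generated from the middle value): the average of \(Y_j(\theta^*)\) over many sample paths should be close to 1 for every \(j\).

import numpy as np

rng = np.random.default_rng(42)
thetas = np.array([0.3, 0.5, 0.7])        # countable (here finite) parameter set
prior = np.full(3, 1/3)                   # uniform prior
t = 1                                     # data are IID Bernoulli(thetas[t])
n_paths, n_obs = 20_000, 8
X = rng.binomial(1, thetas[t], size=(n_paths, n_obs))
S = np.cumsum(X, axis=1)                  # successes through draw j
j = np.arange(1, n_obs + 1)
# posterior mass at each theta after j draws, for every path
like = thetas**S[..., None] * (1 - thetas)**(j[None, :, None] - S[..., None])
post = prior * like
post /= post.sum(axis=2, keepdims=True)
Y = prior[t] / post[:, :, t]              # prior-posterior ratio at the true theta
print(Y.mean(axis=0))                     # each close to 1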

General betting martingales

Waudby-Smith, I. and A. Ramdas (2021)

()\[\begin{equation} Y_j(\eta) := \prod_{i=1}^j \left( 1 + \lambda_i(\eta) (X_i-\eta) \right) \end{equation}\]

with \(Y_0 := 1\), \((\lambda_i)\) predictable, and \(\lambda_i(\eta) \in [-1/(1-\eta), 1/\eta]\), for observations \(X_i \in [0,1]\).

The gambler starts with an initial fortune of 1 and bets on the value of \(\theta_i = \mathbb{E}X_i\): a positive \(\lambda_i\) pays off if \(X_i > \eta\); a negative \(\lambda_i\) pays off if \(X_i < \eta\). The constraint on \(\lambda_i\) ensures the gambler can’t go into debt: the fortune is nonnegative.
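A minimal sketch (with a fixed \(\lambda_i \equiv 1/2\), \(\eta = 1/2\), and Uniform[0,1] data whose mean equals \(\eta\); all choices are illustrative): the fortune is nonnegative and its mean stays 1 when the hypothesized mean is the true mean.

import numpy as np

rng = np.random.default_rng(3)
eta, lam = 0.5, 0.5                           # lam is within [-1/(1-eta), 1/eta]
n_paths, n_obs = 100_000, 15
X = rng.uniform(0, 1, size=(n_paths, n_obs))  # E(X_i) = 0.5 = eta
Y = np.cumprod(1 + lam * (X - eta), axis=1)   # betting martingale
print(Y.min() >= 0, Y.mean(axis=0))           # nonnegative; means close to 1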

Waudby-Smith and Ramdas Proposition 3:
For an arbitrary set of (possibly unbounded) distributions \(\Theta\), every \(\Theta\)-nonnegative martingale is necessarily of the form

()\[\begin{equation} Y_j = \prod_{i=1}^j (1 + \lambda_i Z_i) \end{equation}\]

for some predictable \(\lambda_i \le 1\) and some \(Z_i \ge -1\), \(\mathbb{P}_\theta\)-almost surely, with \(\mathbb{E}_{\theta}(Z_i | X^{i-1}) = 0\) for every \(\theta \in \Theta\). The same statement also holds for nonnegative supermartingales, with \(= 0\) replaced by \(\le 0\).

Sampling with or without replacement from a finite population

Suppose we have a population \(\{x_i\}_{i=1}^N\). Let \(\theta = \bar{x} := \frac{1}{N} \sum_{i=1}^N x_i\) be the population mean. Let \(X_k\) be the value of \(x_i\) selected on the \(k\)th draw. Let \(X^j := (X_1, \ldots, X_j)\). Assume \(X_j \in [0, u_j]\) for some known \(u_j\).

Let \(\theta_j := \mathbb{E}(X_j | X^{j-1})\). For sampling with replacement, \(\theta_j = \theta\). Define \(S_j := \sum_{i=1}^j X_i\). For sampling without replacement, \(\theta_j = (N\theta - S_{j-1})/(N-j+1)\): the population total was originally \(N\theta\); items totalling \(S_{j-1}\) have been removed, and \(N-j+1\) items remain in the population just before the \(j\)th draw.

Then \(\sum_{i=1}^j (X_i-\theta_i)\) and \(\prod_{i=1}^j X_i/\theta_i\) (provided \(\theta_i \ne 0\)) are martingales.
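A sketch checking the without-replacement conditional mean (using a small made-up population, the integers 1 through 10): averaging \(\sum_{i \le j}(X_i - \theta_i)\) over many random permutations should give approximately 0 for every \(j\).

import numpy as np

rng = np.random.default_rng(11)
pop = np.arange(1, 11, dtype=float)      # population of N = 10 known values
N, theta = len(pop), pop.mean()
n_paths = 100_000
X = rng.permuted(np.tile(pop, (n_paths, 1)), axis=1)        # draws without replacement
S = np.insert(np.cumsum(X, axis=1), 0, 0, axis=1)[:, :-1]   # S_{j-1} for each path
j = np.arange(1, N + 1)
theta_j = (N * theta - S) / (N - j + 1)  # conditional mean of the jth draw
print(np.cumsum(X - theta_j, axis=1).mean(axis=0))          # each close to 0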

Properties of martingales, supermartingales, and submartingales

  • If \((X_j)_{j \in \mathbb{N}}\) is a martingale, then \(\mathbb{E} X_j = \mathbb{E} X_1\).

  • If \((X_j)_{j \in \mathbb{N}}\) is a supermartingale, then \(\mathbb{E} X_j \le \mathbb{E} X_1\).

  • If \((X_j)_{j \in \mathbb{N}}\) is a submartingale, then \(\mathbb{E} X_j \ge \mathbb{E} X_1\).

Proof. Suppose \((X_j)_{j \in \mathbb{N}}\) is a martingale w.r.t. \((Y_j)_{j \in \mathbb{N}}\). Then, by the law of total expectation,

()\[\begin{equation} \mathbb{E}X_j = \mathbb{E}\mathbb{E}(X_j | Y^{j-1}) = \mathbb{E} X_{j-1}. \end{equation}\]
()\[\begin{equation} \mathbb{E}X_{j-1} = \mathbb{E}\mathbb{E}(X_{j-1} | Y^{j-2}) = \mathbb{E} X_{j-2}. \end{equation}\]

Continuing in this way, \(\mathbb{E}X_j = \mathbb{E}X_{j-1} = \cdots = \mathbb{E}X_1\).

For supermartingales and submartingales, \(=\) is replaced with \(\le\) or \(\ge\), respectively. \(\Box\).

Stopping times

Let \(\tau\) be a measurable mapping from \(X_1, X_2, \ldots\) to \(\mathbb{N} \cup \{\infty\}\), and suppose that for \(j=1, 2, \ldots\), the event \(\{\tau = j\}\) depends only on \(X^j = (X_1, \ldots, X_j)\), not on \(X_k\) for \(k>j\). Then \(\tau\) is a stopping time for \((X_j)\). (The usual definition of stopping times involves filtrations, which are beyond the scope of these materials.)

In the example where \(X_j\) represents a sequence of wagers, some stopping rules are:

  • stop after 10 bets

  • stop when you go broke

  • stop when you double your initial stake

  • stop if you have won three bets in a row

These are not stopping rules:

  • stop when your fortune is as high as it will ever get

  • stop if you would lose the next three bets

If \(X_1, X_2, \ldots\) is a martingale (or supermartingale) and \(\tau\) is a stopping time, is \(\mathbb{E} X_\tau\) still equal to (or at most) \(\mathbb{E} X_1\)? Not in general, but it is under any of several regularity conditions on \(\tau\) and \((X_j)\), as the next theorem shows.

Doob’s Optional Stopping Theorem (1953)

Suppose \(\tau\) is a stopping time for the stochastic process \((X_j)_{j \in \mathbb{N}}\) and that any of the following conditions holds:

  • \(\mathbb{P} \{\tau \le c \} = 1\) for some \(c \in \mathbb{N}\)

  • \(\mathbb{E} \tau < \infty\) and \( | X_j - X_{j-1}| \le c\) for all \(j\) and some \(c\)

  • \(|X_{j}| \le c \) for all \(j \in \mathbb{N}\) for some \(c\) and \(\mathbb{P} \{ \tau < \infty \}=1\).

Then if \((X_j)_{j \in \mathbb{N}}\) is a martingale, \(\mathbb{E} X_\tau = \mathbb{E} X_1\); if it is a supermartingale, \(\mathbb{E} X_\tau \le \mathbb{E} X_1\).
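A sketch (assuming an illustrative fixed-stake betting strategy and the rule “stop when the fortune reaches 1.5, hits 0, or 50 bets have been made,” a bounded stopping time, so the first condition of the theorem holds): the mean stopped fortune should be close to \(\mathbb{E} X_1\).

import numpy as np

rng = np.random.default_rng(99)
n_paths, horizon, stake = 100_000, 50, 0.1
fortune = np.ones(n_paths)               # initial fortune 1
stopped = np.zeros(n_paths, dtype=bool)
for _ in range(horizon):
    y = rng.integers(0, 2, size=n_paths) # fair coin: 1 = win, 0 = lose
    live = ~stopped
    fortune[live] += stake * (2 * y[live] - 1)
    stopped |= (fortune >= 1.5) | (fortune <= 0)  # stop when rich or broke
print(fortune.mean())                    # close to 1 = E(X_1)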

Doob’s Martingale Convergence Theorem

Suppose \((X_j)_{j \in \mathbb{N}}\) is a martingale and \(\sup_{j \in \mathbb{N}} \mathbb{E} X_j^- < \infty\), where \(X_j^- := -(X_j \wedge 0)\) is the negative part of \(X_j\). Then \((X_j)\) converges almost surely to a random variable \(X_\infty\) with \(\mathbb{E} |X_\infty| < \infty\).

Ville’s Inequality (1939)

Let \((X_j)_{j \in \mathbb{N}}\) be a nonnegative (super)martingale with respect to \((Y_j)_{j \in \mathbb{N}}\). Then

()\[\begin{equation} \mathbb{P} \{ \exists j : X_j \ge c \mathbb{E} X_1 \} \le 1/c. \end{equation}\]

Proof. (Informal; it can be made rigorous using Doob’s optional stopping theorem.) Fix \(k \in \mathbb{N}\) and define the stopping time \(\tau(k)\) to be the first \(j \le k\) such that \(X_j \ge c \, \mathbb{E} X_1\), or \(k\) if there is no such \(j\). Each \(\tau(k)\) is a bounded stopping time, so \(\mathbb{E} X_{\tau(k)} \le \mathbb{E} X_1\) (with equality for martingales). Apply Markov’s inequality to \(X_{\tau(k)}\) and let \(k \rightarrow \infty\).
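A sketch checking the inequality empirically (using the Uniform[0, 2] product martingale from earlier as the nonnegative martingale, with \(c = 10\)): the fraction of paths that ever cross \(c\) should be at most \(1/c\).

import numpy as np

rng = np.random.default_rng(5)
n_paths, n_obs, c = 100_000, 100, 10
# nonnegative martingale with E(X_1) = 1: running product of Uniform[0, 2] factors
X = np.cumprod(rng.uniform(0, 2, size=(n_paths, n_obs)), axis=1)
crossed = (X.max(axis=1) >= c).mean()    # empirical crossing frequency
print(crossed, "should be at most", 1 / c)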

Supermartingale tests

Wald’s SPRT

See SPRT

The Empirical Bernstein martingale test

See Howard et al. (2021).

import math
import scipy as sp
from scipy.stats import bernoulli, uniform
import numpy as np
from numpy.testing import assert_allclose
import json
np.random.seed(123456789)
# !pip install rilacs
from rilacs.strategies import linear_gamma_dist
import pytest
from rilacs.martingales import (
    apriori_Kelly_martingale,
    distKelly_martingale,
    sqKelly_martingale,
    dKelly_martingale,
)
import itertools
def sprt_mart(x : np.array, N : int, mu : float=1/2, eta: float=1-np.finfo(float).eps, \
              u: float=1, random_order = True):
    '''
    Finds the terms of a process that is a nonnegative supermartingale if the population
    mean is less than or equal to mu, against the alternative that the mean is eta,
    for a population of size N of values in the interval [0, u]. By Ville's inequality,
    the reciprocal of the running maximum of the terms is a p-value for the null.

    Generalizes Wald's SPRT for the Bernoulli to sampling without replacement and to bounded
    values rather than binary values.

    If N is finite, assumes the sample is drawn without replacement.
    If N is infinite, assumes the sample is drawn with replacement.

    Data are assumed to be in random order. If not, the calculation for sampling
    without replacement is incorrect.

    Parameters
    ----------
    x : np.array
        values in [0, u], one element per draw
    N : int
        population size for sampling without replacement, or np.inf for
        sampling with replacement
    mu : float in (0, u)
        hypothesized population mean
    eta : float in (0, u)
        alternative hypothesized population mean
    u : float
        upper bound on the population values
    random_order : bool
        True if the data are in random order; must be True for sampling
        without replacement

    Returns
    -------
    terms : np.array
        sequence of terms of the supermartingale
    '''
    if any((xx < 0 or xx > u) for xx in x):
        raise ValueError(f'Data out of range [0,{u}]')
    if np.isfinite(N):
        if not random_order:
            raise ValueError("data must be in random order for samples without replacement")
        S = np.insert(np.cumsum(x),0,0)[0:-1]  # 0, x_1, x_1+x_2, ...
        j = np.arange(1,len(x)+1)              # 1, 2, 3, ..., len(x)
        m = (N*mu-S)/(N-j+1)                   # conditional mean of the jth draw under the null
    else:
        m = np.full(len(x), mu)                # with replacement, the conditional mean is constant
    with np.errstate(divide='ignore',invalid='ignore'):
        terms = np.cumprod((x*eta/m + (u-x)*(u-eta)/(u-m))/u) # generalization of the Bernoulli SPRT
    terms[m<0] = np.inf                        # if the null requires a negative mean, it is surely false
    return terms
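A usage sketch (simulated Bernoulli(0.6) data, sampled with replacement; converting the running maximum of the martingale to a p-value via Ville's inequality is an illustrative convention, using the imports from the cell above):

x = bernoulli.rvs(0.6, size=100, random_state=17)    # null mu = 1/2 is false here
terms = sprt_mart(x, N=np.inf, mu=1/2, eta=0.6, u=1)
p = np.minimum(1, 1 / np.maximum.accumulate(terms))  # running p-value (Ville)
print(p[-1])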
def shrink_trunc(x: np.array, N: int, mu: float=1/2, nu: float=1-np.finfo(float).eps, u: float=1, c: float=1/2,
                 d: float=100) -> np.array:
    '''
    apply the shrinkage and truncation estimator to an array

    The sample mean is shrunk towards nu, with relative weight d compared to a single observation;
    the estimate is truncated above at u*(1-eps) and below at m_j+e_j(c,j):

    S_1 := 0
    S_j := \sum_{i=1}^{j-1} x_i, j > 1
    m_j := (N*mu-S_j)/(N-j+1) if np.isfinite(N) else mu
    e_j := c/sqrt(d+j-1)
    eta_j := ( (d*nu + S_j)/(d+j-1) \vee (m_j+e_j) ) \wedge u*(1-eps)

    Parameters
    ----------
    x : np.array
        input data
    N : int
        population size for sampling without replacement, or np.inf for sampling with replacement
    mu : float in (0, u)
        hypothesized population mean
    nu : float in (mu, u)
        initial alternative hypothesized value for the population mean
    u : float
        upper bound on the population values
    c : positive float
        scale factor for allowing the estimated mean to approach mu from above
    d : positive float
        relative weight of nu compared to an observation, in updating the alternative for each term
    '''
    S = np.insert(np.cumsum(x),0,0)[0:-1]  # 0, x_1, x_1+x_2, ...
    j = np.arange(1,len(x)+1)              # 1, 2, 3, ..., len(x)
    m = (N*mu-S)/(N-j+1) if np.isfinite(N) else mu   # conditional mean of the jth draw under the null
    return np.minimum(u*(1-np.finfo(float).eps), np.maximum((d*nu+S)/(d+j-1),m+c/np.sqrt(d+j-1)))
def test_shrink_trunc():
    epsj = lambda c, d, j: c/math.sqrt(d+j-1)
    Sj = lambda x, j: 0 if j==1 else np.sum(x[0:j-1])
    muj = lambda N, mu, x, j: (N*mu - Sj(x, j))/(N-j+1) if np.isfinite(N) else mu
    nus = [.51, .55, .6]
    mu = 1/2
    u = 1
    d = 10
    vrand =  sp.stats.bernoulli.rvs(1/2, size=20)
    v = [
        np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]),
        np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0]),
        vrand
    ]
    for nu in nus:
        c = (nu-mu)/2
        for x in v:
            N = len(x)
            xinf = shrink_trunc(x, np.inf, mu, nu, c=c, d=d)
            xfin = shrink_trunc(x, len(x), mu, nu, c=c, d=d)
            yinf = np.zeros(len(x))
            yfin = np.zeros(len(x))
            for j in range(1,len(x)+1):
                est = (d*nu + Sj(x,j))/(d+j-1)
                most = u*(1-np.finfo(float).eps)
                yinf[j-1] = np.minimum(np.maximum(mu+epsj(c,d,j), est), most)
                yfin[j-1] = np.minimum(np.maximum(muj(N,mu,x,j)+epsj(c,d,j), est), most)
            np.testing.assert_allclose(xinf, yinf)    
            np.testing.assert_allclose(xfin, yfin)    
    
test_shrink_trunc()
def alpha_mart(x: np.array, N: int, mu: float=1/2, eta: float=1-np.finfo(float).eps, u: float=1, \
               estim: callable=shrink_trunc) -> np.array :
    '''
    Finds the ALPHA martingale for the hypothesis that the population
    mean is less than or equal to mu, for a population of size N, based
    on a series of draws x.

    The draws must be in random order, or the sequence is not a martingale under the null.

    If N is finite, assumes the sample is drawn without replacement.
    If N is infinite, assumes the sample is drawn with replacement.

    Parameters
    ----------
    x : np.array
        the data, values in [0, u]
    N : int
        population size for sampling without replacement, or np.inf for sampling with replacement
    mu : float in (0, u)
        hypothesized population mean
    eta : float in (mu, u)
        alternative hypothesized population mean
    estim : callable
        estim(x, N, mu, eta, u) -> np.array of length len(x), the sequence of values of eta_j for ALPHA

    Returns
    -------
    terms : np.array
        sequence of terms that would be a nonnegative martingale under the null
    '''
    S = np.insert(np.cumsum(x),0,0)[0:-1]  # 0, x_1, x_1+x_2, ...
    j = np.arange(1,len(x)+1)              # 1, 2, 3, ..., len(x)
    m = (N*mu-S)/(N-j+1) if np.isfinite(N) else np.full(len(x), mu)  # conditional mean under the null
    etaj = estim(x, N, mu, eta, u)
    with np.errstate(divide='ignore',invalid='ignore'):
        terms = np.cumprod((x*etaj/m + (u-x)*(u-etaj)/(u-m))/u)
    terms[m<0] = np.inf                    # if the null requires a negative mean, it is surely false
    return terms
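A usage sketch for alpha_mart (an audit-style example with a hypothetical 0/1 population of size 1000 and true mean 0.55, sampled without replacement in random order; the p-value conversion again uses Ville's inequality):

pop = np.concatenate([np.ones(550), np.zeros(450)])      # population mean 0.55
x = np.random.default_rng(2718).permutation(pop)[:200]   # 200 draws, random order
mart = alpha_mart(x, N=1000, mu=1/2, eta=0.55)
p = np.minimum(1, 1 / np.maximum.accumulate(mart))       # running p-value (Ville)
print(p[-1])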