Supermartingale-based Tests¶
Martingales, supermartingales, submartingales¶
Let \((X_j)_{j \in \mathbb{N}} = X_1, X_2, \ldots\) and \((Y_j)_{j \in \mathbb{N}} = Y_1, Y_2, \ldots\) be sequences of random variables (stochastic processes). Let \(Y^{i} := (Y_1, \ldots, Y_i)\) be the first \(i\) elements of the sequence \((Y_j)_{j \in \mathbb{N}}\). Suppose \(\mathbb{E}|X_j| < \infty\) for all \(j \in \mathbb{N}\).
Then \((X_j)_{j \in \mathbb{N}}\) is a martingale with respect to \((Y_j)_{j \in \mathbb{N}}\) if
\[ \mathbb{E}(X_{j+1} | Y^j) = X_j, \quad j \in \mathbb{N}. \]
It is a supermartingale if
\[ \mathbb{E}(X_{j+1} | Y^j) \le X_j, \quad j \in \mathbb{N}, \]
and a submartingale if
\[ \mathbb{E}(X_{j+1} | Y^j) \ge X_j, \quad j \in \mathbb{N}. \]
A martingale, supermartingale, or submartingale is nonnegative if \(\mathbb{P} \{X_j \ge 0 \} = 1\) for all \(j \in \mathbb{N}\).
Terminology: A sequence \((\lambda_j)_{j \in \mathbb{N}}\) is predictable with respect to \((Y_j)_{j \in \mathbb{N}}\) if \(\lambda_j\) depends only on \(Y^{j-1}\), i.e., does not depend on \((Y_i)_{i \ge j}\).
Examples¶
Product of independent random variables with expected value 1.¶
Suppose \((Y_j)_{j \in \mathbb{N}}\) are independent and have expected value 1. Define \(X_j := \prod_{i=1}^j Y_i\), \(j \in \mathbb{N}\). Then
\[ \mathbb{E}(X_{j+1} | Y^j) = \mathbb{E}(X_j Y_{j+1} | Y^j) = X_j \mathbb{E} Y_{j+1} = X_j. \]
Note that if \((Z_i)\) are independent random variables with \(\mathbb{E}Z_i \ne 0\), then \(\prod_{i=1}^j Z_i/\mathbb{E}Z_i\) is a martingale.
Sum of independent random variables with expected value 0.¶
Suppose \((Y_j)_{j \in \mathbb{N}}\) are independent and have expected value 0. Define \(X_j := \sum_{i=1}^j Y_i\), \(j \in \mathbb{N}\). Then
\[ \mathbb{E}(X_{j+1} | Y^j) = X_j + \mathbb{E} Y_{j+1} = X_j. \]
Gambler’s fortune in repeated bets on independent fair coin tosses.¶
Suppose we bet on the outcomes of a sequence of independent fair coin tosses \(Y_1, Y_2, \ldots\), i.e., \(\{Y_j\}_{j \in \mathbb{N}}\) are IID \(\mbox{Bernoulli}(1/2)\). Our initial fortune is \(X_0\); our fortune after the \(j\)th wager is \(X_j\). We are not allowed to wager more than our current fortune on the next bet. If \(X_i\) is zero then \(X_k = 0\) for all \(k > i\): we can’t bet if we go broke. The wager on the \(j\)th toss is \(b_j \in [0, X_{j-1}]\); \(b_j\) may depend on \(Y^{j-1}\). If \(Y_j=1\), we win the \(j\)th bet and our fortune increases by \(b_j\); if \(Y_j=0\), we lose the \(j\)th bet and our fortune decreases by \(b_j\). Then \(X_j = X_{j-1}+b_jY_j - b_j(1-Y_j)\).
Since \(b_j\) is predictable, \(\mathbb{E}(X_j | Y^{j-1}) = X_{j-1} + b_j \cdot \tfrac{1}{2} - b_j \cdot \tfrac{1}{2} = X_{j-1}\). Thus the fortune \((X_j)_{j \in \mathbb{N}}\) is a nonnegative martingale with respect to the Bernoulli coin tosses \((Y_j)_{j \in \mathbb{N}}\).
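To see the martingale property numerically, here is a minimal simulation sketch (the betting rule, wagering half the current fortune, is an arbitrary choice for illustration; it is predictable because it depends only on past outcomes):

import numpy as np

rng = np.random.default_rng(12345)

def gambler_fortune(tosses, initial=1.0):
    # bet half the current fortune on each fair coin toss
    fortune = initial
    for y in tosses:
        bet = fortune/2                     # b_j in [0, X_{j-1}], depends only on Y^{j-1}
        fortune += bet if y == 1 else -bet  # fortune rises by b_j on a win, falls by b_j on a loss
    return fortune

# the mean fortune after 20 tosses is close to the initial fortune, as the martingale property implies
sims = [gambler_fortune(rng.integers(0, 2, size=20)) for _ in range(100_000)]
print(np.mean(sims))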
Likelihood ratios¶
We have a family \(\mathcal{P} := \{ \mathbb{P}_\theta : \theta \in \Theta \}\) of probability distributions on a measurable space \(\mathcal{X}\), dominated by a common \(\sigma\)-finite measure \(\mu\). Let \(p_\theta(x)\), \(x \in \mathcal{X}\), denote the density of \(\mathbb{P}_\theta\) with respect to \(\mu\). For fixed \(x \in \mathcal{X}\), the function
\[ \mathcal{L}(\eta | X=x) := p_\eta(x), \quad \eta \in \Theta, \]
is the likelihood. If we observe \(X=x\), then for \(\eta \in \Theta\), \(\mathcal{L}(\eta) = \mathcal{L}(\eta | X=x)\) is the likelihood of \(\eta\) given \(X=x\). The likelihood ratio of \(\eta\) to \(\nu\) given \(X=x\) is \(\mathcal{L}(\eta | X=x)/\mathcal{L}(\nu | X=x)\).
Suppose \(\mathbb{P}_1\) and \(\mathbb{P}_2\) are distributions on some outcome space and that they have densities with respect to some dominating measure \(\mu\). Let \(f\) and \(g\) denote those densities. Suppose \((Y_j)_{j \in \mathbb{N}}\) are IID with density \(f\). Define the likelihood ratio process
\[ X_j := \prod_{i=1}^j \frac{g(Y_i)}{f(Y_i)}, \quad j \in \mathbb{N}. \]
Now
\[ \mathbb{E}(X_{j+1} | Y^j) = X_j\, \mathbb{E}\frac{g(Y_{j+1})}{f(Y_{j+1})} = X_j \int \frac{g}{f}\, f \, d\mu = X_j \int g \, d\mu = X_j, \]
since \(g\) is a probability density with respect to \(\mu\).
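As a concrete sketch (an assumed example, not from the original text), let \(f\) be the Bernoulli(1/2) density and \(g\) the Bernoulli(3/4) density with respect to counting measure. When the data are generated from \(f\), the likelihood ratio process has expected value 1:

import numpy as np

rng = np.random.default_rng(2024)

def lr_process(y, f_p=1/2, g_p=3/4):
    # cumulative likelihood ratio prod_i g(Y_i)/f(Y_i) for Bernoulli densities
    f = np.where(y == 1, f_p, 1 - f_p)
    g = np.where(y == 1, g_p, 1 - g_p)
    return np.cumprod(g/f)

# data from f: the final value of the LR process averages approximately 1
sims = [lr_process(rng.integers(0, 2, size=10))[-1] for _ in range(100_000)]
print(np.mean(sims))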
Prior-posterior ratio martingale¶
See Waudby-Smith and Ramdas (2021).
Again we have a family of distributions: let \(\Theta\) be countable, and suppose \(\{\mathbb{P}_\theta : \theta \in \Theta\}\) are dominated by a common \(\sigma\)-finite measure \(\mu\), where \(p_\theta\) is the density of \(\mathbb{P}_\theta\) with respect to \(\mu\). Let \(\pi\) (the prior) be a probability distribution on \(\Theta\) that assigns positive probability to every \(\eta \in \Theta\). We observe \((X_i)_{i \in \mathbb{N}}\) sequentially, with \(X_1 \sim p_\theta\) and \(X_{j+1} \sim p_\theta(x | X^j)\) for all \(j \in \mathbb{N}\), for some fixed \(\theta \in \Theta\). The predictive density of \(x^j = (x_1, \ldots, x_j)\) is
\[ m_j(x^j) := \sum_{\eta \in \Theta} \pi(\eta)\, p_\eta(x^j). \]
The posterior distribution of \(\theta\) given \(X^j = x^j\) is
\[ \pi(\eta | x^j) := \frac{\pi(\eta)\, p_\eta(x^j)}{m_j(x^j)}, \quad \eta \in \Theta. \]
Consider the sequence
\[ Y_j(\eta) := \frac{\pi(\eta)}{\pi(\eta | X^j)} = \frac{m_j(X^j)}{p_\eta(X^j)}, \quad j \in \mathbb{N}. \]
Then \(Y_j(\eta)\) is nonnegative. Since the data are generated from \(\mathbb{P}_\theta\) and \(\pi\) assigns positive mass to \(\theta\), \((Y_j(\theta))_{j \in \mathbb{N}}\) is a nonnegative martingale with expected value 1.
Proof: [To do]
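Pending the proof, a numerical sketch (with an assumed three-point \(\Theta\) and IID Bernoulli data, not from the original) is consistent with the claim: the prior-posterior ratio at the true \(\theta\) averages approximately 1.

import numpy as np

rng = np.random.default_rng(7)

thetas = np.array([0.3, 0.5, 0.7])  # an assumed finite Theta
prior = np.array([1/3, 1/3, 1/3])   # pi assigns positive mass to every element
true_idx = 1                        # data generated from theta = 0.5

def prior_posterior_ratio(x):
    # Y_j(theta) = pi(theta)/pi(theta | x) for IID Bernoulli likelihoods
    lik = np.prod(np.where(x[:, None] == 1, thetas, 1 - thetas), axis=0)
    posterior = prior*lik/np.sum(prior*lik)
    return prior[true_idx]/posterior[true_idx]

sims = [prior_posterior_ratio(rng.binomial(1, thetas[true_idx], size=8)) for _ in range(100_000)]
print(np.mean(sims))  # approximately 1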
General betting martingales¶
Waudby-Smith and Ramdas (2021) consider betting martingales of the form
\[ Y_j := \prod_{i=1}^j \left( 1 + \lambda_i (X_i - \eta) \right), \]
with \(Y_0 := 1\), \(X_i \in [0, 1]\), and \(\lambda_i \in [-1/(1-\eta), 1/\eta]\) predictable.
The gambler starts with a fortune of 1 and bets on the value of \(\theta_i = \mathbb{E}X_i\): a positive \(\lambda_i\) pays off if \(X_i > \eta\); a negative \(\lambda_i\) pays off if \(X_i < \eta\). The constraint on \(\lambda_i\) ensures the gambler can’t go into debt: the fortune is nonnegative.
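A simulation sketch of this betting martingale under the null (assuming IID uniform data with mean \(\eta = 1/2\) and, for simplicity, a constant bet \(\lambda_i = 1/2\), which is trivially predictable):

import numpy as np

rng = np.random.default_rng(99)

eta, lam = 1/2, 1/2                        # lam in [-1/(1-eta), 1/eta] keeps the fortune nonnegative
x = rng.uniform(size=(100_000, 20))        # under the null, E X_i = eta
Y = np.cumprod(1 + lam*(x - eta), axis=1)  # Y_j = prod_{i<=j} (1 + lambda_i (X_i - eta))
print(Y[:, -1].mean())                     # approximately 1: the fortune is a martingale
print(Y.min())                             # nonnegative: the gambler never goes into debt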
Waudby-Smith and Ramdas Proposition 3:
For an arbitrary set of (possibly unbounded) distributions \(\Theta\), every
\(\theta\)-nonnegative martingale is necessarily of the form
\[ M_j := \prod_{i=1}^j (1 + \lambda_i Z_i) \]
for some predictable \(\lambda_i \le 1\) and some \(Z_i \ge -1\), \(\mathbb{P}_\theta\)-almost surely, with \(\mathbb{E}_{\theta}(Z_i | X^{i-1}) = 0\) for every \(\theta \in \Theta\). The same statement holds for \(\theta\)-nonnegative supermartingales, with \(= 0\) replaced by \(\le 0\).
The empirical Bernstein martingale¶
Sampling with or without replacement from a finite population¶
Suppose we have a population \(\{x_i\}_{i=1}^N\). Let \(\theta = \bar{x} := \frac{1}{N} \sum_{i=1}^N x_i\) be the population mean. Let \(X_k\) be the value of the item selected on the \(k\)th draw. Let \(X^j := (X_1, \ldots, X_j)\). Assume \(X_j \in [0, u_j]\) for some known \(u_j\).
Let \(\theta_j := \mathbb{E}(X_j | X^{j-1})\). For sampling with replacement, \(\theta_j = \theta\). Define \(S_j := \sum_{i=1}^j X_i\). For sampling without replacement, \(\theta_j = (N\theta - S_{j-1})/(N-j+1)\): the population total was originally \(N\theta\), from which items totalling \(S_{j-1}\) have been removed, leaving \(N-j+1\) items in the population.
Then \(\sum_{i=1}^j (X_i-\theta_i)\) and \(\prod_{i=1}^j X_i/\theta_i\) (provided \(\theta_i \ne 0\)) are martingales.
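A quick empirical check of the conditional-mean formula (a sketch with an assumed 5-element population): given the first draw, the average of the second draw matches \(\theta_2 = (N\theta - S_1)/(N-1)\).

import numpy as np

rng = np.random.default_rng(3)

pop = np.array([0., 1., 2., 3., 4.])  # assumed small population
N, theta = len(pop), pop.mean()

# simulate (X_1, X_2) for sampling without replacement; condition on X_1 = 4
draws = np.array([rng.permutation(pop)[:2] for _ in range(100_000)])
second_given_first = draws[draws[:, 0] == 4., 1]
print(second_given_first.mean())      # empirical E(X_2 | X_1 = 4)
print((N*theta - 4.)/(N - 1))         # theta_2 = (N*theta - S_1)/(N - 2 + 1) = 1.5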
Properties of martingales, supermartingales, and submartingales¶
If \((X_j)_{j \in \mathbb{N}}\) is a martingale, then \(\mathbb{E} X_j = \mathbb{E} X_1\).
If \((X_j)_{j \in \mathbb{N}}\) is a supermartingale, then \(\mathbb{E} X_j \le \mathbb{E} X_1\).
If \((X_j)_{j \in \mathbb{N}}\) is a submartingale, then \(\mathbb{E} X_j \ge \mathbb{E} X_1\).
Proof. Suppose \((X_j)_{j \in \mathbb{N}}\) is a martingale w.r.t. \((Y_j)_{j \in \mathbb{N}}\). Then, by the law of total expectation,
\[ \mathbb{E} X_j = \mathbb{E} \left[ \mathbb{E} (X_j | Y^{j-1}) \right] = \mathbb{E} X_{j-1}. \]
Continuing, we end up with \(\mathbb{E}X_1\).
For supermartingales and submartingales, \(=\) is replaced with \(\le\) or \(\ge\), respectively. \(\Box\).
Stopping times¶
Let \(\tau\) be a measurable mapping from \(X_1, X_2, \ldots\) to \(\mathbb{N} \cup \{\infty\}\) and suppose that for \(j=1, 2, \ldots\), the event \(\{\tau = j\}\) depends only on \(X^j = (X_1, \ldots, X_j)\), and not on \(X_k\) for \(k>j\). (The usual definition of stopping times involves filtrations, which are beyond the scope of these materials.) Then \(\tau\) is a stopping time for \((X_j)_{j \in \mathbb{N}}\).
In the gambling example above, where \(X_j\) is the fortune after the \(j\)th wager, some stopping rules are:
stop after 10 bets
stop when you go broke
stop when you double your initial stake
stop if you have won three bets in a row
These are not stopping rules:
stop when your fortune is as high as it will ever get
stop if you would lose the next three bets
If \(X_1, X_2, \ldots\) is a (super | sub) martingale and \(\tau\) is a stopping time for it, then the stopped process \((X_{j \wedge \tau})_{j \in \mathbb{N}}\) is also a (super | sub) martingale.
Doob’s Optional Stopping Theorem (1953)¶
Suppose \(\tau\) is a stopping time for the stochastic process \((X_j)_{j \in \mathbb{N}}\) and that any of the following conditions holds:
\(\mathbb{P} \{\tau \le c \} = 1\) for some \(c \in \mathbb{N}\)
\(\mathbb{E} \tau < \infty\) and \( | X_j - X_{j-1}| \le c\) for all \(j\) and some \(c\)
\(|X_{j}| \le c \) for all \(j \in \mathbb{N}\) for some \(c\) and \(\mathbb{P} \{ \tau < \infty \}=1\).
Then if \((X_j)_{j \in \mathbb{N}}\) is a martingale, \(\mathbb{E} X_\tau = \mathbb{E} X_1\); if it is a supermartingale, \(\mathbb{E} X_\tau \le \mathbb{E} X_1\).
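A simulation sketch of the theorem for the coin-betting fortune above, with the stopping rule “stop when the fortune doubles or after 50 bets” (so condition 1 holds with \(c = 50\)); the betting rule, a quarter of the fortune, is an arbitrary assumption for illustration:

import numpy as np

rng = np.random.default_rng(11)

def stopped_fortune(initial=1.0, goal=2.0, max_bets=50):
    # bet a quarter of the fortune each toss; stop on reaching the goal or after max_bets
    fortune = initial
    for _ in range(max_bets):
        if fortune >= goal:  # the stopping event depends only on the past
            break
        bet = fortune/4
        fortune += bet if rng.integers(0, 2) == 1 else -bet
    return fortune

sims = [stopped_fortune() for _ in range(100_000)]
print(np.mean(sims))  # approximately 1 = E X_tau = E X_1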
Doob’s Martingale Convergence Theorem¶
Suppose \((X_j)_{j \in \mathbb{N}}\) is a martingale and \(\sup_{j \in \mathbb{N}} \mathbb{E} \left( -(X_j \wedge 0) \right) < \infty\), i.e., the expected negative part of \(X_j\) is uniformly bounded. Then \((X_j)\) converges almost surely to a random variable \(X_\infty\) with \(\mathbb{E} |X_\infty| < \infty\).
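For instance (an assumed example, not from the original), \(X_j := \prod_{i=1}^j 2U_i\) with \((U_i)\) IID uniform on \([0,1]\) is a nonnegative martingale with \(\mathbb{E}X_j = 1\), so the theorem applies; every path converges almost surely to 0:

import numpy as np

rng = np.random.default_rng(5)

U = rng.uniform(size=(10, 2000))  # 10 sample paths, 2000 steps each
X = np.cumprod(2*U, axis=1)       # nonnegative martingale with E X_j = 1
print(X[:, -1])                   # each path is numerically indistinguishable from 0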
Ville’s Inequality (1939)¶
Let \((X_j)_{j \in \mathbb{N}}\) be a nonnegative (super)martingale with respect to \((Y_j)_{j \in \mathbb{N}}\). Then for every \(x > 0\),
\[ \mathbb{P} \left\{ \sup_{j \in \mathbb{N}} X_j \ge x \right\} \le \frac{\mathbb{E} X_1}{x}. \]
Proof. (informal: can be made rigorous using Doob’s martingale convergence theorem)
Define the stopping time \(\tau(k)\) to be the first \(j \le k\) such that \(X_j \ge x > 0\), and \(k\) if there is no such \(j\). For each \(k\), \(\tau(k)\) is a bounded stopping time, so \(\mathbb{E} X_{\tau(k)} = \mathbb{E} X_1\) (with \(\le\) for a supermartingale). Apply Markov’s inequality to \(X_{\tau(k)}\) and let \(k \rightarrow \infty\).
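A simulation sketch of the inequality for the same \(\prod_i 2U_i\) martingale used above (an assumed example), comparing the chance that the running maximum reaches \(x\) with the bound \(\mathbb{E}X_1/x\):

import numpy as np

rng = np.random.default_rng(17)

x = 5.0
U = rng.uniform(size=(20_000, 200))
X = np.cumprod(2*U, axis=1)            # nonnegative martingale with E X_1 = 1
crossed = (X.max(axis=1) >= x).mean()  # P{sup_j X_j >= x}, paths truncated at 200 steps
print(crossed, "<=", 1/x)              # Ville's bound: 1/x = 0.2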
Supermartingale tests¶
The Empirical Bernstein martingale test¶
See Howard et al. (2021).
import math
import scipy as sp
from scipy.stats import bernoulli
import numpy as np
from numpy.testing import assert_allclose
np.random.seed(123456789)
def sprt_mart(x: np.array, N: int, mu: float=1/2, eta: float=1-np.finfo(float).eps, \
              u: float=1, random_order: bool=True):
    '''
    Finds the Wald SPRT martingale for the hypothesis that the population
    mean is less than or equal to mu, against the alternative that it is eta,
    for a population of size N of values in the interval [0, u].
    Generalizes Wald's SPRT for the Bernoulli to sampling without replacement and to bounded
    values rather than binary values.
    If N is finite, assumes the sample is drawn without replacement.
    If N is infinite, assumes the sample is drawn with replacement.
    Data are assumed to be in random order. If not, the calculation for sampling
    without replacement is incorrect.
    By Ville's inequality, min(1, 1/max(terms)) is a p-value for the null.

    Parameters
    ----------
    x : list of floats in [0, u], one element per draw
    N : int
        population size for sampling without replacement, or np.inf for
        sampling with replacement
    mu : float in (0, u)
        hypothesized population mean
    eta : float in (0, u)
        alternative hypothesized population mean
    random_order : bool
        True if the data are in random order. Random order is required for
        sampling without replacement.

    Returns
    -------
    terms : np.array
        sequence of martingale terms
    '''
    if any((xx < 0 or xx > u) for xx in x):
        raise ValueError(f'Data out of range [0,{u}]')
    if np.isfinite(N):
        if not random_order:
            raise ValueError("data must be in random order for samples without replacement")
        S = np.insert(np.cumsum(x), 0, 0)[0:-1]  # 0, x_1, x_1+x_2, ...
        j = np.arange(1, len(x)+1)               # 1, 2, 3, ..., len(x)
        m = (N*mu - S)/(N - j + 1)               # mean of population after (j-1)st draw, if null is true
    else:
        m = mu
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.cumprod((x*eta/m + (u - x)*(u - eta)/(u - m))/u)  # generalization of the Bernoulli SPRT
    terms[m < 0] = np.inf  # if the conditional mean is negative, the null is certainly false
    return terms
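A usage sketch (the data are simulated, not from the original): test the null \(\mu \le 1/2\) on a without-replacement sample from a population with true mean 0.7.

# usage sketch: a population of 700 ones and 300 zeros, so the true mean is 0.7
pop = np.concatenate([np.ones(700), np.zeros(300)])
np.random.shuffle(pop)         # the data must be in random order
mart = sprt_mart(pop[:100], N=1000, mu=1/2, eta=0.7)
print(min(1, 1/np.max(mart)))  # p-value, valid by Ville's inequality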
def shrink_trunc(x: np.array, N: int, mu: float=1/2, nu: float=1-np.finfo(float).eps, u: float=1,
                 c: float=1/2, d: float=100) -> np.array:
    '''
    Apply the shrinkage and truncation estimator to an array.
    The sample mean is shrunk towards nu, with relative weight d compared to a single observation;
    the estimate is truncated above at u*(1-eps) and below at m_j + e_j(c, j):

        S_1 = 0
        S_j = \sum_{i=1}^{j-1} x_i, j > 1
        m_j = (N*mu - S_j)/(N - j + 1) if np.isfinite(N) else mu
        e_j = c/sqrt(d + j - 1)
        eta_j = ( (d*nu + S_j)/(d + j - 1) \vee (m_j + e_j) ) \wedge u*(1 - eps)

    Parameters
    ----------
    x : np.array
        input data
    N : int
        population size for sampling without replacement, or np.inf for sampling with replacement
    mu : float in (0, 1)
        hypothesized population mean
    nu : float in (mu, 1)
        initial alternative hypothesized value for the population mean
    u : float
        upper bound on the population values
    c : positive float
        scale factor for allowing the estimated mean to approach mu from above
    d : positive float
        relative weight of nu compared to an observation, in updating the alternative for each term
    '''
    S = np.insert(np.cumsum(x), 0, 0)[0:-1]  # 0, x_1, x_1+x_2, ...
    j = np.arange(1, len(x)+1)               # 1, 2, 3, ..., len(x)
    m = (N*mu - S)/(N - j + 1) if np.isfinite(N) else mu  # mean of population after (j-1)st draw, if null is true
    return np.minimum(u*(1 - np.finfo(float).eps), np.maximum((d*nu + S)/(d + j - 1), m + c/np.sqrt(d + j - 1)))
def test_shrink_trunc():
    epsj = lambda c, d, j: c/math.sqrt(d + j - 1)
    Sj = lambda x, j: 0 if j == 1 else np.sum(x[0:j-1])
    muj = lambda N, mu, x, j: (N*mu - Sj(x, j))/(N - j + 1) if np.isfinite(N) else mu
    nus = [.51, .55, .6]
    mu = 1/2
    u = 1
    d = 10
    vrand = sp.stats.bernoulli.rvs(1/2, size=20)
    v = [
        np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]),
        np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0]),
        vrand
    ]
    for nu in nus:
        c = (nu - mu)/2
        for x in v:
            N = len(x)
            xinf = shrink_trunc(x, np.inf, mu, nu, c=c, d=d)  # sampling with replacement
            xfin = shrink_trunc(x, len(x), mu, nu, c=c, d=d)  # sampling without replacement
            yinf = np.zeros(len(x))
            yfin = np.zeros(len(x))
            for j in range(1, len(x)+1):
                est = (d*nu + Sj(x, j))/(d + j - 1)
                most = u*(1 - np.finfo(float).eps)
                yinf[j-1] = np.minimum(np.maximum(mu + epsj(c, d, j), est), most)
                yfin[j-1] = np.minimum(np.maximum(muj(N, mu, x, j) + epsj(c, d, j), est), most)
            np.testing.assert_allclose(xinf, yinf)
            np.testing.assert_allclose(xfin, yfin)

test_shrink_trunc()
def alpha_mart(x: np.array, N: int, mu: float=1/2, eta: float=1-np.finfo(float).eps, u: float=1, \
               estim: callable=shrink_trunc) -> np.array:
    '''
    Finds the ALPHA martingale for the hypothesis that the population
    mean is less than or equal to mu, for a population of size N,
    based on a series of draws x.
    The draws must be in random order, or the sequence is not a martingale under the null.
    If N is finite, assumes the sample is drawn without replacement.
    If N is infinite, assumes the sample is drawn with replacement.

    Parameters
    ----------
    x : list corresponding to the data
    N : int
        population size for sampling without replacement, or np.inf for sampling with replacement
    mu : float in (0, 1)
        hypothesized fraction of ones in the population
    eta : float in (mu, 1)
        alternative hypothesized population mean
    u : float
        upper bound on the population values
    estim : callable
        estim(x, N, mu, eta, u) -> np.array of length len(x), the sequence of values of eta_j for ALPHA

    Returns
    -------
    terms : np.array
        sequence of terms that would be a nonnegative martingale under the null
    '''
    S = np.insert(np.cumsum(x), 0, 0)[0:-1]  # 0, x_1, x_1+x_2, ...
    j = np.arange(1, len(x)+1)               # 1, 2, 3, ..., len(x)
    m = (N*mu - S)/(N - j + 1) if np.isfinite(N) else mu  # mean of population after (j-1)st draw, if null is true
    etaj = estim(x, N, mu, eta, u)
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.cumprod((x*etaj/m + (u - x)*(u - etaj)/(u - m))/u)  # (u-x), not (1-x): data lie in [0, u]
    terms[m < 0] = np.inf  # if the conditional mean is negative, the null is certainly false
    return terms
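A usage sketch parallel to the sprt_mart example above (simulated data, not from the original): ALPHA with the shrink_trunc estimator on a without-replacement sample from a population with mean 0.7.

# usage sketch: ALPHA on the same kind of sample as in the sprt_mart example
pop = np.concatenate([np.ones(700), np.zeros(300)])
np.random.shuffle(pop)         # the data must be in random order
mart = alpha_mart(pop[:100], N=1000, mu=1/2, eta=0.7)
print(min(1, 1/np.max(mart)))  # p-value, valid by Ville's inequality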