ALPHA: Audit that Learns from Previously Hand-Audited Ballots

See [ALPHA: Audit that Learns from Previously Hand-Audited Ballots](https://arxiv.org/abs/2201.02707).

Consider draws from a population \(\{x_i\}_{i=1}^N\), where each \(x_i \in [0, u]\).

Let \(\theta = \bar{x} := \frac{1}{N} \sum_{i=1}^N x_i\) be the population mean. Let \(X_k\) be the value of \(x_i\) selected on the \(k\)th draw.

We start by developing a one-sided test of the hypothesis \(\theta = \mu\), then show the \(P\)-value is monotone in \(\mu\), so the test is valid for the hypothesis \(\theta \le \mu\).

Let \(X^j := (X_1, \ldots, X_j)\). Assume \(X_j \in [0, u_j]\) for some known \(u_j\).

Let \(\mu_j := \mathbb{E}(X_j | X^{j-1})\) computed under the null hypothesis \(\theta = \mu\). Let \(\eta_j = \eta_j(X^{j-1}) \in (\mu_j, u_j]\), \(j=1, \ldots\), be a *predictable sequence* in the sense that \(\eta_j\) may depend on \(X^{j-1}\), but not on \(X_k\) for \(k \ge j\). We now define the ALPHA martingale \((T_j)\). Let \(T_0 := 1\) and

\[\begin{equation} T_j := T_{j-1} u_j^{-1}\left ( X_j\frac{\eta_j}{\mu_j} + (u_j-X_j) \frac{u_j-\eta_j}{u_j-\mu_j} \right ), \;\; j=1, \ldots . \end{equation}\]

This can be rearranged to yield

\[\begin{equation} T_j := T_{j-1} \left ( \frac{X_j}{\mu_j} \cdot \frac{\eta_j-\mu_j}{u_j-\mu_j} + \frac{u_j-\eta_j}{u_j-\mu_j} \right ). \label{eq:alphaMult} \end{equation}\]

Equivalently,

\[\begin{equation} T_j := \prod_{i=1}^j \left ( \frac{X_i}{\mu_i} \cdot \frac{\eta_i-\mu_i}{u_i-\mu_i} + \frac{u_i-\eta_i}{u_i-\mu_i} \right ), \;\; j \ge 1. \label{eq:alphaMultProd} \end{equation}\]
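
To make the update concrete, here is a minimal Python sketch (the function name `alpha_mart` and all numeric inputs are ours, not from the paper) that computes the trajectory by the recursion and confirms it matches the product form:

```python
import numpy as np

def alpha_mart(x, mu, eta, u):
    """Sketch of the ALPHA martingale trajectory (not reference code).

    x, mu, eta, u: arrays of X_j, mu_j, eta_j, u_j for j = 1, ..., n,
    with 0 <= X_j <= u_j and mu_j < eta_j <= u_j.
    """
    T = np.empty(len(x))
    t = 1.0
    for j, (xj, mj, ej, uj) in enumerate(zip(x, mu, eta, u)):
        # Recursive form: T_j = T_{j-1} u_j^{-1} (X_j eta_j/mu_j + (u_j - X_j)(u_j - eta_j)/(u_j - mu_j))
        t *= (xj * ej / mj + (uj - xj) * (uj - ej) / (uj - mj)) / uj
        T[j] = t
    return T

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 1, size=n)   # draws, each in [0, u_j] with u_j = 1
mu = np.full(n, 0.5)            # null means mu_j (sampling with replacement, theta = 1/2)
eta = np.full(n, 0.6)           # a predictable sequence eta_j in (mu_j, u_j]
u = np.ones(n)

# The product form gives the same trajectory.
T_prod = np.cumprod(x / mu * (eta - mu) / (u - mu) + (u - eta) / (u - mu))
assert np.allclose(alpha_mart(x, mu, eta, u), T_prod)
```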

Under the null hypothesis, \(\mathbb{E}(X_j | X^{j-1}) = \mu_j\), and \(T_j\) is nonnegative since \(X_j\), \(\mu_j\), and \(\eta_j\) are all in \([0, u_j]\). Also,

\[\begin{eqnarray} \mathbb{E} (T_j | X^{j-1} ) &=& T_{j-1} \left ( \frac{\mu_j}{\mu_j} \cdot \frac{\eta_j-\mu_j}{u_j-\mu_j} + \frac{u_j-\eta_j}{u_j-\mu_j} \right ) \nonumber \\ &=& T_{j-1} \left ( \frac{\eta_j-\mu_j}{u_j-\mu_j} + \frac{u_j-\eta_j}{u_j-\mu_j} \right ) \nonumber \\ &=& T_{j-1}. \end{eqnarray}\]

Thus if \(\theta = \mu\), \((T_j)\) is a nonnegative martingale with respect to \((X_j)\), starting at \(1\).

If \(\theta < \mu\), then \(\mathbb{E} (X_j | X^{j-1}) < \mu_j\) and \(r_j = \frac{\mathbb{E} (X_j | X^{j-1})}{\mu_j} < 1\), so

\[\begin{equation} \label{eq:supermartingale} \mathbb{E} (T_j | X^{j-1} ) = T_{j-1} \left ( r_j \cdot \frac{\eta_j-\mu_j}{u_j-\mu_j} + \frac{u_j-\eta_j}{u_j-\mu_j} \right ) < T_{j-1}. \end{equation}\]

Thus \((T_j)\) is a nonnegative supermartingale starting at 1 if \(\theta \le \mu\).
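
As a quick sanity check of both the martingale and supermartingale facts, here is a Monte Carlo sketch using a hypothetical two-point population on \(\{0, u\}\), chosen so the mean \(\theta\) is easy to control (all numeric values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
u, mu, eta, n_rep = 1.0, 0.5, 0.7, 200_000

def mean_T1(theta):
    # X_1 takes the value u with probability theta/u, else 0, so E(X_1) = theta.
    x = u * (rng.random(n_rep) < theta / u)
    t1 = x / mu * (eta - mu) / (u - mu) + (u - eta) / (u - mu)
    return t1.mean()

print(mean_T1(0.5))  # ~1.00: E(T_1) = T_0 when theta = mu (martingale)
print(mean_T1(0.4))  # ~0.92 < 1: E(T_1) < T_0 when theta < mu (supermartingale)
```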

It follows from Ville’s inequality (1939) that if \(\theta \le \mu\),

\[\begin{equation} \label{eq:p-value-adapt} \Pr \{ \exists j : T_j \ge \alpha^{-1} \} \le \alpha. \end{equation}\]

That is, \(\min(1, 1/T_j)\) is an “anytime \(P\)-value” for the composite null hypothesis \(\theta \le \mu\).
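
As a sketch of how this might be computed (using the hypothetical `alpha_mart` helper from the sketch above):

```python
import numpy as np

def anytime_p(T):
    """Anytime P-value sequence min(1, 1/T_j), tracked via the running
    maximum of the trajectory; the running version is also valid because
    rejection occurs the first time T_j reaches 1/alpha."""
    return np.minimum(1.0, 1.0 / np.maximum.accumulate(T))

# p = anytime_p(alpha_mart(x, mu, eta, u)); reject when p <= alpha.
```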

The derivation did not use any information about \(\{x_i\}\) other than \(X_j \in [0, u_j]\): it applies to populations \(\{x_i\}\) that are nonnegative and bounded, not merely binary populations, to sampling with or without replacement, and to weighted sampling.

These martingales comprise the same family of betting martingales studied by Waudby-Smith and Ramdas (2021) and Waudby-Smith et al. (2021), but are parametrized differently.

Sampling without replacement

For tests about the mean of a finite population from a sample drawn without replacement, we need \(\mathbb{E}(X_j | X^{j-1})\) computed on the assumption that \(\theta = \frac{1}{N} \sum_{i=1}^N x_i = \mu\).

For sampling without replacement from a population with mean \(\mu\), after draw \(j-1\), the mean of the remaining numbers is \((N\mu - \sum_{k=1}^{j-1}X_k)/(N-j+1)\).

Thus the conditional expectation of \(X_j\) given \(X^{j-1}\) under the null is \((N\mu - \sum_{k=1}^{j-1}X_k)/(N-j+1)\). If \(N\mu - \sum_{k=1}^{j-1}X_k < 0\) for some \(j\), the null hypothesis \(\theta = \mu\) is certainly false.
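
A sketch of this null-mean update (the helper name and its signature are ours, not from the paper):

```python
def mu_without_replacement(x_seen, N, mu):
    """Conditional null mean mu_j of X_j given the first j-1 draws x_seen,
    for sampling without replacement from a population of size N whose
    mean is mu under the null. A negative value certifies the null false."""
    j = len(x_seen) + 1
    return (N * mu - sum(x_seen)) / (N - j + 1)

# e.g., a population of N = 100 with null mean 1/2, after seeing 3 draws:
print(mu_without_replacement([1, 0, 1], N=100, mu=0.5))  # (50 - 2)/97 ~ 0.495
```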

Waudby-Smith and Ramdas (2021) and Waudby-Smith et al. (2021) develop tests and confidence sequences for the mean of a bounded population using “betting” martingales of the form

\[\begin{equation} \label{eq:lambda-rilacs} M_j := \prod_{i=1}^j (1 + \lambda_i (X_i- \mu_i)), \end{equation}\]

where, as above, \(\mu_i := \mathbb{E}(X_i | X^{i-1})\) is computed on the assumption that the null hypothesis is true. The sequence \((M_j)\) can be viewed as the fortune of a gambler in a series of wagers. The gambler starts with a stake of \(1\) unit and bets a fraction \(\lambda_i\) of their current wealth on the outcome of the \(i\)th wager; \(M_j\) is the gambler’s wealth after the \(j\)th wager. The gambler is not permitted to borrow money, so to ensure that the gambler does not end up in debt (\(M_i < 0\)) when \(X_i = 0\) (losing the \(i\)th bet), \(\lambda_i\) cannot exceed \(1/\mu_i\).
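
A minimal sketch of this betting martingale (names ours), enforcing the no-borrowing constraint \(\lambda_i \le 1/\mu_i\):

```python
import numpy as np

def betting_mart(x, mu, lam):
    """Betting martingale M_j = prod_{i<=j} (1 + lambda_i (X_i - mu_i)).

    Capping lambda_i at 1/mu_i keeps the wealth nonnegative when X_i = 0,
    i.e., the gambler never goes into debt."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    lam = np.minimum(np.asarray(lam, float), 1.0 / mu)
    return np.cumprod(1.0 + lam * (x - mu))
```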

The ALPHA martingale is of the same form:

\[\begin{eqnarray} \label{eq:lambda-form} T_j &=& \prod_{i=1}^j \left ( \frac{X_i}{\mu_i} \cdot \frac{\eta_i-\mu_i}{u_i-\mu_i} + \frac{u_i-\eta_i}{u_i-\mu_i} \right ) \nonumber \\ &=& \prod_{i=1}^j \frac{X_i (\eta_i/\mu_i -1) + u_i - \eta_i}{u_i-\mu_i} \nonumber \\ &=& \prod_{i=1}^j \left ( 1 + \frac{X_i (\eta_i/\mu_i -1) + \mu_i - \eta_i}{u_i-\mu_i} \right ) \nonumber \\ &=& \prod_{i=1}^j \left ( 1 + \frac{\eta_i/\mu_i -1}{u_i-\mu_i} \cdot (X_i - \mu_i) \right ), \end{eqnarray}\]

identifying \(\lambda_i \equiv \frac{\eta_i/\mu_i -1}{u_i-\mu_i}\). Choosing \(\lambda_i\) is equivalent to choosing \(\eta_i\):

\[\begin{equation} \lambda_i = \frac{\eta_i/\mu_i -1}{u_i-\mu_i} \;\; \Longleftrightarrow \;\; \eta_i = \mu_i \left ( 1 + \lambda_i (u_i-\mu_i) \right ). \end{equation}\]

As \(\eta_i\) ranges from \(\mu_i\) to \(u_i\), \(\lambda_i = \frac{\eta_i/\mu_i -1}{u_i-\mu_i}\) ranges continuously from 0 to \(1/\mu_i\), the same range of values of \(\lambda_i\) permitted by Waudby-Smith et al.: selecting \(\lambda_i\) is thus equivalent to selecting \(\eta_i\), i.e., to selecting a method for estimating the population mean \(\theta\).
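
A quick numeric check of the correspondence and its range (the values of \(u\) and \(\mu\) below are arbitrary):

```python
import numpy as np

u, mu = 1.0, 0.5  # arbitrary upper bound and null mean
for eta in np.linspace(mu + 1e-9, u, 5):
    lam = (eta / mu - 1) / (u - mu)                    # eta -> lambda
    assert np.isclose(mu * (1 + lam * (u - mu)), eta)  # lambda -> eta round-trip
    assert 0 <= lam <= 1 / mu                          # same range as Waudby-Smith et al.
```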