SPRT with a nuisance parameter

We have a population of \(N\) objects, of which \(N_w\) are labeled “w,” \(N_\ell\) are labeled “l,” and \(N_u \equiv N - N_w - N_\ell \ge 0\) are labeled “u.”

We want to test the hypothesis \(H_0: N_w - N_\ell \le c\).

This might arise in auditing an election, for instance. Then \(N_w\) might be the number of ballots with a vote for candidate \(w\), \(N_\ell\) the number of ballots with a vote for candidate \(\ell\), and \(N_u\) the number of ballots that show a vote for another candidate, an undervote, or an overvote, or that are otherwise invalid.

Let \(H_k: N_w - N_\ell \le c, N_u = k\) be the hypothesis that \(N_w - N_\ell \le c\) and \(N_u = k\).

Then \(H_0 = \cup_{0 \le k \le N-c} H_k\).

If we can reject every \(H_k\), \(0 \le k \le N-c\), we can reject \(H_0\). If we test each \(H_k\) at level \(\alpha\), then the overall test has level no larger than \(\alpha\). The \(H_k\) specify different values of \(N_u\), so at most one of them can be true, and the chance of erroneously rejecting that one is at most \(\alpha\).

We do not have to test every \(H_k\) against the same alternative.

Consider testing \(H_k\) against the simple alternative \(H_{kd}: N_w - N_\ell = c+d_k, N_u = k\). Since \(N_w + N_\ell = N - k\) under both hypotheses, they are equivalent to

\[\begin{equation*} H_k: N_u = k, N_w \le (N-k+c)/2, N_\ell \ge (N-k-c)/2 \end{equation*}\]

and

\[\begin{equation*} H_{kd}: N_u = k, N_w = (N-k+c+d_k)/2, N_\ell = (N-k-c-d_k)/2. \end{equation*}\]

Suppose we draw a sample sequentially without replacement, and that in \(n\) draws we see \(W_n\) items labeled “w” and \(L_n\) items labeled “l.” Let \(U_n \equiv n - W_n - L_n\) be the number of items labeled “u.”
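Here is a minimal simulation sketch of the sampling step. It assumes the population is stored as a Python list of the labels `'w'`, `'l'`, and `'u'`, a representation chosen only for illustration. Shuffling a copy and reading it in order is equivalent to drawing sequentially without replacement. The hypothetical generator `running_tallies` reports \((n, W_n, L_n, U_n)\) after each draw.

```python
import random

def running_tallies(population, seed=None):
    """Yield (n, W_n, L_n, U_n) after each draw of a sequential sample
    drawn without replacement from `population` (labels 'w', 'l', 'u')."""
    rng = random.Random(seed)
    order = list(population)   # copy, so the caller's list is not reordered
    rng.shuffle(order)         # uniformly random order = sampling without replacement
    W = L = 0
    for n, label in enumerate(order, start=1):
        if label == 'w':
            W += 1
        elif label == 'l':
            L += 1
        yield n, W, L, n - W - L   # U_n = n - W_n - L_n

# Hypothetical population: N = 100 with N_w = 55, N_l = 40, N_u = 5.
population = ['w'] * 55 + ['l'] * 40 + ['u'] * 5
for n, W_n, L_n, U_n in running_tallies(population, seed=1):
    if n == 10:
        print(n, W_n, L_n, U_n)
        break
```

If every item is drawn, the final tallies recover \((N_w, N_\ell, N_u)\) exactly.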

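We need a condition on \(W_n\) and \(L_n\) under which the likelihood under \(H_k\) is largest (so the likelihood ratio below is smallest, which makes the test conservative) when the inequalities in \(H_k\) hold with equality at the integer boundary of the domain. If no such condition holds, \(N_w\) within \(H_k\) is another layer of nuisance parameter to deal with.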
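A condition of this kind can be worked out directly. The derivation below is ours, not the source's, so treat it as a sketch to check. Within \(H_k\), \(N_u = k\) is fixed, and the only free parameter is \(N_w = m \le N_{wk}\), with \(N_\ell = N-k-m\). Up to factors that do not depend on \(m\), the likelihood is \(f(m) = \prod_{i=0}^{W_n-1}(m-i)\prod_{i=0}^{L_n-1}(N-k-m-i)\). Its successive ratio is \(f(m+1)/f(m) = (m+1)(N-k-m-L_n)/\big((m+1-W_n)(N-k-m)\big)\), which is at least 1 exactly when \(m(W_n+L_n) \le W_n(N-k) - L_n\). So \(f\) rises and then falls. Suppose the boundary is consistent with the data, meaning \(W_n \le N_{wk}\) and \(L_n \le N_{\ell k}\). Then the maximum over \(H_k\) is at \(N_w = N_{wk}\) if and only if \(N_{wk}(W_n+L_n) \le W_n(N-k+1)\). The sketch compares this closed form with a brute-force search. The function names are hypothetical.

```python
def falling(x, r):
    """x (x-1) ... (x-r+1) = prod_{i=0}^{r-1} (x - i); the empty product (r = 0) is 1."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

def null_max_at_boundary(N, k, c, W_n, L_n):
    """Brute force: within H_k (N_u = k, N_w = m <= N_wk, N_l = N - k - m),
    is the likelihood of the observed counts largest at m = N_wk?
    Returns None if no population in H_k is consistent with the counts."""
    N_wk = (N - k + c) // 2                   # floor((N - k + c) / 2)
    hi = min(N_wk, N - k - L_n)               # need N_l = N - k - m >= L_n
    values = {m: falling(m, W_n) * falling(N - k - m, L_n)
              for m in range(W_n, hi + 1)}    # need N_w = m >= W_n
    if not values:
        return None
    return values.get(N_wk, 0) == max(values.values())

def boundary_condition(N, k, c, W_n, L_n):
    """Closed form from the successive ratio f(m+1)/f(m); assumes the boundary
    counts N_wk and N_lk = N - k - N_wk are consistent with W_n and L_n."""
    N_wk = (N - k + c) // 2
    return N_wk * (W_n + L_n) <= W_n * (N - k + 1)

print(null_max_at_boundary(100, 0, 0, 30, 10), boundary_condition(100, 0, 0, 30, 10))  # True True
print(null_max_at_boundary(100, 0, 0, 5, 30), boundary_condition(100, 0, 0, 5, 30))    # False False
```

When \(W_n + L_n > 0\), the condition says that the sample proportion \(W_n/(W_n+L_n)\) is at least \(N_{wk}/(N-k+1)\). This is the situation in which one would hope to reject.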

Define \(\prod_\emptyset (\cdot) \equiv 1\). Let \(N_{wk} \equiv \lfloor (N-k+c)/2 \rfloor\), \(N_{\ell k} \equiv \lceil (N-k-c)/2 \rceil\), \(N_{wkd} \equiv \lfloor (N-k+c+d_k)/2 \rfloor\), and \(N_{\ell kd} \equiv \lceil (N-k-c-d_k)/2 \rceil\).
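Because the counts must be integers, the boundary values use floor and ceiling. This sketch computes them exactly with Python integer arithmetic, which avoids floating-point division; `boundary_counts` and `ceil_half` are names made up for illustration. Since \(N-k+c\) and \(N-k-c\) have the same parity, \(N_{wk} + N_{\ell k} = N-k\), and likewise \(N_{wkd} + N_{\ell kd} = N-k\). Each pair therefore accounts for all the items not labeled “u.”

```python
def ceil_half(x):
    """ceil(x / 2) for an integer x, without floating point."""
    return -((-x) // 2)

def boundary_counts(N, k, c, d_k):
    """Return (N_wk, N_lk, N_wkd, N_lkd) as defined in the text."""
    N_wk = (N - k + c) // 2              # floor((N - k + c) / 2)
    N_lk = ceil_half(N - k - c)          # ceil((N - k - c) / 2)
    N_wkd = (N - k + c + d_k) // 2       # floor((N - k + c + d_k) / 2)
    N_lkd = ceil_half(N - k - c - d_k)   # ceil((N - k - c - d_k) / 2)
    return N_wk, N_lk, N_wkd, N_lkd

print(boundary_counts(100, 3, 0, 10))    # (48, 49, 53, 44)
```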

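The probability of drawing any particular sequence of \(n\) items containing \(W_n\) labeled “w,” \(L_n\) labeled “l,” and \(U_n\) labeled “u” is \(\prod_{i=0}^{W_n-1} (N_w - i) \prod_{i=0}^{L_n-1} (N_\ell - i) \prod_{i=0}^{U_n-1} (N_u - i) \big/ \prod_{i=0}^{n-1} (N - i)\). The denominator is the same under every hypothesis, and the factor for the items labeled “u” is the same under \(H_{kd}\) and \(H_k\). The likelihood ratio of \(H_{kd}\) to \(H_k\) is therefore

\[\begin{equation*} \mathcal{L}_{k,n} \equiv \frac{\prod_{i=0}^{W_n-1} (N_{wkd} - i) \prod_{i=0}^{L_n-1} (N_{\ell kd} - i) \prod_{i=0}^{U_n-1} (k - i)}{\prod_{i=0}^{W_n-1} (N_{wk} - i) \prod_{i=0}^{L_n-1} (N_{\ell k} - i) \prod_{i=0}^{U_n-1} (k - i)} = \frac{\prod_{i=0}^{W_n-1} (N_{wkd} - i) \prod_{i=0}^{L_n-1} (N_{\ell kd} - i)}{\prod_{i=0}^{W_n-1} (N_{wk} - i) \prod_{i=0}^{L_n-1} (N_{\ell k} - i)}. \end{equation*}\]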
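For realistic \(N\), these products are astronomically large, so a practical sketch works on the log scale. The hypothetical helper `log_falling` computes \(\log\prod_{i=0}^{r-1}(x-i)\) using `math.lgamma`. The code uses one factor per item drawn, that is, products over \(i = 0, \dots, W_n-1\) and \(i = 0, \dots, L_n-1\). The “u” factors cancel, so \(U_n\) does not appear.

```python
import math

def log_falling(x, r):
    """log of prod_{i=0}^{r-1} (x - i); the log of the empty product (r = 0) is 0.
    Returns -inf when some factor is <= 0, i.e. the counts are impossible."""
    if r == 0:
        return 0.0
    if x - r + 1 <= 0:
        return -math.inf
    return math.lgamma(x + 1) - math.lgamma(x - r + 1)

def ceil_half(x):
    """ceil(x / 2) for an integer x."""
    return -((-x) // 2)

def log_likelihood_ratio(N, k, c, d_k, W_n, L_n):
    """log L_{k,n}: log-likelihood under H_kd minus log-likelihood at the boundary of H_k."""
    N_wk, N_lk = (N - k + c) // 2, ceil_half(N - k - c)
    N_wkd, N_lkd = (N - k + c + d_k) // 2, ceil_half(N - k - c - d_k)
    num = log_falling(N_wkd, W_n) + log_falling(N_lkd, L_n)
    den = log_falling(N_wk, W_n) + log_falling(N_lk, L_n)
    if den == -math.inf:
        return math.nan   # the boundary of H_k cannot produce these counts; handle separately
    return num - den      # -inf (ratio 0) if H_kd cannot produce the counts

print(math.exp(log_likelihood_ratio(100, 0, 0, 10, 3, 1)))
```

For small cases, the result agrees with the exact ratio \((55\cdot54\cdot53\cdot45)/(50\cdot49\cdot48\cdot50)\) computed directly.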

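This might need additional restrictions when a value of \(c\), \(d_k\), or \(k\) is impossible given the observed counts \(W_n\), \(L_n\), and \(U_n\). For example, if \(U_n > k\), we have already seen more than \(k\) items labeled “u,” so \(H_k\) is false and can be rejected outright. The ratio above hides this, because the factor for the items labeled “u” is then zero in both numerator and denominator, and cancelling it masks the \(0/0\).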
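The restrictions can be made concrete by checking whether the observed counts are possible at all under each hypothesis. This is a sketch with hypothetical helper names. `h_k_possible` asks whether any population in \(H_k\) could have produced the counts; if not, \(H_k\) can be rejected outright. `h_kd_possible` asks the same of the single population in \(H_{kd}\); if not, the likelihood ratio is zero.

```python
def ceil_half(x):
    """ceil(x / 2) for an integer x."""
    return -((-x) // 2)

def h_k_possible(N, k, c, W_n, L_n, U_n):
    """Could some population in H_k (N_u = k, N_w <= N_wk, N_w + N_l = N - k)
    have produced the observed counts?"""
    if U_n > k:                        # more 'u' items seen than H_k allows
        return False
    N_wk = (N - k + c) // 2
    # need some N_w with W_n <= N_w <= N_wk and N_l = N - k - N_w >= L_n
    return W_n <= min(N_wk, N - k - L_n)

def h_kd_possible(N, k, c, d_k, W_n, L_n, U_n):
    """Could the single population in H_kd have produced the observed counts?"""
    N_wkd = (N - k + c + d_k) // 2
    N_lkd = ceil_half(N - k - c - d_k)
    return U_n <= k and W_n <= N_wkd and L_n <= N_lkd

print(h_k_possible(100, 2, 0, 10, 10, 3))   # False: U_n = 3 > k = 2
```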

If \(\min_{0 \le k \le N-c} \mathcal{L}_{k,n} \ge 1/\alpha\), then every \(H_k\) is rejected at level \(\alpha\), so we can reject \(H_0\).
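Here is a sketch of the whole test at a fixed \(n\). It rejects \(H_0\) only when every \(H_k\) is rejected, and it skips values of \(k\) that the data already rule out. Several assumptions are ours, not the source's. The same \(d_k = d\) is used for every \(k\). The likelihood under \(H_k\) is assumed to be maximized at its boundary counts, which is the open question raised above. Finally, \(k\) is capped at \(N\). `reject_H0` is a hypothetical name.

```python
import math

def log_falling(x, r):
    """log of prod_{i=0}^{r-1} (x - i); 0 for the empty product, -inf if a factor is <= 0."""
    if r == 0:
        return 0.0
    if x - r + 1 <= 0:
        return -math.inf
    return math.lgamma(x + 1) - math.lgamma(x - r + 1)

def reject_H0(N, c, d, W_n, L_n, U_n, alpha):
    """Reject H_0: N_w - N_l <= c iff every H_k is rejected, i.e. min_k L_{k,n} >= 1/alpha.
    Uses d_k = d for all k and assumes the boundary of H_k is least favorable."""
    log_threshold = math.log(1 / alpha)
    for k in range(0, min(N, N - c) + 1):      # N_u = k cannot exceed N
        N_wk = (N - k + c) // 2
        if U_n > k or W_n > N_wk:
            continue        # the data contradict H_k, so H_k is rejected outright
        N_lk = -((-(N - k - c)) // 2)           # ceil((N - k - c) / 2)
        N_wkd = (N - k + c + d) // 2
        N_lkd = -((-(N - k - c - d)) // 2)      # ceil((N - k - c - d) / 2)
        den = log_falling(N_wk, W_n) + log_falling(N_lk, L_n)
        if den == -math.inf:
            return False    # boundary of H_k cannot produce the data; be conservative
        num = log_falling(N_wkd, W_n) + log_falling(N_lkd, L_n)
        if num - den < log_threshold:
            return False    # this H_k survives, so H_0 cannot be rejected
    return True

print(reject_H0(100, 0, 20, 30, 5, 0, 0.05))    # True
print(reject_H0(100, 0, 20, 10, 10, 0, 0.05))   # False
```

In a sequential audit, one would presumably call this after each draw and stop the first time it returns `True`, in the spirit of Wald's SPRT.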

In an election, we might pick \(d = V_w - V_\ell - c\), where \(V_w\) is the number of votes reported for \(w\) and \(V_\ell\) is the number reported for \(\ell\). We could take \(d_k = d\) for all \(k\), relate \(d_k\) to \(V_w/(V_w+V_\ell)\) and the hypothesized value \(N_w + N_\ell = N - k\), or choose it some other way.
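This sketch shows two ways of choosing \(d_k\) from the reported totals. `d_constant` uses the same \(d\) for every \(k\). `d_proportional` is one hypothetical reading of “relate \(d_k\) to \(V_w/(V_w+V_\ell)\)”. It assumes the \(N-k\) ballots with a vote for \(w\) or \(\ell\) split in the reported proportion, which gives the margin \((N-k)(V_w-V_\ell)/(V_w+V_\ell)\). It then subtracts \(c\) so that \(N_w - N_\ell = c + d_k\).

```python
def d_constant(V_w, V_l, c):
    """d = V_w - V_l - c, used for every k."""
    return V_w - V_l - c

def d_proportional(V_w, V_l, c, N, k):
    """Hypothetical choice: the margin if the N - k ballots with a vote for w or l
    split in the reported proportion V_w / (V_w + V_l), minus c."""
    margin = round((N - k) * (V_w - V_l) / (V_w + V_l))   # round to an integer count
    return margin - c

# Hypothetical reported totals: V_w = 600, V_l = 400, with N = 1000 ballots and c = 0.
print(d_constant(600, 400, 0))                 # 200
print(d_proportional(600, 400, 0, 1000, 100))  # 180
```

With proportional \(d_k\), the alternative shrinks as the hypothesized number of “u” ballots grows. With constant \(d\), it does not.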

Collaborative and Reproducible Data Science
