Tests and Confidence Sets when there are Nuisance Parameters¶
Work in Progress!
Many statistical problems involve nuisance parameters, parameters that are not of scientific interest but that affect the probability distribution of the data.
Testing hypotheses and constructing confidence sets for some parameters when there are additional nuisance parameters can be complicated.
This chapter addresses two statistical inference problems involving nuisance parameters: tests and confidence bounds for a population mean from a stratified random sample, and tests and confidence bounds for the difference between the values of \(p\) for two independent binomial distributions.
One strategy that is widely applicable to testing hypotheses about some parameter when there are nuisance parameters is to find the maximum \(P\)-value over all possible values of the nuisance parameter, keeping the parameters of interest fixed.
Difference between two Binomial \(p\)s.¶
Suppose we observe \(X_1 \sim \mbox{Binom}(n_1, p_1)\) and \(X_2 \sim \mbox{Binom}(n_2, p_2)\) where \(X_1\) and \(X_2\) are independent. How can we make inferences about \(p_1-p_2\)?
We will develop several approaches.
Fisher’s exact test: a conditional test¶
Suppose we are only interested in the hypothesis \(p_1 = p_2\). One way to test that hypothesis is to condition on the value of \(X = X_1 + X_2\). If \(p_1 = p_2 = p\), then \(X_1 + X_2 \sim \mbox{Binom}(n_1+n_2, p)\). We can think of the data as arising from \(n_1+n_2\) iid Bernoulli\((p)\) trials. If that were the case, then, given the total \(X\), every subset of \(X\) of the \(n_1+n_2\) trials is equally likely to be the set of trials that resulted in “1” rather than “0.” (The trials are exchangeable: the joint distribution of the \(n_1+n_2\) trials is invariant under permutations of the labels.)
That is, conditional on \(X=x\), the distribution of \(X_1\) is hypergeometric with parameters \(N=n_1+n_2\), \(G=x\), and \(n=n_1\). We can use that to construct a conditional test of the hypothesis \(p_1=p_2\). As noted in the chapter on tests, if we always test conditionally at significance level not greater than \(\alpha\), whatever \(X\) happens to be, that yields a test that has unconditional level not greater than \(\alpha\).
Fisher’s exact test avoids the problem of estimating nuisance parameters by conditioning on \(X_1+X_2\). If \(p_1=p_2\) is big, \(X\) will tend to be large; if \(p_1=p_2\) is small, \(X\) will tend to be small. Regardless, the distribution of \(X_1\) given \(X_1+X_2\) is hypergeometric with known parameters.
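As a minimal sketch (not part of the original text), the conditional \(P\)-value for the one-sided alternative \(p_1 > p_2\) can be computed directly from the hypergeometric distribution; scipy's fisher_exact gives the same answer. The sample sizes and counts below are hypothetical.

from scipy.stats import hypergeom, fisher_exact

n_1, n_2 = 30, 40   # hypothetical sample sizes
x_1, x_2 = 12, 8    # hypothetical counts of "1"s

# Conditional on X = x_1 + x_2, X_1 is hypergeometric:
# population of n_1 + n_2 trials, of which x_1 + x_2 are "1"s; n_1 of the trials belong to group 1.
p_cond = hypergeom.sf(x_1 - 1, n_1 + n_2, x_1 + x_2, n_1)   # Pr{X_1 >= x_1 | X_1 + X_2 = x_1 + x_2}

# Fisher's exact test with alternative='greater' computes the same tail probability.
_, p_fisher = fisher_exact([[x_1, n_1 - x_1], [x_2, n_2 - x_2]], alternative='greater')

print(p_cond, p_fisher)   # the two agree up to numerical error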
While Fisher’s exact test provides a straightforward way to test the hypothesis \(p_1 = p_2\), it does not allow us to test the hypothesis \(p_1-p_2 = c \ne 0\) without additional assumptions, and thus does not allow us to make a confidence set for \(p_1-p_2\).
Bonferroni and Šidák simultaneous tests and confidence intervals¶
One easy but inefficient way to test hypotheses or make confidence intervals for \(p_1-p_2\) is to make simultaneous inferences about \(p_1\) and \(p_2\). For instance, suppose that \(\mathcal{I}_1(\cdot)\) is a \(1-\alpha_1\) confidence interval procedure for \(p_1\) and \(\mathcal{I}_2(\cdot)\) is a \(1-\alpha_2\) confidence interval procedure for \(p_2\).
By Bonferroni’s inequality,
$$ \Pr \{ \mathcal{I}_1(X_1) \ni p_1 \mbox{ and } \mathcal{I}_2(X_2) \ni p_2 \} \ge 1 - (\alpha_1 + \alpha_2). $$
Because \(X_1\) and \(X_2\) are independent, we can get a sharper result using Šidák’s adjustment:
$$ \Pr \{ \mathcal{I}_1(X_1) \ni p_1 \mbox{ and } \mathcal{I}_2(X_2) \ni p_2 \} = \Pr \{ \mathcal{I}_1(X_1) \ni p_1 \} \Pr \{ \mathcal{I}_2(X_2) \ni p_2 \} \ge (1-\alpha_1)(1-\alpha_2). $$
To see that Šidák’s inequality is sharper (for independent intervals), calculate
$$ (1-\alpha_1)(1-\alpha_2) = 1 - \alpha_1 - \alpha_2 + \alpha_1\alpha_2 \ge 1 - (\alpha_1 + \alpha_2). $$
Whenever \(\mathcal{I}_1(X_1) \ni p_1\) and \(\mathcal{I}_2(X_2) \ni p_2\),
$$ p_1 - p_2 \in \mathcal{I}_1(X_1) - \mathcal{I}_2(X_2) \equiv \{ a - b : a \in \mathcal{I}_1(X_1), \; b \in \mathcal{I}_2(X_2) \}, $$
so
$$ \Pr \{ \mathcal{I}_1(X_1) - \mathcal{I}_2(X_2) \ni p_1 - p_2 \} \ge (1-\alpha_1)(1-\alpha_2). $$
That is, \(\mathcal{I}_1(X_1) - \mathcal{I}_2(X_2)\) is a \((1 - \alpha_1)(1 - \alpha_2)\) confidence interval for \(p_1-p_2\). Because of the duality between testing hypotheses and confidence intervals, we can reject the hypothesis that \(p_1-p_2 = c\) at significance level \(1- (1 - \alpha_1)(1 - \alpha_2)\) if \(c \notin \mathcal{I}_1(X_1) - \mathcal{I}_2(X_2)\). In particular, we can reject the hypothesis \(p_1=p_2\) at level \(1- (1 - \alpha_1)(1 - \alpha_2)\) if \(\mathcal{I}_1(X_1) - \mathcal{I}_2(X_2)\) does not contain 0.
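Here is a minimal sketch (not part of the original) using Clopper-Pearson intervals for each \(p_i\), each at confidence level \((1-\alpha)^{1/2}\), and taking the interval of differences. The counts and sample sizes are hypothetical.

from scipy.stats import beta

def clopper_pearson(x, n, cl):
    """Two-sided Clopper-Pearson confidence interval for a binomial p."""
    alpha = 1 - cl
    lo = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
    hi = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
    return lo, hi

n_1, x_1 = 100, 40   # hypothetical
n_2, x_2 = 120, 35   # hypothetical
cl_joint = 0.95
cl_each = cl_joint ** 0.5        # Šidák: each interval at level (1-alpha)^(1/2)

l_1, u_1 = clopper_pearson(x_1, n_1, cl_each)
l_2, u_2 = clopper_pearson(x_2, n_2, cl_each)

# If I_1 covers p_1 and I_2 covers p_2, then p_1 - p_2 is in [l_1 - u_2, u_1 - l_2].
print("{:.3f} <= p1 - p2 <= {:.3f}".format(l_1 - u_2, u_1 - l_2))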
As you can see, this method involves estimating \(p_1\) and \(p_2\) separately, even though we only care about \(p_1-p_2\). We can do a bit better than this.
Unconditional tests¶
We can also base tests on the full joint distribution of \(X_1\) and \(X_2\). [TBD]
**The Santner-Snell Test**

Santner and Snell (1980) give a procedure for finding an exact confidence interval for \(p_c-p_t\) and \(p_c/p_t\), where \(p_c\) and \(p_t\) play the roles of \(p_1\) and \(p_2\) above: the response probabilities in a control group and a treatment group.
Alternative approach using potential outcomes: We have \(\{U_j\}_{j=1}^N\) iid \(U[0,1]\), independent of the \(\{T_j\}_{j=1}^N\). Let \(z_{cj} = 1_{U_j \le p_c}\) and \(z_{tj} = 1_{U_j \le p_t}\). Condition on \(\{U_j\}\).
[MORE TO COME]
Individual probabilities¶
A more reasonable model is that individual \(j\) has a “personal” probability \(p_{tj}\) of becoming infected if vaccinated and \(p_{cj}\) if unvaccinated.
[MORE TO COME]
Confidence bound for the mean of a binary population from a stratified sample from the population.¶
This problem occurs in opinion surveys, financial auditing, and election auditing. It will let us see some of the issues that arise when there are nuisance parameters.
We will use four powerful techniques:
the duality between hypothesis tests and confidence sets
maximizing \(P\)-values over nuisance parameters
union-intersection tests
combining independent \(P\)-values
It is also an example of a combinatorial optimization problem that can be solved quickly using a greedy algorithm, which is rare.
The problem¶
A population of \(N\) items of which \(G\) are labeled “1” and \(N-G\) are labeled “0” is partitioned into \(S\) strata. Stratum \(s\) contains \(N_s\) items, of which \(G_s\) are labeled “1.” Thus \(N = \sum_{s=1}^S N_s\) and \(G \equiv \sum_{s=1}^S G_s\). We draw a simple random sample of size \(n_s\) from stratum \(s\), \(s = 1, \ldots, S\), independently across strata. (I.e., from stratum \(s\) we draw a sample of size \(n_s\) in such a way that every subset of \(n_s\) distinct items of the \(N_s\) items is equally likely; and the \(S\) samples are drawn independently.)
Let \(x_{sj}\) denote the value of the \(j\)th item in stratum \(s\), \(s=1, \ldots, S\), \(j=1, \ldots, N_s\). Then \(G_s = \sum_{j=1}^{N_s} x_{sj}\) and
$$ G = \sum_{s=1}^S G_s = \sum_{s=1}^S \sum_{j=1}^{N_s} x_{sj}. $$
Let \(Y_s\) denote the sum of the values in the sample from stratum \(s\). The variables \(\{Y_s \}_{s=1}^S\) are independent. The observed value of \(Y_s\) is \(y_s\).
We seek hypothesis tests and confidence bounds for \(G\). We first consider one-sided tests of the hypothesis \(G = g\) against the alternative \(G > g\), and corresponding lower confidence bounds for \(G\). Reversing the roles of “0” and “1” gives upper confidence bounds, mutatis mutandis.
The general strategy for testing the hypothesis \(G=g\) is to find the largest \(P\)-value among all ways of allocating \(g\) items labeled “1” among the \(S\) strata (honoring the stratum sizes \(\{N_s\}\)). That is a \(P\)-value for the composite hypothesis \(G=g\). The maximum can be found by examining all such allocations and calculating the \(P\)-value for each.
Maximizing the \(P\)-value over all allocations of \(G\) ones across \(S\) strata is combinatorially complex: Feller’s “bars and stars” argument shows that there are \(\binom{G+S-1}{S-1}\) ways to allocate \(G\) objects among \(S\) strata. (Some allocations can be ruled out, for instance those that assign more ones to a stratum than the stratum contains, or fewer than the sample from that stratum revealed.) For \(S=10\) strata of size \(N_s = 400\) and \(G = 300\), there are roughly \(6.3 \times 10^{16}\) allocations: impractical by any standard.
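A quick computation (a sketch, not from the original) confirms the size of that search space:

from scipy.special import comb

S, G = 10, 300
print(comb(G + S - 1, S - 1, exact=True))   # about 6.3e16 allocations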
Aside: Feller’s Bars and Stars Argument¶
How many distinguishable ways are there to divide \(G\) indistinguishable objects into \(S\) groups? Feller (1950) argues as follows.
Consider \(G\) “stars” lined up in a row:

\( \ast \; \ast \; \ast \; \cdots \; \ast \; \ast \)

To divide them into \(S\) groups, we place \(S-1\) “bars” somewhere among them. For instance,

\( \ast \; \ast \mid \ast \; \ast \; \ast \; \ast \mid \cdots \mid \ast \; \ast \; \ast \)

corresponds to having 2 items in the first group, 4 in the second, …, and 3 in the \(S\)th group. Similarly,

\( \mid \mid \cdots \mid \ast \; \ast \; \cdots \; \ast \)

corresponds to having all the items in the \(S\)th group. The number of ways of partitioning the \(G\) items into \(S\) groups is thus the number of ways of choosing \(S-1\) places to put bars among a total of \(G+S-1\) places (the bars and stars pooled together), i.e., \(\binom{G+S-1}{S-1}\).
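As a small illustration (not in the original), the construction can be coded directly: choosing the positions of the \(S-1\) bars among the \(G+S-1\) positions enumerates every allocation. The brute-force code later in this chapter uses the same idea, constrained to honor the stratum sizes.

import itertools
from scipy.special import comb

def allocations(G, S):
    """Yield every allocation (g_1, ..., g_S) of G indistinguishable ones among S strata."""
    for bars in itertools.combinations(range(G + S - 1), S - 1):
        cuts = (-1,) + bars + (G + S - 1,)
        yield tuple(cuts[i + 1] - cuts[i] - 1 for i in range(S))

allocs = list(allocations(3, 3))
print(len(allocs), comb(3 + 3 - 1, 3 - 1, exact=True))   # both are 10
print(allocs)   # each tuple is nonnegative and sums to 3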
We now examine some approaches to making confidence intervals for the population total \(G\) (equivalently, for the population mean \(\mu = G/N\)) from a stratified sample.
Wright’s Method: Sum of Šidák Intervals¶
One easy way to get a lower confidence bound for the sum is to take the sum of simultaneous lower confidence bounds for each stratum. Because the samples from different strata are independent, Šidák’s adjustment works. Wright (1991) suggests this approach.
A confidence bound for \(G_s\) can be constructed from \(Y_s\) by inverting hypergeometric tests.
To have joint confidence level \(1-\alpha\), make each stratum’s confidence bound at confidence level \((1-\alpha)^{1/S}\).
The chance that all of the \(S\) confidence bounds cover their corresponding parameters is then
$$ \left ( (1-\alpha)^{1/S} \right )^S = 1-\alpha, $$
because the intervals are based on independent data. This approach to making simultaneous confidence bounds for a set of parameters using independent data for each parameter is called Šidák’s method.
Wright’s method is an example of a much more general approach: make a joint \(1-\alpha\) confidence set for all the parameters \(\{G_j\}_{j=1}^S\), then find a lower bound on a functional of interest (here, their sum) over the joint set. Whenever the joint confidence set covers the parameter, the lower bound does not exceed the true value of the functional of the parameter. But because we do not care about the individual stratum sums, only the overall population sum, this approach is unnecessarily conservative: we are “wasting” effort constraining them separately.
The Wendell-Schmee Test¶
Wendell and Schmee (1996, https://www.tandfonline.com/doi/abs/10.1080/01621459.1996.10476950) proposed making inferences about the population total by maximizing a \(P\)-value over a set of nuisance parameters, the individual stratum totals. They order possible outcomes by \(\hat{p}\), the unbiased estimator of the population proportion from the stratified sample; the \(P\)-value is the “tail probability” of \(\hat{p}\). They construct confidence bounds by inverting the hypothesis tests.
The test statistic for the Wendell-Schmee test is the unbiased estimate of the population proportion, \(G/N\):
$$ \hat{p} \equiv \frac{1}{N} \sum_{s=1}^S \frac{N_s}{n_s} Y_s. $$
The \(P\)-value of the hypothesis \(G_s=g_s\), \(s=1, \ldots, S\), is the “lower tail probability” of \(\hat{p}\): the chance that \(\hat{p}\) would be no larger than its observed value if the stratum totals really were \(\{g_s\}\),
$$ P(\vec{g}) = \sum_{\vec{t} \, : \, \hat{p}(\vec{t}) \le \hat{p}(\vec{y})} \; \prod_{s=1}^S \frac{\binom{g_s}{t_s}\binom{N_s - g_s}{n_s - t_s}}{\binom{N_s}{n_s}}, $$
where \(\hat{p}(\vec{t})\) is the value of \(\hat{p}\) when the sample from stratum \(s\) contains \(t_s\) ones.
Wendell and Schmee consider maximizing this lower tail probability over all allocations of \(g\) ones across strata, either by exhaustive search, or by numerical optimization using a descent method from some number of random starting points. (They show graphically that the tail probability is not convex in the original parametrization.)
In practice, their method is computationally intractable when there are more than \(S=3\) strata.
Constructive maximization¶
For some test statistics, there is a much more efficient approach.
Define
$$ p_s(g_s) \equiv \Pr \{ Y_s \ge y_s || G_s = g_s \} = \sum_{y = y_s}^{g_s} \frac{\binom{g_s}{y} \binom{N_s-g_s}{n_s - y}}{\binom{N_s}{n_s}}, $$
where \(\binom{a}{b} \equiv 0\) if \(a \le 0\) or \(b > a\). (The double vertical bars denote "computed on the assumption that.") This is the \(P\)-value of the hypothesis \(G_s = g_s\) tested against the alternative \(G_s > g_s\).
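For concreteness, here is a minimal sketch (not part of the original notebook) of \(p_s(g_s)\) computed with scipy's hypergeometric survival function; the stratum size, sample size, and counts are made up.

from scipy.stats import hypergeom

def p_s(g_s, N_s, n_s, y_s):
    """Pr{Y_s >= y_s} when a stratum of N_s items contains g_s ones and n_s items are sampled."""
    if g_s < y_s or N_s - g_s < n_s - y_s:
        return 0.0   # allocation inconsistent with the observed sample; treated as P-value 0
    return hypergeom.sf(y_s - 1, N_s, g_s, n_s)

print(p_s(10, 400, 25, 3))   # chance of finding at least 3 ones if the stratum contains 10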
A test of the conjunction hypothesis \(G_s = g_s\), \(s=1, \ldots, S\), can be constructed using Fisher's combining function: if all \(S\) hypotheses are true, the distribution of
$$ X^2(\vec{g}) \equiv -2 \sum_{s=1}^S \log p_s(g_s) $$
is _dominated_ by the chi-square distribution with \(2S\) degrees of freedom. (That is, the chance that \(X^2 \ge t\) is less than or equal to the chance that a random variable with a chi-square distribution with \(2S\) degrees of freedom is \(\ge t\), for all \(t\).)
Let \(\chi_d(z)\) denote the survival function for the chi-square distribution with \(d\) degrees of freedom, i.e., the chance that a random variable with the chi-square distribution with \(d\) degrees of freedom is greater than or equal to \(z\). Then a conservative \(P\)-value for the allocation \(\vec{g}\) is
$$ P(\vec{g}) = \chi_{2S}(X^2(\vec{g})). $$
The allocation \(\vec{g}\) of \(g\) ones across strata that maximizes the \(P\)-value is the allocation that minimizes \(X^2(\vec{g})\) and satisfies \(\sum_s g_s = g\). Equivalently, it is the allocation that maximizes \(\sum_{s=1}^S \log p_s(g_s)\).
Let
$$ a_s(j) \equiv \left \{ \begin{array}{ll} \log p_s(y_s), & j = y_s \\ \log \left ( p_s(j)/p_s(j-1) \right ), & j = y_s+1, \ldots, N_s-(n_s-y_s). \end{array} \right . $$
Then \(\log p_s(g_s) = \sum_{j=y_s}^{g_s} a_s(j)\) if \(y_s \le g_s \le N_s-(n_s-y_s)\), and \(\log p_s(g_s) = -\infty\) otherwise. Moreover,
$$ X^2(\vec{g}) = -2\sum_{s=1}^S a_s(y_s) -2\sum_{s=1}^S \sum_{j=y_s+1}^{g_s} a_s(j), $$
provided \(y_s \le g_s \le N_s-(n_s-y_s)\), \(s=1, \ldots, S\); otherwise, it is infinite.
An allocation of \(g\) ones across strata is inconsistent with the data unless \(g_s \ge y_s\), \(s=1, \ldots, S\). Thus, in considering how to allocate \(g\) ones to maximize the \(P\)-value, the first sum above, accounting for \(\sum_s y_s\) ones, is “mandatory,” or the \(P\)-value will be zero. The question is how to allocate the remaining \(g - \sum_s y_s\) ones to maximize the \(P\)-value (equivalently, to minimize \(X^2(\vec{g})\)).
Let \(b_k\) denote the \(k\)th largest element of the set
$$ \{ a_s(j) : j = y_s+1, \ldots, N_s-(n_s-y_s); \; s = 1, \ldots, S \}, $$
with ties broken arbitrarily. Define \(\tilde{g}_y \equiv g - \sum_{s=1}^S y_s\).
Proposition. For every \(\vec{g}\) with \(\sum_s g_s = g\),
$$ X^2(\vec{g}) \ge X_*^2(g) \equiv \left \{ \begin{array}{ll} -2 \left ( \sum_{s=1}^S a_s(y_s) + \sum_{k=1}^{\tilde{g}_y} b_k \right ), & \sum_s y_s \le g \le N - \sum_s (n_s-y_s) \\ \infty, & \mbox{ otherwise. } \end{array} \right . $$
Proof. Any \(\vec{g}\) for which \(X^2(\vec{g})\) is finite includes the first sum and a sum of \(\tilde{g}_y\) elements of \(\{b_k\}\); the latter is at most the sum of the \(\tilde{g}_y\) largest elements of \(\{b_k\}\). \(\Box\)
Moreover, if the \(\tilde{g}_y\) largest elements of \(\{b_k \}\) correspond to an allocation of \(\tilde{g}_y\) ones across the strata, the bound is sharp. That turns out to be true (the proof involves log convexity of the distributions). The second sum thus corresponds to a particular allocation \(\vec{g}\) of \(g\) ones across the \(S\) strata, with \(y_s \le g_s \le N_s-(n_s-y_s)\). Among all allocations of \(g\) items labeled “1,” this one minimizes \(X^2(\vec{g})\), because it maximizes the sum of the logs of the stratum-level \(P\)-values; hence it attains the largest combined \(P\)-value.
Proposition: For \(j \in \{y_s+1, \ldots, N_s-(n_s-y_s)\}\), \(a_s(j)\) is monotone decreasing in \(j\).
Let \(P(g) \equiv \max_{\vec{g}: \sum_s g_s = g} P(\vec{g})\).
Theorem: If \(\sum_s y_s \le g \le N - \sum_s (n_s-y_s)\), then
$$ P(g) = \chi_{2S}(X_*^2(g)). $$
Proof: Immediate from the definitions.
The theorem shows that a “greedy” approach finds a conservative \(P\)-value:
Construct the values \(a_s(j)\) and the set \(\{b_k\}\).
Add the \(S\) values \(\{a_s(y_s)\}_{s=1}^S\) to the \(\tilde{g}_y\) largest elements of \(\{b_k\}\), and multiply the sum by \(-2\).
The upper tail probability of the chi-square distribution with \(2S\) degrees of freedom, evaluated at that value, is a conservative \(P\)-value for the hypothesis \(G=g\).
A conservative lower \(1-\alpha\) confidence bound for \(G\) can be found by inverting the test: it is the smallest \(g\) for which \(P(g) \ge \alpha\).
Changing the direction of the test¶
The test of the hypothesis \(G=g\) given above is a one-sided test against the alternative \(G > g\): it rejects when, under the hypothesis, the chance of observing “so many” good objects (at least as many ones in each stratum as the samples contained) is small.
To test against the alternative \(G < g\) (i.e., to reject when the chance of observing “so few” good objects is small), exchange the roles of “good” and “bad”: the alternative \(G < g\) is equivalent to the alternative \((N-G) > (N-g)\).
The resulting null hypothesis is that the number of “bad” objects is \(N-g\), and the data are the numbers of “bad” objects found in the samples, \(n_s - y_s\).
We now code the combinatorial optimization and the greedy optimization.
# Install permute and cryptorandom in the current kernel, if not already installed
import sys
!{sys.executable} -m pip install permute --user
!{sys.executable} -m pip install cryptorandom --user
import math
import itertools
import warnings
from functools import lru_cache
import numpy as np
import scipy as sp
import scipy.stats
import scipy.special
from scipy.stats import binom, hypergeom, chi2
from permute.utils import binom_conf_interval, hypergeom_conf_interval
@lru_cache(maxsize=None) # decorate the function to cache the results of calls to the function;
# arguments must be hashable (e.g., tuples rather than lists) for caching to work
def strat_test_brute(strata, sams, found, good, alternative='upper'):
"""
Find p-value of the hypothesis that the number G of "good" objects in a
stratified population is less than or equal to good, using a stratified
random sample.
Assumes that a simple random sample of size sams[s] was drawn from stratum s,
which contains strata[s] objects in all.
The P-value is the maximum Fisher combined P-value across strata
over all allocations of good "good" objects among the strata. The allocations are
enumerated using Feller's "bars and stars" construction, constrained to honor the
stratum sizes (each stratum can contain no more "good" items than it has items in all,
nor fewer "good" items than the sample contains).
The number of allocations grows combinatorially: there can be as many as
[(#strata + #good items) choose (#strata-1)] allocations, making the brute-force
approach computationally infeasible when the number of strata and/or the number of
good items is large.
The test is a union-intersection test: the null hypothesis is the union over allocations
of the intersection across strata of the hypotheses that the number of good items
in the stratum is less than or equal to a constant.
Parameters:
-----------
strata : list of ints
sizes of the strata. One int per stratum.
sams : list of ints
the sample sizes from each stratum
found : list of ints
the numbers of "good" items found in the samples from the strata
good : int
the hypothesized total number of "good" objects in the population
alternative : string {'lower', 'upper'}
test against the alternative that the true value is greater than good (lower)
or less than good (upper)
Returns:
--------
p : float
maximum combined p-value over all ways of allocating good "good" objects
among the strata, honoring the stratum sizes.
best_part : list
the partition that attained the maximum p-value
"""
if alternative == 'upper': # exchange roles of "good" and "bad"
p_func = lambda f, s, p, x: sp.stats.hypergeom.logcdf(f, s, p, x)
elif alternative == 'lower':
p_func = lambda f, s, p, x: sp.stats.hypergeom.logsf(f-1, s, p, x)
else:
raise NotImplementedError("alternative {} not implemented".format(alternative))
best_part = found # start with what you see
strata = np.array(strata, dtype=int)
sams = np.array(sams, dtype=int)
found = np.array(found, dtype=int)
good = int(good)
if good < np.sum(found):
p = 0 if alternative == 'lower' else 1
elif good > np.sum(strata) - np.sum(sams) + np.sum(found):
p = 1 if alternative == 'lower' else 0
else: # use Feller's "bars and stars" enumeration of combinations, constrained
log_p = np.NINF # initial value for the max
n_strata = len(strata)
parts = sp.special.binom(good+n_strata-1, n_strata-1)
if parts >= 10**7:
print("warning--large number of partitions: {}".format(parts))
barsNstars = good + n_strata
bars = [0]*n_strata + [barsNstars]
partition = ([bars[j+1] - bars[j] - 1 for j in range(n_strata)] \
for bars[1:-1] in itertools.combinations(range(1, barsNstars), n_strata-1) \
if all(((bars[j+1] - bars[j] - 1 <= strata[j]) and \
(bars[j+1] - bars[j] >= found[j])) for j in range(n_strata)))
for part in partition:
log_p_new = 0
for s in range(n_strata): # should be possible to vectorize this
log_p_new += p_func(found[s], strata[s], part[s], sams[s])
if log_p_new > log_p:
best_part = part
log_p = log_p_new
p = sp.stats.chi2.sf(-2*log_p, df = 2*len(strata))
return p, best_part
@lru_cache(maxsize=None) # decorate the function to cache the results of calls to the function;
# arguments must be hashable (e.g., tuples rather than lists) for caching to work
def strat_ci_bisect(strata, sams, found, alternative='lower', cl=0.95,
p_value=strat_test_brute):
"""
Confidence bound on the number of ones in a stratified population,
based on a stratified random sample (without replacement) from
the population.
If alternative=='lower', finds a lower confidence bound.
If alternative=='upper', finds an upper confidence bound.
Uses an integer bisection search to find an exact confidence bound.
The starting upper endpoint for the search is the unbiased estimate
of the number of ones in the population. That could be refined in various
ways to improve efficiency.
The lower endpoint for the search is the Šidák joint lower confidence bounds,
which should be more conservative than the exact bound.
Parameters:
-----------
strata : list of ints
stratum sizes
sams : list of ints
sample sizes in the strata
found : list of ints
number of ones found in each stratum in each sample
alternative : string {'lower', 'upper'}
if alternative=='lower', finds a lower confidence bound.
if alternative=='upper', finds an upper confidence bound.
cl : float
confidence level. Assumed to be at least 50%.
p_value : callable
method for computing the p-value
Returns:
--------
b : int
confidence bound
best_part : list of ints
partition that attains the confidence bound
"""
assert alternative in ['lower', 'upper']
if alternative == 'upper': # interchange good and bad
compl = tuple(int(s - f) for s, f in zip(sams, found)) # bad items found (a tuple, so the cached recursive call can hash it)
cb, best_part = strat_ci_bisect(strata, sams, compl, alternative='lower',
cl=cl, p_value=p_value)
b = np.sum(strata) - cb # good from bad
best_part_b = np.array(strata, dtype=int)-np.array(best_part, dtype=int)
else:
cl_sidak = math.pow(cl, 1/len(strata)) # Šidák adjustment
tail = 1-cl
a = sum((hypergeom_conf_interval( \
sams[s], found[s], strata[s], cl=cl_sidak, alternative="lower")[0] \
for s in range(len(strata)))) # Šidák should give a lower bound
b = int(np.sum(np.array(strata)*np.array(found)/np.array(sams)))-1 # expected good
p_a, best_part_a = p_value(strata, sams, found, a, alternative=alternative)
p_b, best_part_b = p_value(strata, sams, found, b, alternative=alternative)
tot_found = np.sum(found)
while p_a > tail and a > tot_found:
a = math.floor(a/2)
p_a, best_part_a = p_value(strata, sams, found, a, alternative=alternative)
if p_a > tail:
b = a
best_part_b = best_part_a
else:
while b-a > 1:
c = int((a+b)/2)
p_c, best_part_c = p_value(strata, sams, found,
c, alternative=alternative)
if p_c > tail:
b, p_b, best_part_b = c, p_c, best_part_c
elif p_c < tail:
a, p_a, best_part_a = c, p_c, best_part_c
elif p_c == tail:
b, p_b, best_part_b = c, p_c, best_part_c
break
return b, list(best_part_b)
@lru_cache(maxsize=None) # decorate the function to cache the results of calls to the function;
# arguments must be hashable (e.g., tuples rather than lists) for caching to work
def strat_test(strata, sams, found, good, alternative='lower'):
"""
P-value for the hypothesis the number of ones in a stratified population is not
greater than (or not less than) a hypothesized value, based on a stratified
random sample without replacement from the population.
Uses a fast algorithm to find (an upper bound on) the P-value constructively.
Uses Fisher's combining function to combine stratum-level P-values.
Parameters:
-----------
strata : list of ints
stratum sizes
sams : list of ints
sample sizes in the strata
found : list of ints
number of ones found in each stratum in each sample
good : int
hypothesized number of ones in the population
alternative : string {'lower', 'upper'}
test against the alternative that the true number of "good" items
is greater than (lower) or less than (upper) the hypothesized number, good
Returns:
--------
p : float
P-value
best_part : list
the partition that attained the maximum p-value
"""
assert alternative in ['lower', 'upper']
if alternative == 'upper': # exchange roles of "good" and "bad"
compl = tuple(int(s - f) for s, f in zip(sams, found)) # bad items found (a tuple, so the cached recursive call can hash it)
bad = int(np.sum(strata) - good) # total bad items hypothesized
res = strat_test(strata, sams, compl, bad, alternative='lower')
return res[0], list(np.array(strata, dtype=int)-np.array(res[1], dtype=int))
else:
good = int(good)
best_part = [] # best partition
if good < np.sum(found): # impossible
p = 0
best_part = found
elif good >= np.sum(strata) - np.sum(sams) + np.sum(found): # guaranteed
p = 1
best_part = list(np.array(strata, dtype=int) - \
np.array(sams, dtype=int) + \
np.array(found, dtype=int))
elif good >= np.sum(found): # outcome is possible under the null
log_p = 0 # log of joint probability
contrib = [] # contributions to the log joint probability
base = np.sum(found) # must have at least this many good items
for s in range(len(strata)):
log_p_j = sp.stats.hypergeom.logsf(found[s]-1, strata[s], found[s], sams[s])
# baseline p for minimum number of good items in stratum
log_p += log_p_j # log of the product of stratum-wise P-values
for j in range(found[s]+1, strata[s]-(sams[s]-found[s])+1):
log_p_j1 = sp.stats.hypergeom.logsf(found[s]-1, strata[s], j, sams[s])
# tail probability for j good in stratum
contrib.append([log_p_j1 - log_p_j, s])
# relative increase in P from new item
log_p_j = log_p_j1
sorted_contrib = sorted(contrib, key = lambda x: x[0], reverse=True)
best_part = np.array(found)
for i in range(good-base):
log_p += sorted_contrib[i][0]
best_part[int(sorted_contrib[i][1])] += 1
p = sp.stats.chi2.sf(-2*log_p, df = 2*len(strata))
return p, list(best_part)
def strat_ci_search(strata, sams, found, alternative='lower', cl=0.95):
"""
Confidence bound on the number of ones in a stratified population,
based on a stratified random sample (without replacement) from
the population.
If alternative=='lower', finds a lower confidence bound.
If alternative=='upper', finds an upper confidence bound.
Searches for the bound by starting at the unbiased estimate of the number
of ones and decreasing the hypothesized value until the P-value drops
below 1-cl; the bound is the smallest hypothesized value whose P-value
is at least 1-cl.
Parameters:
-----------
strata : list of ints
stratum sizes
sams : list of ints
sample sizes in the strata
found : list of ints
number of ones found in each stratum in each sample
alternative : string {'lower', 'upper'}
if alternative=='lower', finds a lower confidence bound.
if alternative=='upper', finds an upper confidence bound.
cl : float
confidence level. Assumed to be at least 50%.
Returns:
--------
cb : int
confidence bound
best_part : list of ints
partition that attains the confidence bound (give or take one item)
"""
assert alternative in ['lower', 'upper']
if alternative == 'upper': # interchange good and bad
compl = np.array(sams)-np.array(found) # bad items found
cb, best_part = strat_ci(strata, sams, compl, alternative='lower', cl=cl)
cb = np.sum(strata) - cb # good from bad
best_part = np.array(strata, dtype=int)-np.array(best_part, dtype=int)
else:
cb = int(np.sum(np.array(strata)*np.array(found)/np.array(sams)))-1 # expected good
p_attained, best_part = strat_test(strata, sams, found, cb,
alternative=alternative)
while p_attained >= 1-cl:
cb -= 1
p_attained, best_part = strat_test(strata, sams, found, cb,
alternative=alternative)
cb += 1
p_attained, best_part = strat_test(strata, sams, found, cb,
alternative=alternative)
return cb, list(best_part)
def strat_ci(strata, sams, found, alternative='lower', cl=0.95):
"""
Conservative confidence bound on the number of ones in a population,
based on a stratified random sample (without replacement) from
the population.
If alternative=='lower', finds a lower confidence bound.
If alternative=='upper', finds an upper confidence bound.
Constructs the confidence bound directly: for 'lower', it finds the smallest
number of ones that the (conservative) level 1-cl test does not reject;
for 'upper', it applies the same construction to the complementary counts.
Parameters:
-----------
strata : list of ints
stratum sizes
sams : list of ints
sample sizes in the strata
found : list of ints
number of ones found in each stratum in each sample
alternative : string {'lower', 'upper'}
if alternative=='lower', finds a lower confidence bound.
if alternative=='upper', finds an upper confidence bound.
cl : float
confidence level
Returns:
--------
cb : int
confidence bound
best_part : list of ints
partition that attains the confidence bound (give or take one item)
"""
assert alternative in ['lower', 'upper']
if alternative == 'upper': # interchange role of good and bad
compl = np.array(sams)-np.array(found) # bad found
cb, best_part = strat_ci(strata, sams, compl, alternative='lower', cl=cl)
best_part = np.array(strata, dtype=int)-np.array(best_part, dtype=int)
cb = np.sum(strata) - cb # good to bad
else:
threshold = -sp.stats.chi2.ppf(cl, df=2*len(strata))/2
# g is in the set if
# chi2.sf(-2*log(p), df=2*len(strata)) >= 1-cl
# i.e., -2*log(p) <= chi2.ppf(cl, df)
# i.e., log(p) >= -chi2.ppf(cl, df)/2
log_p = 0 # log of joint probability
contrib = [] # contributions to the log joint probability
base = np.sum(found) # must have at least this many good items
for s in range(len(strata)):
log_p_j = sp.stats.hypergeom.logsf(found[s]-1, strata[s], found[s], sams[s])
# baseline p for minimum number of good items in stratum
log_p += log_p_j # log of the product of stratum-wise P-values
small = np.PINF # for monotonicity check
for j in range(found[s]+1, strata[s]-(sams[s]-found[s])+1):
log_p_j1 = sp.stats.hypergeom.logsf(found[s]-1, strata[s], j, sams[s])
log_p_j1 = log_p_j if log_p_j1 < log_p_j else log_p_j1 # true difference is nonnegative
contrib.append([log_p_j1 - log_p_j, s])
log_p_j = log_p_j1
if contrib[-1][0] > small:
print("reversal in stratum {} for {} good; old: {} new:{}".format(s,
j, small, contrib[-1][0]))
small = contrib[-1][0]
sorted_contrib = sorted(contrib, key = lambda x: x[0], reverse=True)
best_part = np.array(found)
added = 0
while log_p < threshold:
log_p += sorted_contrib[added][0]
best_part[int(sorted_contrib[added][1])] += 1
added += 1
cb = base + added
return cb, list(best_part)
def strat_p(strata, sams, found, hypo, alternative='lower'):
"""
Finds tail probability for the hypothesized population counts <hypo> for
simple random samples of sizes <sams> from strata of sizes <strata> if
<found> ones are found in the strata.
Uses Fisher's combining function across strata.
Parameters:
-----------
strata : list of ints
stratum sizes
sams : list of ints
sample sizes from the strata
found : list of ints
number of ones found in each stratum in each sample
hypo : list of ints
hypothesized number of ones in the strata
Returns:
--------
p : float
appropriate tail probability
"""
assert alternative in ['lower', 'upper']
if alternative == 'lower':
p_func = lambda x, N, G, n: sp.stats.hypergeom.sf(x-1, N, G, n)
else:
p_func = lambda x, N, G, n: sp.stats.hypergeom.cdf(x, N, G, n)
p = 1
for s in range(len(strata)):
p *= p_func(found[s], strata[s], hypo[s], sams[s])
return sp.stats.chi2.sf(-2*math.log(p), df=2*len(strata))
def strat_p_ws(strata, sams, found, hypo, alternative='upper'):
"""
Finds Wendell-Schmee P-value for the hypothesized population counts <hypo> for
simple random samples of sizes <sams> from strata of sizes <strata> if
<found> ones are found in the strata.
Parameters:
-----------
strata : list of ints
stratum sizes
sams : list of ints
sample sizes from the strata
found : list of ints
number of ones found in each stratum in each sample
hypo : list of ints
hypothesized number of ones in the strata
Returns:
--------
p : float
tail probability
"""
assert alternative in ['lower', 'upper']
if alternative == 'lower': # exchange roles of "good" and "bad"
compl = np.array(sams) - np.array(found) # bad items found
hypo_c = np.array(strata) - np.array(hypo) # total bad items hypothesized
p = strat_p_ws(strata, sams, compl, hypo_c, alternative='upper')
else:
p_hat = lambda f, st=strata, sa=sams: np.sum(np.array(st)*np.array(f)/np.array(sa))/np.sum(st) # pooled estimate
p_hat_0 = p_hat(found)
per_strat = np.array(strata)/np.array(sams)/np.sum(strata)
strat_max = np.floor(p_hat_0/per_strat)
lo_t = (t for t in itertools.product(*[range(int(s+1)) for s in strat_max]) \
if p_hat(t) <= p_hat_0)
p = sum(np.prod(sp.stats.hypergeom.pmf(t, strata, hypo, sams)) \
for t in lo_t)
return p
@lru_cache(maxsize=None) # decorate the function to cache the results of calls to the function;
# arguments must be hashable (e.g., tuples rather than lists) for caching to work
def strat_test_ws(strata, sams, found, good, alternative='lower'):
"""
Find p-value of the hypothesis that the number G of "good" objects in a
stratified population is less than or equal to good, using a stratified
random sample.
Assumes that a simple random sample of size sams[s] was drawn from stratum s,
which contains strata[s] objects in all.
The P-value is the maximum Wendell-Schmee P-value over all allocations of
good objects among the strata. The allocations are enumerated using Feller's
"bars and stars" construction, constrained to honor the stratum sizes (each
stratum can contain no more "good" items than it has items in all, nor fewer
"good" items than the sample contains).
The number of allocations grows combinatorially: there can be as many as
[(#strata + #good items) choose (#strata-1)] allocations, making the brute-force
approach computationally infeasible when the number of strata and/or the number of
good items is large.
Parameters:
-----------
strata : list of ints
sizes of the strata. One int per stratum.
sams : list of ints
the sample sizes from each stratum
found : list of ints
the numbers of "good" items found in the samples from the strata
good : int
the hypothesized total number of "good" objects in the population
alternative : string {'lower', 'upper'}
test against the alternative that the true value is greater than good (lower)
or less than good (upper)
Returns:
--------
p : float
maximum combined p-value over all ways of allocating good "good" objects
among the strata, honoring the stratum sizes.
best_part : list
the partition that attained the maximum p-value
"""
assert alternative in ['lower', 'upper']
if alternative == 'lower': # exchange roles of "good" and "bad"
compl = tuple(int(s - f) for s, f in zip(sams, found)) # bad items found (a tuple, so the cached recursive call can hash it)
bad = int(np.sum(strata) - good) # total bad items hypothesized
res = strat_test_ws(strata, sams, compl, bad, alternative='upper')
return res[0], list(np.array(strata, dtype=int)-np.array(res[1], dtype=int))
best_part = found # start with what you see
good = int(good)
if good < np.sum(found):
p = 0 if alternative == 'lower' else 1
elif good > np.sum(strata) - np.sum(sams) + np.sum(found):
p = 1 if alternative == 'lower' else 0
else: # use Feller's "bars and stars" enumeration of combinations, constrained
p_hat = lambda f, st=strata, sa=sams: np.sum(np.array(st)*np.array(f)/np.array(sa))/np.sum(st) # pooled estimate
p_hat_0 = p_hat(found)
per_strat = np.array(strata)/np.array(sams)/np.sum(strata)
strat_max = np.floor(p_hat_0/per_strat)
p = 0 # initial value for the max
n_strata = len(strata)
parts = sp.special.binom(good+n_strata-1, n_strata-1)
if parts >= 10**7:
print("warning--large number of partitions: {}".format(parts))
barsNstars = good + n_strata
bars = [0]*n_strata + [barsNstars]
partition = ([bars[j+1] - bars[j] - 1 for j in range(n_strata)] \
for bars[1:-1] in itertools.combinations(range(1, barsNstars), n_strata-1) \
if all(((bars[j+1] - bars[j] - 1 <= strata[j]) and \
(bars[j+1] - bars[j] >= found[j])) for j in range(n_strata)))
for part in partition:
lo_t = (t for t in itertools.product(*[range(int(s+1)) for s in strat_max]) \
if p_hat(t) <= p_hat_0)
p_new = 0
for t in lo_t:
p_temp = 1
for s in range(len(strata)):
p_temp *= sp.stats.hypergeom.pmf(t[s], strata[s], part[s], sams[s])
p_new += p_temp
if p_new > p:
best_part = part
p = p_new
return p, list(best_part)
def strat_ci_wright(strata, sams, found, alternative='lower', cl=0.95):
"""
Confidence bound on the number of ones in a stratified population,
based on a stratified random sample (without replacement) from
the population.
If alternative=='lower', finds a lower confidence bound.
If alternative=='upper', finds an upper confidence bound.
Constructs the confidence bound by summing Šidák multiplicity-adjusted
joint lower (or upper) confidence bounds for the number of ones in each stratum.
This approach is mentioned in Wright, 1991.
Parameters:
-----------
strata : list of ints
stratum sizes
sams : list of ints
sample sizes in the strata
found : list of ints
number of ones found in each stratum in each sample
alternative : string {'lower', 'upper'}
if alternative=='lower', finds a lower confidence bound.
if alternative=='upper', finds an upper confidence bound.
cl : float
confidence level
Returns:
--------
cb : int
confidence bound
"""
assert alternative in ['lower', 'upper']
inx = 0 if alternative == 'lower' else 1
cl_sidak = math.pow(cl, 1/len(strata)) # Šidák-adjusted confidence level per stratum
cb = sum((hypergeom_conf_interval(
sams[s], found[s], strata[s], cl=cl_sidak, alternative=alternative)[inx]
for s in range(len(strata))))
return cb
Let’s check the empirical coverage of the greedy method.
# test empirical coverage
reps = int(10**3)
cl = 0.95
alternative = 'upper'
strata = [5000, 3000, 2000]
sams = [75,50,25]
good = [100, 100, 500]
g = np.sum(good)
cover = 0
verb = False
for i in range(reps):
found = sp.stats.hypergeom.rvs(strata, good, sams)
ub = strat_ci(strata, sams, found, alternative=alternative, cl=cl)
if verb:
print("f: {} ub: {} best: {}".format(found, ub[0], ub[1]))
cover = cover+1 if ub[0] >= g else cover
if i % 100 == 0:
print(i+1, cover, cover/(i+1))
cover/reps
1 1 1.0
101 101 1.0
201 200 0.9950248756218906
301 300 0.9966777408637874
401 400 0.9975062344139651
501 500 0.998003992015968
601 600 0.9983361064891847
701 700 0.9985734664764622
801 800 0.9987515605493134
901 899 0.9977802441731409
0.998
Let’s write some numerical tests to confirm that the three ways of computing bounds agree.
def test_strat_test(verbose = False):
strata = [(10, 20, 30, 40), (20, 20, 20)]  # tuples, not lists: the lru_cache-decorated functions need hashable arguments
sams = [(2, 3, 4, 5), (5, 5, 10)]
found = [(1, 2, 3, 4), (0, 3, 2)]
for i in range(len(strata)):
compl = tuple(int(s - f) for s, f in zip(sams[i], found[i]))
for j in list(range(4, 20)) + [60]:
for alternative in ['lower', 'upper']:
if verbose:
print("i: {} j: {} alternative: {}".format(i, j, alternative))
p_exact = strat_test_brute(strata[i], sams[i], found[i], j, alternative=alternative)
p_exact_c = strat_test_brute(strata[i], sams[i], compl, j, alternative=alternative)
p_lkp = strat_test(strata[i], sams[i], found[i], j, alternative=alternative)
p_lkp_c = strat_test(strata[i], sams[i], compl, j, alternative=alternative)
np.testing.assert_allclose(p_exact[0], p_lkp[0])
np.testing.assert_allclose(p_exact_c[0], p_lkp_c[0])
def test_strat_ci(verbose = False):
strata = [(10, 20), (10, 20, 30, 40), (10, 20, 20, 30)]  # tuples, not lists: the lru_cache-decorated functions need hashable arguments
sams = [(5, 5), (2, 3, 4, 5), (2, 4, 5, 6)]
found = [(2, 2), (1, 2, 3, 4), (0, 1, 2, 3)]
for i in range(len(strata)):
for alternative in ['lower','upper']:
brute = strat_ci_bisect(strata[i], sams[i], found[i], alternative=alternative)
lkp = strat_ci(strata[i], sams[i], found[i], alternative=alternative)
lkp_s = strat_ci_search(strata[i], sams[i], found[i], alternative=alternative)
if verbose:
print("{}-{}".format(i, alternative))
print("i: {} brute: {} lkp: {} lkp_s: {} \nbest_brute: {} best_lkp: {} best_lkp_s: {}".format(
i, brute[0], lkp[0], lkp_s[0], brute[1], lkp[1], lkp_s[1]))
np.testing.assert_allclose(brute[0], lkp[0])
np.testing.assert_allclose(brute[0], lkp_s[0])
np.testing.assert_allclose(brute[1], lkp[1])
np.testing.assert_allclose(brute[1], lkp_s[1])
test_strat_test()
test_strat_ci()