# Sets, Combinatorics, & Probability

## Probability versus Statistics

In probability, we assume we know what the world is like, and predict what we would then see.

In statistics, we make observations and use them to learn something about the world.
That generally requires a mathematical model for how the observations were generated, to relate the observations to properties of the world.
Sometimes, that model arises naturally from knowledge of the process that generated the data,
for instance, a randomized experiment or a physical measurement where the
underlying physics is understood.
In other cases, the model is basically an assumption.
When the model does not have a firm basis in actual knowledge, inferences are suspect.
As Robert L. Parker says, "the more you assume, the less you know."

Probability is like a _forward problem_ in science or engineering: the physics, including any parameters,
is known. The issue is to predict what will be observed, up to instrumental error and other noise.

Statistics, at least inferential statistics, is like an _inverse problem_. The physics is known
but there are unknown parameters. The issue is to use the (noisy) data to learn about parameters
of the system.

Probability is a combination of mathematics and philosophy.
The philosophy connects the mathematics to the world.

We will start with the mathematics.

## Sets

The mathematics of probability is expressed most naturally using set theory.
This section reviews the basic terminology of naive set theory: how to define and
manipulate sets, operations on sets that yield other sets, special
relationships among sets, and so on.

A _set_ is a collection of objects, called _members_ or _elements_ of the set, without
regard for their order.
$a \in A$, pronounced "$a$ is an element of $A$,"  "$a$ is in $A$," or "$a$ is a member of
$A$" means that $a$ is an element of the set $A$.
This is the same as writing $A \ni a$, which is pronounced "$A$ contains $a$."
If $a$ is not an element of $A$, we write $a \not \in A$.
Sets may be described explicitly by listing their contents, or implicitly
by specifying a property that all elements of the set share, or a condition
that they satisfy.
The contents of sets are enclosed in curly braces: $\{ \}$.

Here are some examples:
+ $A = \{a, b, c, d \}$: the set containing the four elements $a$, $b$, $c$, and $d$.
+ $\emptyset = \{ \}$: the empty set, the set that contains no elements.
+ ${\mathbf Z}  \equiv \{ \ldots, -2, -1, 0, 1, 2, \ldots \}$: the integers.
+ ${\mathbf N}  \equiv \{1, 2, 3, \ldots \}$: the natural (counting) numbers.
+ $\Re \equiv (-\infty, \infty)$: the real numbers.
+ $\overline{\Re} \equiv [-\infty, \infty]$: the extended real numbers.
+ ${\mathbf C}  \equiv \{ a + bi: a, b \in \Re \}$, where $i = \sqrt{-1}$: the complex numbers.
+ ${\mathbf Q}  \equiv \{ a/b: a, b \in {\mathbf Z}, b \ne 0\}$: the rational numbers.


$B$ is a _subset_ of $A$, written $B \subset A$ or $A \supset B$, if every element
of the set $B$ is also an element of the set $A$.
Thus ${\mathbf N}  \subset {\mathbf Z}  \subset {\mathbf Q}  \subset \Re \subset {\mathbf C}$.
The empty set $\emptyset$ is a subset of every set.
If $A \subset B$ and $B \subset A$, $A$ and $B$ are the same set, and we write $A = B$.
If $B$ is not a subset of $A$, we write $B \not \subset A$ or $A \not \supset B$.
$B$ is a _proper subset_ of $A$ if $B \subset A$ but
$A \not \subset B$.

The _complement_ of $A$ (with respect to the universe ${\mathcal X}$), written $A^c$ or $A'$,
is the set of all objects under consideration (${\mathcal X}$) that are not elements of $A$.
That is, $A^c \equiv \{ a \in {\mathcal X} : a \not \in A\}$.

The _intersection_ of $A$ and $B$, written $A \cap B$ or $AB$, is the set of all objects that
are elements of both $A$ and $B$:
\begin{equation*} 
A \cap B \equiv \{a: a \in A \mbox{ and } a \in B \}.
\end{equation*}
If $A \cap B = \emptyset$, we say $A$ and $B$ are _disjoint_ or _mutually
exclusive_.

The _union_ of $A$ and $B$, written $A \cup B$, is the set of all objects that
are elements of $A$ or of $B$ (or both):
\begin{equation*} 
A \cup B \equiv \{a: a \in A \mbox{ or } a \in B \mbox{ or both} \}.
\end{equation*}

The _difference_ or _set difference_ of $A$ and $B$, $A \setminus B$, pronounced "$A$ minus $B$," is
the set of all elements of $A$ that are not elements of $B$:
\begin{equation*} 
A \setminus B \equiv \{a \in A : a \not \in B \} = A \cap B^c.
\end{equation*}

_Intervals_ are special subsets of ${\mathbf R}$:
\begin{equation*} 
[a, b] \equiv \{x \in \Re : a \le x \le b\}
\end{equation*}
\begin{equation*} 
(a, b] \equiv \{x \in \Re : a < x \le b\}
\end{equation*}
\begin{equation*} 
[a, b) \equiv \{x \in \Re : a \le x < b\}
\end{equation*}
\begin{equation*} 
(a, b) \equiv \{x \in \Re : a < x < b\}.
\end{equation*}
Sometimes we have a collection of sets, _indexed_ by elements of another
set: $\{A_\beta : \beta \in B \}$.
Then $B$ is called an _index set_.
If $B$ is a subset of the integers ${\mathbf Z}$, usually we write $A_i$ or $A_j$, etc.,
rather than $A_\beta$.
If $B = {\mathbf N}$, we usually write $\{A_j\}_{j=1}^\infty$ rather than
$\{A_\beta : \beta \in {\mathbf N}  \}$.
The _intersection_ of the collection $\{A_\beta : \beta \in B \}$ is
\begin{equation*} 
\bigcap_{\beta \in B} A_\beta \equiv \{a: a \in A_\beta \;\;\forall \beta \in B \}.
\end{equation*}
($\forall$ means "for all.")
If $B = \{1, 2, \ldots, n\}$, we usually write $\bigcap_{j=1}^n A_j$ rather than
$\bigcap_{j \in \{1, 2, \ldots, n\}} A_j$.
The union $\bigcup_{\beta \in B} A_\beta \equiv \{a: a \in A_\beta \mbox{ for some } \beta \in B \}$ and the notation $\bigcup_{j=1}^n A_j$ are defined analogously.

A collection of sets $\{A_\beta : \beta \in B \}$ is _pairwise disjoint_
if $A_\beta \cap A_{\beta'} = \emptyset$ whenever $\beta \ne \beta'$.
The collection $\{A_\beta : \beta \in B\}$ _exhausts_ or _covers_
the set $A$ if $A \subset \bigcup_{\beta \in B} A_\beta$.
The collection $\{A_\beta : \beta \in B \}$ is a _partition_ of the
set $A$ if $A = \cup_{\beta \in B} A_\beta$ and the 
sets $\{A_\beta : \beta \in B \}$
are pairwise disjoint.
If $\{A_\beta : \beta \in B \}$ are pairwise disjoint and exhaust $A$, then
$\{A_\beta \cap A : \beta \in B \}$ is a partition of $A$.

A set is _countable_ if its elements can be put in one-to-one correspondence with
a subset of ${\mathbf N}$.
A set is _finite_ if its elements can be put in one-to-one correspondence with
$\{1, 2, \ldots, n\}$ for some $n \in {\mathbf N}$.
If a set is not finite, it is infinite.
${\mathbf N}$, ${\mathbf Z}$, and ${\mathbf Q}$ are infinite and countable
(_countably infinite_);
${\mathbf R}$ is infinite and uncountable.

The notation $\# A$, pronounced "the cardinality of $A$," is the size of the set $A$, suitably defined.
If $A$ is finite, $\# A$ is the number of elements in $A$.
If $A$ is infinite but countable (that is, if its elements can be put in one-to-one
correspondence with the elements of ${\mathbf N}$), then $\# A = \aleph_0$ (aleph-null).
If the elements of $A$ can be put in one-to-one correspondence with the elements of $\mathbf{R}$, then
$\#A = c$, _the cardinality of the continuum_.

The _power set_ of a set $A$, denoted $2^A$, is the set of all subsets of the set $A$.
For example, the power set of $\{a, b, c\}$ is
\begin{equation*} 
\{ \emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\} \}.
\end{equation*}
If $A$ is a finite set, the cardinality of the power set of $A$ is $2^{\# A}$.
This can be seen as follows: suppose $\# A = n$ is finite.
Consider the elements of $A$ to be written in some canonical order.
We can specify an element of the power set by an $n$-digit binary number.
The first digit is 1 if the first element of $A$ is in the subset, and 0 otherwise.
The second digit is 1 if the second element of $A$ is in the subset, and 0 otherwise,
etc.
There are $2^n$ $n$-digit binary numbers, so there are $2^n$ subsets.
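The power set of ${\mathbf N}$, however, is uncountable, so its cardinality is not $\aleph_0$ (by Cantor's theorem, no set can be put in one-to-one correspondence with its own power set); in fact, $\# 2^{\mathbf N} = c$.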
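The binary-number argument translates directly into code. Here is a minimal sketch (the function name `power_set` is our own, not a library function): each integer from $0$ to $2^n - 1$ is read as an $n$-digit binary number, and bit $j$ says whether the $j$th element of the list is in the subset.

```python
def power_set(elements):
    """Return all subsets of the list `elements` as frozensets.

    The integer code runs over 0, 1, ..., 2**n - 1; bit j of code
    (counting from the least-significant bit) is 1 exactly when
    elements[j] belongs to the subset.
    """
    n = len(elements)
    subsets = []
    for code in range(2 ** n):
        subsets.append(frozenset(elements[j] for j in range(n) if (code >> j) & 1))
    return subsets

A = ['a', 'b', 'c']
P = power_set(A)
print(len(P))                        # 8, i.e., 2**3
print(sorted(sorted(s) for s in P))  # the 8 subsets, each listed in alphabetical order
```

Because distinct codes give distinct subsets and every subset has a code, the count $2^n$ falls out of the construction.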

If $A$ is a finite set, $B$ is a countable set
and $\{A_\beta : \beta \in B \}$ is a partition of $A$, then
\begin{equation*} 
\# A = \sum_{\beta \in B} \# A_\beta.
\end{equation*}

Suppose $A = A_1 \cup A_2$, but that possibly $A_1 A_2 \ne \emptyset$,
so $\{A_1, A_2\}$ might not be a partition of $A$, because
$A_1$ and $A_2$ are not necessarily disjoint.
Then still 
\begin{equation*} \#A = \#A_1 + \#A_2 - \#A_1A_2.\end{equation*}

This is seen most easily using a Venn diagram, and can be proved
by constructing a partition of $A$,
$A = A_1A_2^c \cup A_1^cA_2 \cup A_1A_2$, and noting that
$\#A_1 + \#A_2 = \#A_1A_2^c + \#A_1^cA_2 + 2\#A_1A_2$.

If 
$A = A_1 \cup A_2 \cup A_3$ but $\{A_1, A_2, A_3\}$ are not necessarily disjoint,
then still

\begin{equation*} \#A = \#A_1 + \#A_2 + \#A_3 - \#A_1A_2 - \#A_1A_3 - \#A_2A_3 + \#A_1A_2A_3.\end{equation*}

More generally, if $A = \cup_{i=1}^n A_i$, then the _Inclusion-Exclusion Principle_ holds:

\begin{equation*}  \#A = \sum_{i=1}^n \#A_i - \sum_{1 \le i_1 < i_2 \le n} \#(A_{i_1}A_{i_2}) +
\sum_{1 \le i_1 < i_2 < i_3 \le n} \#(A_{i_1}A_{i_2}A_{i_3}) - \cdots 
+(-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \# (A_{i_1}A_{i_2} \cdots A_{i_k}) + \cdots + (-1)^{n-1} \#(A_1A_2 \cdots A_n).
\end{equation*}
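The principle can be checked against a direct count on small finite sets. This sketch (the helper name `union_size_incl_excl` is ours) sums $(-1)^{k-1}\#(A_{i_1}\cdots A_{i_k})$ over all $k$-element index sets, using multiples of 2, 3, and 5 below 30 as an arbitrary example.

```python
from itertools import combinations

def union_size_incl_excl(sets):
    """Size of the union of a list of finite sets, via the Inclusion-Exclusion Principle."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):
        sign = (-1) ** (k - 1)
        # add sign * #(A_{i_1} ... A_{i_k}) for every 1 <= i_1 < ... < i_k <= n
        for idx in combinations(range(n), k):
            total += sign * len(set.intersection(*(sets[i] for i in idx)))
    return total

A1 = set(range(0, 30, 2))   # multiples of 2 below 30
A2 = set(range(0, 30, 3))   # multiples of 3 below 30
A3 = set(range(0, 30, 5))   # multiples of 5 below 30
print(union_size_incl_excl([A1, A2, A3]), len(A1 | A2 | A3))   # 22 22
```

Here the terms are $15 + 10 + 6 - 5 - 3 - 2 + 1 = 22$.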

### Some useful set relations

+ If $A\subset B$ and $B\subset A$ then $A = B$. (If two sets are subsets of each other, they are the same set.)

+ $\emptyset \subset A$. (The empty set is a subset of every set.)

+ $\emptyset \cup A = A$.
(The union of the empty set and any other set is that set.)

+ $\emptyset \cap A = \emptyset$.
(The intersection of the empty set and any other set is empty.)

+ If $A \subset B$ and $B \subset C$ then $A \subset C$. (The subset relation is transitive.)

+ If $A \subset B$ then $B^c \subset A^c$.
(Complementation reverses the subset relation.)

+ $A \cap B \subset  A$.  Moreover, $A\cap B = A$ if and only if $A \subset B$.

+ $A \subset  A\cup B$.   Moreover, $A = A\cup B$ if and only if $B \subset A$.

+ $(A\cap B)^c = A^c \cup B^c$. (de Morgan)

+ $(A\cup B)^c = A^c\cap B^c$. (de Morgan)

+ $A \cap B = B \cap A$. (Intersection is commutative.)

+ $A\cup B = B\cup A$. (Union is commutative.)

+ $A\cap (B\cap C) = (A\cap B)\cap C$. (Intersection is associative.)

+ $A\cup (B\cup C) = (A\cup B)\cup C$. (Union is associative.)

+ $A\cap (B\cup C) = (A\cap B)\cup (A\cap C)$. (Distribution of intersection over union.)

+ $A\cup (B\cap C) = (A\cup B)\cap (A\cup C)$. (Distribution of union over intersection.)
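Python's `set` operators map onto these relations: `&` is intersection, `|` is union, `-` is set difference, and `<=` tests whether one set is a subset of another. The sketch below spot-checks several of the identities on randomly chosen subsets of a small universe (our choice of a 20-element universe and the helper `comp` are assumptions made for illustration). Passing such checks is evidence, not a proof.

```python
import random

random.seed(0)       # fixed seed, so the check is reproducible
X = set(range(20))   # a small universe; complements are taken with respect to X

def comp(S):
    """Complement of S with respect to the universe X."""
    return X - S

for trial in range(1000):
    # three random subsets of X: each element is included with probability 1/2
    A, B, C = ({x for x in X if random.random() < 0.5} for _ in range(3))
    assert comp(A & B) == comp(A) | comp(B)    # de Morgan
    assert comp(A | B) == comp(A) & comp(B)    # de Morgan
    assert A & (B | C) == (A & B) | (A & C)    # intersection distributes over union
    assert A | (B & C) == (A | B) & (A | C)    # union distributes over intersection
    assert (A & B == A) == (A <= B)            # A intersect B equals A iff A is a subset of B
    assert (A | B == A) == (B <= A)            # A union B equals A iff B is a subset of A
    if A <= B:
        assert comp(B) <= comp(A)              # complementation reverses the subset relation
print("all identities held for 1000 random triples of subsets")
```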

### Exercises

1. Prove that if $\{A_\beta : \beta \in B \}$ are pairwise disjoint and exhaust $A$, then $\{A_\beta \cap A : \beta \in B \}$ is a partition of $A$.

2. Prove de Morgan's identity $(A\cup B)^c = A^c\cap B^c$.

3. Prove that $\{A_1A_2^c, A_1^cA_2, A_1A_2\}$ is a partition of $A_1 \cup A_2$.

In [1]:
# boilerplate
%matplotlib inline
import math
import numpy as np
import scipy as sp
from scipy import special
import matplotlib.pyplot as plt

In [33]:
# Sets in Python
# Make three sets, A = {1, 2, 3}, B = {"a", "b", "green", 3}, C = {1, 2}
A = [1, 2, 3, 3] # this is a list but not a set, because 3 is listed twice
print('A: ', A)
A = set(A)  # this is a set containing the unique elements of A
print('A as a set (duplicates removed):', A)
B = set(["a", "b", "green", 3])  # sets are unordered: the display order below may vary from run to run
print('B:', B)
C = set(range(1,3)) # a different construction
print('C:', C)
#
# Set membership
print('1 is in A:', 1 in A)  # is 1 an element of A?
print('"green" is in A:', "green" in A)
print('"green" is in B:', "green" in B)
print('A is a subset of C:', A.issubset(C))
print('C is a subset of A:', C.issubset(A))
print('A intersect B:', A.intersection(B))

A:  [1, 2, 3, 3]
A as a set (duplicates removed): {1, 2, 3}
B: {'a', 3, 'b', 'green'}
C: {1, 2}
1 is in A: True
"green" is in A: False
"green" is in B: True
A is a subset of C: False
C is a subset of A: True
A intersect B: {3}


## Cartesian Products
The notation $(s_1, s_2, \ldots, s_n) \equiv (s_j)_{j=1}^n$ denotes an
_ordered $n$-tuple_ consisting of
$s_1$ in the first position, $s_2$ in the second position, etc.
The parentheses are used instead of curly braces to distinguish
$n$-tuples from sets: $(s_j)_{j=1}^n \ne \{s_j\}_{j=1}^n$.
The $k$th
_component_ of the $n$-tuple $s = (s_j)_{j=1}^n$, is $s_k$,
$k = 1, 2, \ldots, n$.
Two $n$-tuples are equal if their components are equal.
That is, $(s_j)_{j=1}^n = (t_j)_{j=1}^n$ means that
$s_j = t_j$ for $j = 1, \ldots, n$.
In particular, $(s, t) \ne (t, s)$ unless $s=t$.
In contrast, $\{s, t \} = \{ t, s \}$ always.
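Python makes the same distinction: tuples are ordered, sets are not. One caution for translating the notation: Python numbers positions from 0, so the $k$th component $s_k$ of a tuple `s` is `s[k-1]`.

```python
s, t = 1, 2
print((s, t) == (t, s))   # False: an ordered pair depends on the order of its components
print({s, t} == {t, s})   # True: a set does not

u = ('x', 'y', 'z')       # a 3-tuple
print(u[0])               # x, the first component (Python indexes from 0)
```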

The _Cartesian product of $S$ and $T$_ is
$S \bigotimes T \equiv \{(s, t): s \in S \mbox{ and } t \in T\}$.
Unless $S = T$ or one of $S$ and $T$ is empty, $S \bigotimes T \ne T \bigotimes S$.
${\mathbf R}^n$ is the Cartesian product of ${\mathbf R}$ with itself, $n$ times; its elements
are $n$-tuples of real numbers.
If $s$ is the $n$-tuple $(s_1, s_2, \ldots, s_n) = (s_j)_{j=1}^n$
and $t$ is the $n$-tuple $(t_1, t_2, \ldots, t_n) = (t_j)_{j=1}^n$,
then $s = t$ iff $s_j = t_j$ for all $j=1, \ldots, n$.

The Python library <tt>itertools</tt> has utilities for generating Cartesian products, power sets, permutations, and the like.

In [3]:
# Cartesian products using itertools
import itertools
A = ['a','b','c','d']
B = range(3)
print(list(itertools.product(A, B)))

[('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2), ('c', 0), ('c', 1), ('c', 2), ('d', 0), ('d', 1), ('d', 2)]


## Combinatorics
Let $A$ be a finite set with $\# A = n$.
A _permutation_ of (the elements of) $A$ is an element $s$ of
$\bigotimes_{j=1}^n A = A^n$
whose components are distinct elements of $A$.
That is, $s = (s_j)_{j=1}^n \in A^n$ is a permutation of $A$ if
$\# \{s_j\}_{j=1}^n = n$.
There are $n! = n(n-1)\cdots 1$ permutations of a set with $n$ elements:
there are $n$ choices for the first component of the permutation, $n-1$ choices for
the second (whatever the
first might be), $n-2$ for the third (whatever the first two might be), etc.
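The definition can be applied literally, if wastefully: generate all of $A^n$ and keep the $n$-tuples whose components are distinct, i.e., those $s$ with $\#\{s_j\}_{j=1}^n = n$. This sketch (the function name is ours) enumerates all $n^n$ tuples, so it is only practical for small $n$; the count should come out to $n!$.

```python
import itertools
import math

def permutations_by_definition(A):
    """All permutations of the finite set A: the n-tuples in A^n whose n components are distinct."""
    A = list(A)
    n = len(A)
    return [s for s in itertools.product(A, repeat=n) if len(set(s)) == n]

A = {'a', 'b', 'c', 'd'}
perms = permutations_by_definition(A)
print(len(perms), math.factorial(len(A)))   # 24 24
```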

This is an illustration of the _Fundamental Rule of Counting_:
If specifying each (unique) element of a set can be expressed as a sequence of $n$ choices, and the number of options at step $i$ of the sequence is $n_i$ regardless of the choices made before step $i$, then the total number of elements of the set is $\prod_{i=1}^n n_i$.

The number of permutations of $n$ things taken $k$ at a time, ${}_nP_k$,
is the number of ways of selecting $k$ of $n$ things, then permuting
those $k$ things.
There are $n$ choices for the object that will be in the first place in the permutation,
$n-1$ for the second place (regardless of which is first), etc., and $n-k+1$ choices
for the item that will be in the $k$th place.
By the fundamental rule of counting, it follows that
${}_nP_k = n(n-1)\cdots(n-k+1) = n!/(n-k)!$.

The number of subsets of size $k$ that can be formed from $n$ objects is
\begin{equation*} 
{}_nC_k = {{n}\choose{k}} =
{}_nP_k/k! = n(n-1)\cdots(n-k+1)/k! = \frac{n!}{k!(n-k)!},
\end{equation*}
since each subset of size $k$ can be ordered into $k!$ permutations.
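The overcounting argument can be confirmed by brute force: list every ordered selection of $k$ of $n$ objects, collapse each to the set of objects it uses, and check that each $k$-subset shows up exactly $k!$ times. The sketch uses `math.perm` and `math.comb` from Python's standard library (available in Python 3.8 and later); $n = 6$, $k = 3$ is an arbitrary choice.

```python
import itertools
import math
from collections import Counter

n, k = 6, 3
objects = range(n)

# every ordered selection of k of the n objects: there are nPk of them
ordered = list(itertools.permutations(objects, k))
# collapse each ordered selection to the (unordered) subset it uses
counts = Counter(frozenset(s) for s in ordered)

print(len(ordered), math.perm(n, k))   # 120 120
print(len(counts), math.comb(n, k))    # 20 20
print(set(counts.values()))            # {6}: each subset appears k! = 6 times
```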

Here are some useful identities:

1. $ {{n}\choose{k}} = {{n}\choose{n-k}} $.
1. $ {{n}\choose{0}} = {{n}\choose{n}} = 1$.
1. $ {{n}\choose{1}} = {{n}\choose{n-1}} = n $.
1. $ {{n} \choose {k}} = \prod_{m=0}^{k-1} \frac{n-m}{m+1} $.
1. $ {{n} \choose {k}} = {{n-1}\choose{k-1}} + {{n-1} \choose {k}}$.

Because the power set of a set with $n$ elements can be partitioned as

\begin{equation*} 
\cup_{k=0}^n \left \{ \mbox{all distinct subsets of size $k$}
\right \},
\end{equation*}

and since the power set has $2^n$ elements, it follows that
\begin{equation*} 
\sum_{j=0}^n {{n} \choose {j}} = 2^n.
\end{equation*}
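A numerical spot-check of the five identities and of the row sum, for small $n$, may help build intuition (it does not replace the proofs the exercise asks for). The helper `choose_by_ratios` is our own; it implements the product formula in exact integer arithmetic. Before step $m$ the running value equals ${{n}\choose{m}}$, and ${{n}\choose{m}}(n-m) = {{n}\choose{m+1}}(m+1)$, so each integer division is exact.

```python
import math

def choose_by_ratios(n, k):
    """nCk as the product of the ratios (n - m)/(m + 1), m = 0, ..., k-1, in exact integer arithmetic."""
    c = 1
    for m in range(k):
        c = c * (n - m) // (m + 1)   # exact: c * (n - m) is divisible by m + 1 at every step
    return c

for n in range(1, 15):
    assert math.comb(n, 0) == math.comb(n, n) == 1
    assert math.comb(n, 1) == math.comb(n, n - 1) == n
    for k in range(n + 1):
        assert math.comb(n, k) == math.comb(n, n - k)       # symmetry
        assert math.comb(n, k) == choose_by_ratios(n, k)    # product formula
        if k >= 1:
            # Pascal's rule; math.comb(n - 1, k) returns 0 when k = n
            assert math.comb(n, k) == math.comb(n - 1, k - 1) + math.comb(n - 1, k)
    assert sum(math.comb(n, j) for j in range(n + 1)) == 2 ** n   # row sum
print("identities hold for n = 1, ..., 14")
```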

---
### Exercise
Prove the five identities about binomial coefficients given above.

---

In [34]:
# Combinatorics in Python

n = 5
k = 3
print('n!:', math.factorial(n))

# number of combinations of n objects taken k at a time
print('nCk:', sp.special.binom(n, k))

# number of permutations of n objects taken k at a time:
# the product n(n-1)...(n-k+1), built from scratch

def permu(n, k):
    result = 1
    for s in range(k):
        result = result * (n - s)
    return result

print('nPk:', permu(n, k))

n!: 120
nCk: 10.0
nPk: 60


In [5]:
# enumerate combinations and permutations using itertools
A = ['a','b','c','d','e']
print('A:', A)
k = 2
print('all combinations of', k, 'elements of A:', list(itertools.combinations(A, k)))
print('all permutations of', k, 'elements of A:', list(itertools.permutations(A, k)))

A: ['a', 'b', 'c', 'd', 'e']
all combinations of 2 elements of A: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'd'), ('c', 'e'), ('d', 'e')]
all permutations of 2 elements of A: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'a'), ('b', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'a'), ('c', 'b'), ('c', 'd'), ('c', 'e'), ('d', 'a'), ('d', 'b'), ('d', 'c'), ('d', 'e'), ('e', 'a'), ('e', 'b'), ('e', 'c'), ('e', 'd')]


## Strategies for counting

There are several approaches to counting the elements of a set that can be helpful in practice, aside from simply enumerating the elements.

1. Divide and conquer: partition the set.
1. Use the Fundamental Rule of Counting. (This is how we found the number of permutations of $k$ of $n$ things.)
1. Overcount: count each item $k$ times, then divide by $k$. (This is how we derived the number of combinations of $k$ of $n$ things from the number of permutations of $k$ of $n$ things: each combination of $k$ things occurs $k!$ times among the set of permutations.)
1. Overinclude, then adjust: count a set that is larger than the set in question, then subtract the extras.
1. Use the Inclusion-Exclusion Principle.
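As an illustration of "overinclude, then adjust" combined with the Fundamental Rule of Counting, here is a small example of our own choosing: how many 3-letter strings over the alphabet a, b, c, d, e contain at least one repeated letter? Count all $5^3$ strings, then subtract the ${}_5P_3 = 5 \cdot 4 \cdot 3$ strings with no repeat; enumeration confirms the answer.

```python
import itertools
import math

letters = 'abcde'
k = 3

# Overinclude: all k-letter strings. By the Fundamental Rule of Counting, 5 * 5 * 5 of them.
total = len(letters) ** k
# The extras: strings with no repeated letter, i.e., permutations of k of the 5 letters.
no_repeat = math.perm(len(letters), k)       # 5 * 4 * 3 = 60
print(total - no_repeat)                     # 65

# Brute-force check by enumerating every string
brute = sum(1 for s in itertools.product(letters, repeat=k) if len(set(s)) < k)
print(brute)                                 # 65
```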

---
### Computational exercise

Write a Python function that constructs all permutations of a finite list.

The function should take as input a list s, and produce as output all permutations of the elements of the list. Items should be considered unique based on their position in the list.

For instance,

    s = ['a','b','c']
    permute(s)

should return the 6 permutations of a, b, and c:

    a b c
    a c b
    b a c
    b c a
    c a b
    c b a

Write your function from scratch; i.e., do not import any modules (in particular, do not use the <span class="code">itertools</span> library).

---

## Numerical stability and order of operations

As a matter of mathematics, ${{n}\choose{k}} = \frac{n!}{k!(n-k)!}$, but this is a bad way to compute ${{n}\choose{k}}$.
When $n$ is large, the numerator $n!$ is enormous, and can overflow finite-precision calculations even for values of $n$ and $k$ for which ${{n}\choose{k}}$ is small.
Moreover, relying on cancellation in finite-precision arithmetic can lead to large errors, whether the cancellation is through division or subtraction.

So, rather than compute ${{n}\choose{k}}$ by dividing factorials, it's better to build it by multiplying ratios.

In [35]:
# illustrating issues calculating nCk as a ratio of factorials for large n
n = 1000
k = 2
print('1000!:', math.factorial(n)) # Really Big Number: 
                                   # would overflow in many languages, 
                                   # including R, MATLAB, Fortran
# however, 1000 choose 2 is only (1000*999)/2 = 499500
print('\n998!:', math.factorial(n-k))
print('\n1000C2:', sp.special.binom(n, k))

# let's code this from scratch in a stable way
def myChoose(n, k):
    # no error checking. If this were for production, we'd check that the arguments are integers
    # and that n >= k >= 0.
    m = 0
    myc = 1
    while m < k:
        # multiply by one ratio at a time; after this step, myc = (n choose m+1)
        # (up to rounding), so no huge intermediate factorial is ever formed
        myc = myc * (n-m)/(m+1)
        m = m+1
    return myc

print('1000C2 (2nd way):', myChoose(n, k))

1000!: 402387260077093773543702433923003985719374864210714632543799910429938512398629020592044208486969404800479988610197196058631666872994808558901323829669944590997424504087073759918823627727188732519779505950995276120874975462497043601418278094646496291056393887437886487337119181045825783647849977012476632889835955735432513185323958463075557409114262417474349347553428646576611667797396668820291207379143853719588249808126867838374559731746136085379534524221586593201928090878297308431392844403281231558611036976801357304216168747609675871348312025478589320767169132448426236131412508780208000261683151027341827977704784635868170164365024153691398281264810213092761244896359928705114964975419909342221566832572080821333186116811553615836546984046708975602900950537616475847728421889679646244945160765353408198901385442487984959953319101723355556602139450399736280750137837615307127761926849034352625200015888535147331611702103968175921510907788019393178114194545257223865541461062892187960223838
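Python's integers have arbitrary precision, which hides the overflow the comments above warn about. To see it, this sketch (the helper `float_factorial` is ours) repeats the naive calculation in IEEE 754 double precision, the default floating-point type in R and MATLAB, whose largest finite value is about $1.8 \times 10^{308}$. $170!$ still fits, $171!$ does not, so the ratio of factorials breaks down for $n = 1000$, while the product of ratios used in `myChoose` does not.

```python
import math

def float_factorial(n):
    """n! accumulated in double-precision floating point; becomes inf for n >= 171."""
    f = 1.0
    for j in range(2, n + 1):
        f *= j
    return f

print(float_factorial(170))   # about 7.3e306: still finite
print(float_factorial(171))   # inf: larger than the largest double

n, k = 1000, 2
naive = float_factorial(n) / (float_factorial(k) * float_factorial(n - k))
print(naive)                  # nan, because the ratio is inf / inf

# product of ratios, as in myChoose: every intermediate value stays small
stable = 1.0
for m in range(k):
    stable = stable * (n - m) / (m + 1)
print(stable)                 # 499500.0
```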

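A related issue arises when subtracting large numbers.

For instance, algebraically, $x^2 - (x-1)^2 = 2x-1$. 
But if $x$ is large, $x^2$ and $(x-1)^2$ can have more significant digits than the mantissa of a floating-point number holds, so the computed squares are rounded, and the difference of the rounded squares can be numerically wrong: subtracting two nearly equal numbers cancels their leading digits and leaves mostly rounding error.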
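A concrete case in IEEE double precision, which carries a 53-bit significand (the choice $x = 10^9$ is ours): $(x-1)^2 = 999999998000000001$ needs about 60 bits, so it is rounded to a multiple of 128, and the naive difference comes out off by one. Factoring, $x^2 - (x-1)^2 = \left(x - (x-1)\right)\left(x + (x-1)\right)$, avoids forming the large squares.

```python
x = 1e9                                   # exactly representable as a double

naive = x * x - (x - 1) * (x - 1)         # difference of two rounded squares near 1e18
factored = (x - (x - 1)) * (x + (x - 1))  # algebraically equal; no large intermediates
exact = 2 * x - 1

print(naive)      # 2000000000.0: off by one, because (x - 1)**2 was rounded
print(factored)   # 1999999999.0
print(exact)      # 1999999999.0
```

For larger $x$ the error grows along with the spacing between representable doubles near $x^2$.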

## Connecting Probability to Set Theory

A _random experiment_ or _random trial_
is basically any
situation whose outcome is not perfectly predictable, but for which we
can specify all possible outcomes, and that shows long-term regularities.
For example, when we toss a coin, we do not know how it will land,
but it certainly must land heads, tails, on its edge, or not land at all.
There is no other possibility.
The set of all possible outcomes of a random experiment is called the
_outcome space_.
The letter $\mathcal{S}$ will denote the outcome space.
We are free to choose the outcome space to correspond to what we deem
relevant for the experiment, as long as it is essentially inevitable that the
random experiment will result in some outcome in the outcome space.
For example, the outcome space we just described was {heads, tails, edge, doesn't land}.
It might be adequate for our purposes for the outcome space to be {heads, not heads}.

Often we shall tailor outcome spaces for specific problems.
Here is an example: Imagine a box containing tickets that are indistinguishable
except that each has written upon it a unique number between 1 and the number
of tickets, $n$.
We shake the box, draw a ticket from the box without looking, and record
the number written on the ticket we happened to draw.
The natural outcome space of this experiment is the set of numbers
$\{1, 2, \ldots, n\}$.
However, suppose we are interested only in whether the number on the
ticket we draw is even.
The outcome space then could be reduced to {even number on ticket, odd number on ticket},
or coded even more abstractly as $\{0, 1\}$, 
where the outcome is the number of even-numbered tickets drawn.

An _event_ is a subset of outcome space: a collection of outcomes in the
outcome space.
$A$ is an event if $A \subset \mathcal{S}$.
For example, in the experiment of drawing a numbered ticket from the box,
suppose there are 10 tickets in all, and that we choose the outcome space
$\mathcal{S}$ to be the numbers
$\{1, 2, 3, \ldots, 9, 10\}$.
Then "we draw the number 1" is the event $\{1\}$, and
"we draw an even number" is the event $\{2, 4, 6, 8, 10\}$,
both of which are subsets of the set of possible outcomes, the outcome space
$\mathcal{S}$.

This treatment is quite informal, omitting important technical concepts such as sigma-algebras,
measurability, and the like.

Modern probability theory starts with Kolmogorov's Axioms; here is an informal statement of the axioms.
For a fuller (but still informal) treatment, see: 
[Probability: Philosophy and Mathematical Background](http://www.stat.berkeley.edu/~stark/SticiGui/Text/probabilityPhilosophy.htm),
[Set theory](http://www.stat.berkeley.edu/~stark/SticiGui/Text/sets.htm),
and
[Probability: Axioms and Fundaments](http://www.stat.berkeley.edu/~stark/SticiGui/Text/probabilityAxioms.htm).

Let $\mathcal{S}$ denote the _outcome space_, the set of all possible outcomes of a random experiment, 
and let $\{A_i\}_{i=1}^\infty$ be a countable collection of 
subsets of $\mathcal{S}$.
Then any probability function ${\mathbb P}$ must satisfy these axioms:

1. For every $A \subset \mathcal{S}$, ${\mathbb P}(A) \ge 0$ (probabilities are nonnegative)
2. ${\mathbb P}(\mathcal{S}) = 1$ (the chance that _something_ happens is 100%)
3. If $A_i \cap A_j = \emptyset$ for $i \ne j$, then ${\mathbb P}\left(\cup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty {\mathbb P}(A_i)$
(If a countable collection of events is _pairwise disjoint_, then the chance that any of the
events occurs is the sum of the chances that they occur individually.)

These axioms have many useful consequences, among them that ${\mathbb P}(\emptyset) = 0$, ${\mathbb P}(A^c) = 1 - {\mathbb P}(A)$,
and ${\mathbb P}(A \cup B) = {\mathbb P}(A) + {\mathbb P}(B) - {\mathbb P}(AB)$.
More generally, the Inclusion-Exclusion Principle gives us

\begin{equation*} 
\mathbb{P}\left(\cup_{i=1}^n A_i\right)  = \sum_{i=1}^n \mathbb{P}(A_i) 
- \sum_{1 \le i_1 < i_2 \le n} \mathbb{P}(A_{i_1}A_{i_2}) +
\sum_{1 \le i_1 < i_2 < i_3 \le n} \mathbb{P}(A_{i_1}A_{i_2}A_{i_3}) - \cdots 
+(-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \mathbb{P} (A_{i_1}A_{i_2} \cdots A_{i_k}) + \cdots + (-1)^{n-1} \mathbb{P}(A_1A_2 \cdots A_n)
\end{equation*}
\begin{equation*} 
= \sum_{i=1}^n (-1)^{i-1}\sum_{\mathcal{I} \subset\{1,\ldots,n\}: \#\mathcal{I} = i} \mathbb{P}\left(\cap_{j \in \mathcal{I}} A_j\right).
\end{equation*}
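On a finite outcome space whose outcomes are equally likely, $\mathbb{P}(A) = \#A/\#\mathcal{S}$ satisfies the axioms, and the consequences above become the counting identities from the section on sets. This sketch uses the ten-ticket box from earlier, under the assumption that each ticket is equally likely to be drawn; `P` and `incl_excl` are our own illustrative names, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations

S = set(range(1, 11))                    # outcome space: tickets numbered 1, ..., 10

def P(event):
    """P(event) = #event / #S, ASSUMING every ticket is equally likely to be drawn."""
    assert event <= S, "an event must be a subset of the outcome space"
    return Fraction(len(event), len(S))

def incl_excl(events):
    """P(union of the events), computed with the Inclusion-Exclusion Principle."""
    total = Fraction(0)
    for i in range(1, len(events) + 1):
        for I in combinations(events, i):             # every collection of i of the events
            total += (-1) ** (i - 1) * P(reduce(lambda a, b: a & b, I))
    return total

even = {s for s in S if s % 2 == 0}      # "we draw an even number"
small = {s for s in S if s <= 3}         # "we draw a number no larger than 3"
mult3 = {s for s in S if s % 3 == 0}     # "we draw a multiple of 3"

print(P(set()), P(S))                                            # 0 1
print(P(S - even) == 1 - P(even))                                # True
print(P(even | small) == P(even) + P(small) - P(even & small))   # True
print(incl_excl([even, small, mult3]), P(even | small | mult3))  # 4/5 4/5
```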

### Definitions

Let $A$ and $B$ be subsets of outcome space $\mathcal{S}$.

+ If $AB = \emptyset$, then $A$ and $B$ are _mutually exclusive_.
+ If $A \subset B$ then $A$ _implies_ $B$:  if $A$ occurs, $B$ must
also occur (if the outcome that occurs is in $A$, the outcome
that occurs is also in $B$, because every element of $A$
is an element of $B$).
+ If ${\mathbb P}(AB) = {\mathbb P}(A){\mathbb P}(B)$, then $A$ and $B$ are _independent_.
+ If ${\mathbb P}(B) > 0$, then the _conditional probability of $A$ given $B$_ is
${\mathbb P}(A | B) \equiv {\mathbb P}(AB)/{\mathbb P}(B)$.

---

Independence is an extremely specific relationship.
At one extreme, $AB = \emptyset$; then ${\mathbb P}(AB) = 0 \le {\mathbb P}(A){\mathbb P}(B)$. 
At the other extreme, one of $A$ and $B$ is a subset of the other; then ${\mathbb P}(AB) = \min({\mathbb P}(A),{\mathbb P}(B)) \ge {\mathbb P}(A){\mathbb P}(B)$.
Independence lies at a precise point in between.
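When outcomes are equally likely, independence can be checked by counting. In this sketch (an example of our own), the outcome space is the 36 ordered pairs from rolling two dice, assumed equally likely. "The first die is even" turns out to be independent of "the sum is 7" but not of "the sum is 8"; the helper `cond` implements the definition of conditional probability above.

```python
from fractions import Fraction
import itertools

S = set(itertools.product(range(1, 7), repeat=2))   # 36 outcomes, assumed equally likely

def P(event):
    return Fraction(len(event), len(S))

def cond(A, B):
    """Conditional probability P(A | B) = P(AB) / P(B); requires P(B) > 0."""
    return P(A & B) / P(B)

first_even = {s for s in S if s[0] % 2 == 0}
sum7 = {s for s in S if sum(s) == 7}
sum8 = {s for s in S if sum(s) == 8}

print(P(first_even & sum7) == P(first_even) * P(sum7))   # True: independent
print(P(first_even & sum8) == P(first_even) * P(sum8))   # False: 1/12 versus 5/72
print(cond(first_even, sum7), cond(first_even, sum8))    # 1/2 3/5
```

Consistent with the first exercise below, conditioning on the independent event leaves the probability at $\mathbb{P}(\mbox{first die even}) = 1/2$.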

### Exercises.

1. Show that if $A$ and $B$ are independent and $\mathbb{P}(B) >0$, then
$\mathbb{P}(A|B) = \mathbb{P}(A)$.
1. Show that conditional probability behaves like "ordinary" probability: suppose $\mathbb{P}(B) > 0$ and $\{A_i\}$ are such that $\{A_i \cap B\}_{i=1}^\infty$ are pairwise disjoint. Then 
   - $\mathbb{P}(A | B) \ge 0$
   - $\mathbb{P}(B | B) = 1$
   - $\mathbb{P}(\cup_{i=1}^\infty A_i | B) = \sum_{i=1}^\infty \mathbb{P} (A_i | B)$

## Bayes' Rule

Suppose $\mathbb{P}(A) > 0$ and $\mathbb{P}(B) > 0$.
Then

\begin{equation*} 
\mathbb{P}(A|B) = \frac{\mathbb{P}(B|A)\mathbb{P}(A)}{\mathbb{P}(B)}.
\end{equation*}

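Bayes' rule follows directly from the definition of conditional probability, since $\mathbb{P}(A|B)\mathbb{P}(B) = \mathbb{P}(AB) = \mathbb{P}(B|A)\mathbb{P}(A)$. It lets you "invert" conditional probabilities to find $\mathbb{P}(A|B)$ in terms of $\mathbb{P}(B|A)$.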


## The Law of Total Probability

Let $A$ be an event and suppose $\{ A_i \}$ (a finite or countable collection of sets) is a partition of outcome space $\mathcal{S}$ such that $\mathbb{P}(A_i) > 0$ $\forall i$.
Then

\begin{equation*} 
\mathbb{P}(A) = \sum_i \mathbb{P}(A|A_i) \mathbb{P}(A_i).
\end{equation*}

The Law of Total Probability is basically just an application of the definition of conditional probability and Kolmogorov's third axiom, but it's extremely useful.
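A quick check on two dice, again assuming the 36 ordered outcomes are equally likely (the example is ours): partition $\mathcal{S}$ by the value of the first die, $A_i = \{\mbox{first die shows } i\}$, $i = 1, \ldots, 6$, and recover $\mathbb{P}(\mbox{sum is } 7)$ from the conditional probabilities.

```python
from fractions import Fraction
import itertools

S = set(itertools.product(range(1, 7), repeat=2))   # two dice, 36 equally likely outcomes

def P(event):
    return Fraction(len(event), len(S))

A = {s for s in S if sum(s) == 7}                               # the event: the sum is 7
partition = [{s for s in S if s[0] == i} for i in range(1, 7)]  # A_i: the first die shows i

# sum over the partition of P(A | A_i) P(A_i), where P(A | A_i) = P(A A_i) / P(A_i)
total = sum((P(A & Ai) / P(Ai)) * P(Ai) for Ai in partition)
print(total, P(A))    # 1/6 1/6
```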

### Examples.

Here is a classic application of Bayes' Rule. Suppose that a screening test for a medical condition
is 95% accurate, in the sense that if one has the condition, there is a 95% chance that the test will be positive, and if one does not have the condition, there is a 95% chance that the test will be negative.
Suppose that 1% of the population actually has the condition.
We select someone at random from the population and test him or her.
The test result is positive.
What's the chance that the person actually has the condition?

\begin{equation*}  \mathbb{P}(\mbox{has condition}|\mbox{tests positive})
= \frac{\mathbb{P}(\mbox{tests positive }|\mbox{ has condition}) \mathbb{P}(\mbox{has condition })}{\mathbb{P}(\mbox{tests positive})}
=
\frac{\mathbb{P}(\mbox{tests positive }|\mbox{ has condition}) 
\mathbb{P}(\mbox{has condition})}
{\mathbb{P}(\mbox{tests positive } | \mbox{ has condition})\mathbb{P}(\mbox{has condition}) + \mathbb{P}(\mbox{tests positive } | \mbox{ doesn't have condition})\mathbb{P}(\mbox{doesn't have condition})}
=
\frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99}
= 16.1\%.
\end{equation*}
(The denominator involves applying the Law of Total Probability.)
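The arithmetic can be reproduced exactly with fractions. The helper `posterior_positive` below is a hypothetical name of ours; it combines Bayes' rule with the Law of Total Probability over the partition {has condition, doesn't have condition}. "Sensitivity" and "specificity" are the usual names for the two 95% figures in the example.

```python
from fractions import Fraction

def posterior_positive(sensitivity, specificity, prevalence):
    """P(has condition | tests positive), by Bayes' rule.

    sensitivity = P(tests positive | has condition)
    specificity = P(tests negative | doesn't have condition)
    prevalence  = P(has condition)
    """
    numerator = sensitivity * prevalence
    # Law of Total Probability: P(tests positive) over the two-set partition
    p_positive = numerator + (1 - specificity) * (1 - prevalence)
    return numerator / p_positive

p = posterior_positive(Fraction(95, 100), Fraction(95, 100), Fraction(1, 100))
print(p, round(float(p), 3))    # 19/118 0.161
```

Most positive results come from the large healthy group: $0.05 \times 0.99 = 0.0495$ of the population tests falsely positive, versus $0.95 \times 0.01 = 0.0095$ truly positive.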

### Exercises.
[To do]