Mathematical Preliminaries¶

Sets¶

A set is a collection of objects, called members or elements of the set, without regard for their order. \(a \in A\), pronounced “\(a\) is an element of \(A\),” “\(a\) is in \(A\),” or “\(a\) is a member of \(A\)” means that \(a\) is an element of the set \(A\). This is the same as writing \(A \ni a\), which is pronounced “\(A\) contains \(a\).” If \(a\) is not an element of \(A\), we write \(a \not \in A\). Sets may be described explicitly by listing their contents, or implicitly by specifying a property that all elements of the set share, or a condition that they satisfy. The contents of sets are enclosed in curly braces: \(\{ \}\). Examples:

\(A = \{a, b, c, d \}\): the set containing the four elements \(a\), \(b\), \(c\), and \(d\).
\(\emptyset = \{ \}\): the empty set, the set that contains no elements.
\({\mathbb Z} \equiv \{ \ldots, -2, -1, 0, 1, 2, \ldots \}\): the integers.
\({\mathbb N} \equiv \{1, 2, 3, \ldots \}\): the natural (counting) numbers.
\({\mathbb R} \equiv (-\infty, \infty)\): the real numbers.
\({\mathbb R}^+ \equiv [-\infty, \infty]\): the extended real numbers.
\({\mathbb C} \equiv \{ a + bi: a, b \in {\mathbb R} \}\), where \(i = \sqrt{-1}\): the complex numbers.
\({\mathbb Q} \equiv \{ a/b: a, b \in {\mathbb Z}\}\): the rational numbers.

\(B\) is a _ subset_ of \(A\), written \(B \subset A\) or \(A \supset B\), if every element of the set \(B\) is also an element of the set \(A\). Thus \({\mathbb N} \subset {\mathbb Z} \subset {\mathbb Q} \subset {\mathbb R} \subset {\mathbb C}\). The empty set \(\emptyset\) is a subset of every set. If \(A \subset B\) and \(B \subset A\), \(A\) and \(B\) are the same set, and we write \(A = B\). If \(B\) is not a subset of \(A\), we write \(B \not \subset A\) or \(A \not \supset B\). \(B\) is a proper subset of \(A\) if \(B \subset A\) but \(A \not \subset B\).

The complement of \(A\) (with respect to the universe \(\mathcal{X}\)), written \(A^c\) or \(A'\), is the set of all objects under consideration (\(\mathcal{X}\)) that are not elements of \(A\). That is, \(A^c \equiv \{ a \in \mathcal{X} : a \not \in A\}\).

The intersection of \(A\) and \(B\), written \(A \cap B\) or \(AB\), is the set of all objects that are elements of both \(A\) and \(B\):

\[\begin{equation*} A \cap B \equiv \{a: a \in A \mbox{ and } a \in B \}. \end{equation*}\]

If \(A \cap B = \emptyset\), we say \(A\) and \(B\) are disjoint or mutually exclusive.

The union of \(A\) and \(B\), written \(A \cup B\), is the set of all objects that are elements of \(A\) or of \(B\) (or both):

\[\begin{equation*} A \cup B \equiv \{a: a \in A \mbox{ or } a \in B \mbox{ or both} \}. \end{equation*}\]

The difference of \(A\) and \(B\), \(A \setminus B\), pronounced “\(A\) minus \(B\),” is the set of all elements of \(A\) that are not elements of \(B\):

\[\begin{equation*} A \setminus B \equiv \{a \in A : a \not \in B \} = A \cap B^c. \end{equation*}\]

Intervals are special subsets of \({\mathbb R}\):

\[\begin{eqnarray*} [a, b] &\equiv& \{x \in {\mathbb R} : a \le x \le b\}\cr (a, b] &\equiv& \{x \in {\mathbb R} : a < x \le b\}\cr [a, b) &\equiv& \{x \in {\mathbb R} : a \le x < b\}\cr (a, b) &\equiv& \{x \in {\mathbb R} : a < x < b\}. \end{eqnarray*}\]

Sometimes we have a collection of sets, indexed by elements of another set: \(\{A_\beta : \beta \in B \}\). Then \(B\) is called an index set. If \(B\) is a subset of the integers \({\mathbb Z}\), usually we write \(A_i\) or \(A_j\), etc., rather than \(A_\beta\). If \(B = {\mathbb N}\), we usually write \(\{A_j\}_{j=1}^\infty\) rather than \(\{A_\beta : \beta \in {\mathbb N} \}\).

\[\begin{equation*} \bigcap_{\beta \in B} A_\beta \equiv \{a: a \in A_\beta \;\;\forall \beta \in B \}. \end{equation*}\]

(\(\forall\) means “for all.”) If \(B = \{1, 2, \ldots, n\}\), we usually write \(\bigcap_{j=1}^n A_j\) rather than \(\bigcap_{j \in \{1, 2, \ldots, n\}} A_j\). The notation \(\bigcup_{\beta \in B} A_\beta\) and \(\bigcup_{j=1}^n A_j\) are defined analogously.

A collection of sets \(\{A_\beta : \beta \in B \}\) is pairwise disjoint if \(A_\beta \cap A_{\beta'} = \emptyset\) whenever \(\beta \ne \beta'\). The collection \(\{A_\beta : \beta \in B\}\) exhausts or covers the set \(A\) if \(A \subset \bigcup_{\beta \in B} A_\beta\). The collection \(\{A_\beta : \beta \in B \}\) is a partition of the set \(A\) if \(A = \cup_{\beta \in B} A_\beta\) and the sets \(\{A_\beta : \beta \in B \}\) are pairwise disjoint. If \(\{A_\beta : \beta \in B \}\) are pairwise disjoint and exhaust \(A\), then \(\{A_\beta \cap A : \beta \in B \}\) is a partition of \(A\).

A set is countable if its elements can be put in one-to-one correspondence with a subset of \({\mathbb N}\). A set is finite if its elements can be put in one-to-one correspondence with \(\{1, 2, \ldots, n\}\) for some \(n \in {\mathbb N}\). If a set is not finite, it is infinite. \({\mathbb N}\), \({\mathbb Z}\), and \({\mathbb Q}\) are infinite but countable; \({\mathbb R}\) is infinite and uncountable.

The notation \(\# A\), pronounced “the cardinality of \(A\)” is the size of the set \(A\). If \(A\) is finite, \(\# A\) is the number of elements in \(A\). If \(A\) is not finite but \(A\) is countable (if its elements can be put in one-to-one correspondence with the elements of \({\mathbb N}\)), then \(\# A = \aleph_0\) (aleph-null).

The power set of a set \(A\) is the set of all subsets of the set \(A\). For example, the power set of \(\{a, b, c\}\) is

\[\begin{equation*} \{ \emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\} \}. \end{equation*}\]

If \(A\) is a finite set, the cardinality of the power set of \(A\) is \(2^{\# A}\). This can be seen as follows: suppose \(\# A = n\) is finite. Consider the elements of \(A\) to be written in some canonical order. We can specify an element of the power set by an \(n\)-digit binary number. The first digit is 1 if the first element of \(A\) is in the subset, and 0 otherwise. The second digit is 1 if the second element of \(A\) is in the subset, and 0 otherwise, etc. There are \(2^n\) \(n\)-digit binary numbers, so there are \(2^n\) subsets. The cardinality of the power set of \({\mathbb N}\) is not \(\aleph_0\).

If \(A\) is a finite set, \(B\) is a countable set and \(\{A_j : \beta \in B \}\) is a partition of \(A\), then

\[\begin{equation*} \# A = \sum_{\beta \in B} \# A_\beta. \end{equation*}\]

Cartesian Products¶

The notation \((s_1, s_2, \ldots, s_n) \equiv (s_j)_{j=1}^n\) denotes an ordered \(n\)-tuple consisting of \(s_1\) in the first position, \(s_2\) in the second position, etc. The parentheses are used instead of curly braces to distinguish \(n\)-tuples from sets: \((s_j)_{j=1}^n \ne \{s_j\}_{j=1}^n\). The \(k\)th component of the \(n\)-tuple \(s = (s_j)_{j=1}^n\), is \(s_k\), \(k = 1, 2, \ldots, n\). Two \(n\)-tuples are equal if their components are equal. That is, \((s_j)_{j=1}^n = (t_j)_{j=1}^n\) means that \(s_j = t_j\) for \(j = 1, \ldots, n\). In particular, \((s, t) \ne (t, s)\) unless \(s=t\). In contrast, \(\{s, t \} = \{ t, s \}\) always.

The Cartesian product of \(S\) and \(T\) is \(S \times T \equiv \{(s, t): s \in S \mbox{ and } t \in T\}\). Unless \(S = T\), \(S \times T \ne T \times S\). \({\mathbb R}^n\) is the Cartesian product of \({\mathbb R}\) with itself, \(n\) times; its elements are \(n\)-tuples of real numbers. If \(s\) is the \(n\)-tuple \((s_1, s_2, \ldots, s_n) = (s_j)_{j=1}^n\),

Let \(A\) be a finite set with \(\# A = n\). A permutation of (the elements of) \(A\) is an element \(s\) of \(\times_{j=1}^n A = A^n\) whose components are distinct elements of \(A\). That is, \(s = (s_j)_{j=1}^n \in A^n\) is a permutation of \(A\) if \(\# \{s_j\}_{j=1}^n = n\). There are \(n! = n(n-1)\cdots 1\) permutations of a set with \(n\) elements: there are \(n\) choices for the first component of the permutation, \(n-1\) choices for the second (whatever the first might be), \(n-2\) for the third (whatever the first two might be), etc. This is an illustration of the fundamental rule of counting: in a sequence of \(n\) choices, if there are \(m_1\) possibilites for the first choice, \(m_2\) possibilities for the second choice (no matter which was chosen in the first place), \(m_3\) possibilities for the third choice (no matter which were chosen in the first two places), and so on, then there are \(m_1 m_2 \cdots m_n = \prod_{j=1}^n m_j\) possible sequences of choices in all.

The number of permutations of \(n\) things taken \(k\) at a time, \({}_nP_k\), is the number of ways there are of selecting \(k\) of \(n\) things, then permuting those \(k\) things. There are \(n\) choices for the object that will be in the first place in the permutation, \(n-1\) for the second place (regardless of which is first), etc., and \(n-k+1\) choices for the item that will be in the \(k\)th place. By the fundamental rule of counting, it follows that \({}_nP_k = n(n-1)\cdots(n-k+1) = n!/(n-k)!\).

The number of subsets of size \(k\) that can be formed from \(n\) objects is

\[\begin{equation*} {}_nC_k = {{n}\choose{k}} = {}_nP_k/k! = n(n-1)\cdots(n-k+1)/k! = \frac{n!}{k!(n-k)!}. \end{equation*}\]

Because the power set of a set with \(n\) elements can be partitioned as

\[\begin{equation*} \cup_{k=0}^n \left \{ \mbox{all subsets of size } k \right \}, \end{equation*}\]

it follows that

\[\begin{equation*} \sum_{j=0}^n {}_nC_k = 2^n. \end{equation*}\]

Mappings and Functions¶

Functions are subsets of Cartesian products. We write \(f: \mathcal{X} \rightarrow \mathcal{Y}\), pronounced “\(f\) maps \(\mathcal{X}\) into \(\mathcal{Y}\)” or “\(f\) is a function with domain \(\mathcal{X}\) and co-domain \(\mathcal{Y}\)” if \(f \subset \mathcal{X} \times \mathcal{Y}\) such that for each \(x \in \mathcal{X}\), \(\exists 1 y \in \mathcal{Y}\) such that \((x, y) \in f\). (The notation \(\exists 1 y\) means that there exists exactly one value of \(y\).) The set \(\mathcal{X}\) is called the domain of \(f\) and \(\mathcal{Y}\) is called the co-domain of \(f\). If the \(\mathcal{X}\)-component of an element of \(f\) is \(x\), we denote the \(\mathcal{Y}\)-component of that element of \(f\) by \(fx\) or \(f(x)\), so that \((x, fx) \in f\); we write \(f: x \mapsto y=f(x)\). The functions \(f\) and \(g\) are equal if they are the same subset of \(\mathcal{X} \times \mathcal{Y}\), which means that they have the same domain \(\mathcal{X}\), and \(fx = gx\) \(\forall x \in \mathcal{X}\).

Let \(A \subset \mathcal{X}\). The image of \(A\) under \(f\) is

\[\begin{equation*} fA = f(A) \equiv \{ y \in \mathcal{Y} : (x, y) \in f \mbox{ for some } x \in A\}. \end{equation*}\]

More colloquially, we would write this as

\[\begin{equation*} fA = \{ y \in \mathcal{Y} : f(x) = y \mbox{ for some } x \in A \}. \end{equation*}\]

If \(f\mathcal{X}\) is a proper subset of \(\mathcal{Y}\), \(f\) is into. If \(f\mathcal{X} = \mathcal{Y}\), \(f\) is onto. For \(B \subset \mathcal{Y}\), the inverse image of \(B\) under \(f\) or pre-image of \(B\) under \(f\) is

\[\begin{equation*} f^{-1}B \equiv \{ x \in \mathcal{X} : fx \in B \}. \end{equation*}\]

Similarly, \(f^{-1}y \equiv \{ x \in \mathcal{X} : fx = y \}\) If \(\forall y \in \mathcal{Y}\), \(\# \{ f^{-1} y \} \le 1\), \(f\) is one-to-one (1:1). If \(f\) is one-to-one and onto, i.e., if \(\forall y \in \mathcal{Y}\), \(\# \{ f^{-1}y \} = 1\), \(f\) is a bijection.

Multisets¶

A multiset or bag is like a set, except it can contain more than one “copy” of a given element.

For instance, the sets \(\{0, 1\}\) and \(\{0, 1, 1\}\) are the same, because they contain the same two elements, namely, 0 and 1. But viewed as multisets, they are different, because the second contains the element 1 twice. One can think of a multiset as an ordered pair, the first element of which is itself the set of distinct elements of the multiset, and the second of which is a function that assigns a natural number to each element of the set of elements. For instance, the multiset \(\{0, 1, 1\}\) would be represented \( (\{0, 1\}, \{(0, 1), (1, 2) \})\).

Two multisets are equal if they contain the same elements with the same multiplicities, that is, set of distinct elements of the first multiset is equal to the set of distinct elements of the second, and the multiplicity function of the first is equal to the multiplicity function of the second (i.e., the multiplicity of each item is the same in both multisets). For instance, the multisets \(\{0, 1, 1\}\) and \(\{1, 0, 1\}\) are equal.

A set amounts to a multiset in which the multiplicity of every element is 1.

Generally I will not make a notational distinction between sets and multisets (I use curly braces to denote both sets and multisets).

If \(x_j\)_{j=1}^n is an ordered n-tuple, \(\{x_j\}_{j=1}^n\) will denote the multiset containing all \(n\) of its coordinates.

Relations and Orders¶

A relation \(R\) on the set \(A\) is a collection of ordered pairs of elements of \(A\), that is, \(R \subset A \times A\). Notationally, if \((a,b) \in R \), we write \(a R b\).

A relation can also be thought of as a function from \(A \times A\) to \(\{0, 1\}\) such that the function takes the value 1 if an only if (iff) \((a,b) \in R \).

For instance, consider the “divides” relation on the positive integers \(\mathbb N\). We say that \(j \in \mathbb N\) “divides” \(k \in \mathbb N\) if \(k\) is an integer multiple of \(j\). As an example, 3 divides 102. If we let \(R\) denote the relation “divides,” then \((3, 102) \in R \) and we would write \(3 R 102\)

A partial order \(\le \) on the set \(A\) is a relation on \(A\) such that:

for every \(a \in A\), \( a \le a \) (reflexivity)
if \( a \le b \) and \( b \le a \), then \( a = b \) (antisymmetry)
if \( a \le b \) and \( b \le c \), then \( a \le c \) (transitivity)

If in addition, for every \( (a, b) \in A \times A \) either \( a \le b \) or \( b \le a \) (comparability), then \( \le \) is a total order.

An equivalence relation \( \sim \) on the set \(A\) is a relation on \(A\) such that:

for every \(a \in A\), \( a \sim a \) (reflexivity)
if \( a \sim b \) then \( b \sim a \) (symmetry)
if \( a \sim b \) and \( b \sim c \), then \( a \sim c \) (transitivity)

The equivalence class of \(a\) under \(\sim\) is \([a] \equiv \{b \in A : b \sim a \}\).

Groups¶

A group is an ordered pair \((\mathcal{G}, \times)\), where \(\mathcal{G}\) is a collection of objects (the elements of the group) and \(\times\) is a mapping from \(\mathcal{G} \times \mathcal{G}\) onto \(\mathcal{G}\), \begin{eqnarray*} \times : & \mathcal{G} \times \mathcal{G} & \rightarrow \mathcal{G} \ & (g, h) & \mapsto g \times h, \end{eqnarray*} satisfying the following axioms:

\(\exists e \in \mathcal{G}\) s.t. \(\forall g \in \mathcal{G}\), \(e \times g = g\). The element \(e\) is called the identity.
For each \(g \in \mathcal{G}\), \(\exists g^{-1} \in \mathcal{G}\) s.t. \(g^{-1}\times g = e\). (Every element has an inverse.)
If \(f, g, h \in \mathcal{G}\), then \(f \times (g \times h) = (f \times g)\times h\). (The group operation is associative.)

If, in addition, for every \(g, h \in \mathcal{G}\), \(g \times h = h \times g\) (if the group operation commutes), we say that \((\mathcal{G}, \times)\) is an Abelian group or commutative group.

In an abuse of terminology, we will often call \(\mathcal{G}\) a group, with the group operation \(\times\) understood from context.

Examples of groups include the real numbers together with ordinary addition, \(({\mathbf R}, +)\); the real numbers other than zero together with ordinary multiplication, \(({\mathbf R} \setminus \{0\}, *)\); the rational numbers together with ordinary addition, \(({\mathbf Q}, +)\); and the integers 0 to \(p-1\), \(p\) prime, together with addition modulo \(p\), \(( \{0, 1, \ldots, p-1\}, +)\); and the set of all permutations of \(n\) distinct objects (called the symmetric group of degree \(n\), denoted \(\mathcal{S}_n\)).

It is common to omit the symbol \(\times\) from the group notation, and write \(g \times h\) as \(gh\), when \(\mathcal{G}\) is an abstract group.

Suppose \(\mathcal{H} \subset \mathcal{G}\), where \((\mathcal{G}, \times)\) is a group.
If \((\mathcal{H}, \times)\) is also a group, it is called a subgroup of \((\mathcal{G}, \times)\).

Consider a subset \(A\) of elements of a group \(\mathcal{G}\). The subgroup generated by \(A\) is the smallest group that contains \(A\) (using the same multiplication operator as \(\mathcal{G}\)), that is, every element of \(\mathcal{G}\) that can be written as a product of elements of \(A\) and inverses of elements of \(A\). If the subgroup generated by \(A\) is equal to \(\mathcal{G}\), then \(A\) is a generating set of \(\mathcal{G}\).

For example, consider \(\mathcal{S}_n\), the symmetric group of permutations of \(n\) objects. This group can be generated from just two of its elements, a transposition \(\pi = (2, 1, 3, \ldots, n)\) and a circular shift \(\eta = (n, 1, 2, 3, \ldots, n-1)\).

Fields¶

An ordered triple \((\mathcal{F}, \times, +)\) is a field if \(\mathcal{F}\) is a collection of objects and \(\times\) and \(+\) are operations on \(\mathcal{F} \times \mathcal{F}\) such that

\(\mathcal{F}\) is an Abelian group under the operation \(+\), with identity element denoted 0.
\(\mathcal{F} \setminus \{0\}\) is an Abelian group under the operation \(\times\), with identity element denoted \(1\).
\(\times\) is distributive over \(+\). I.e., for any \(f\), \(g\), \(h \in \mathcal{F}\), \(f \times (g+h) = f \times g + f \times h\) and \((f+g) \times h = f\times h + g \times h\).

The additive inverse of \(g\) is denoted \(-g\); the multiplicative inverse of \(g\) is \(g^{-1} = 1/g\).

Examples: \(({\mathbf R}, \times, +)\), where \(\times\) is ordinary (real) multiplication and \(+\) is ordinary (real) addition. The complex numbers \({\mathbf C}\), with complex multiplication and addition.

These (and the extended reals) are the only fields we will use.

Arithmetic with \(\infty\)¶

We shall use the following conventions:

\(0 \cdot \infty = \infty \cdot 0 = 0\)
\(x + \infty = \infty + x = \infty\), \(x \in {\mathbf R}\)
\(x \cdot \infty = \infty \cdot x = \infty\), \(x > 0\)

With these conventions, \(([-\infty, \infty], \times, +)\) is a field.

Groups of transformations on a set¶

Let \( \mathcal{G} \) be a group (with the group operation implicit) with identity element \(e \in \mathcal{G} \) and let \(\mathcal X\) be a set. Then a group action of \(\mathcal G\) on \(\mathcal X\) is a function from \( {\mathcal G} \times {\mathcal X}\) into \(\mathcal X\) such that (if we denote the image of \((g, x)\) under this transformation as \(g.x\)),

\(e.x = x\) for every \(x \in \mathcal{X}\) (the identity element of the group is an identity transformation on the set)
for every \(g \in \mathcal{G}\) and \(h \in \mathcal{G}\) and every \(x \in \mathcal{X}\), \(g.(h.x) = (gh).x\) (the group operation is compatible with the composition of transformations on the set).

Then we call \(\mathcal{G}\) a group of transformations of \(\mathcal{X}\).

The orbit of the point \( x \in \mathcal{X} \) under \(\mathcal{G}\) is the set \(\mathcal{G}.x \equiv \{ g.x : g \in \mathcal{G} \}\).

If we write \( x \sim y \) when \( y \in \mathcal{G}.x \), then \(\sim\) is an equivalence relation on \(\mathcal{G}\); that is, orbits define equivalence classes. Moreover, the set of distinct orbits partitions a set into equivalence classes.

For example, let \(\mathcal{X} \) be 2-dimensional Euclidean space, and let \(\mathcal{G} \) be rotations around the origin. Such rotations comprise a group, if the group operation is defined as composition: if \(g\) is rotation by the angle \(\theta\) and \(h\) is rotation by the angle \(\omega\), then \(gh\) is defined to be rotation by the angle \(\theta + \omega\), equivalently, rotation by \(\theta\) followed by rotation by \(\omega\). The identity element is rotation by 0. The orbit of the point \( x = (x_1, x_2)\) is a circle centered at the origin, with radius \(r = \sqrt{x_1^2 + x_2^2}\). The entire Euclidean plane can be partitioned into an uncountable union of circles of different radii.

As another example, let \(\mathcal{X}\) be the set of ordered \(n\)-tuples of real numbers, elements of \(\mathbb{R}^n\). Let \(\pi\) denote a permutation of \(\{1, \ldots, n\}\), an \(n\)-tuple of integers between 1 and \(n\) with no duplicates: \(\pi_j\) is a number between 1 and \(n\) and \(\{\pi_1, \pi_2, \ldots, \pi_n \} = \{1, 2, \ldots, n\}\). If \(\pi\) and \(\phi\) are two permutations, define \(\pi \phi\) to be the permutation that results from composing the two permutations. For instance, if \(\pi = (1,3,2,4)\) and \(\phi = (2,4,3,1)\), then \(\pi \phi = (2,3,4,1)\). Such permutations comprise a group with identity element \(e = (1,2,3,4)\). If we define the action of a permutation on an element \( x \in \mathcal{X}\) to be an \(n\)-tuple that results from re-ordering the components of \(x\) according to \(\pi\), then this group For instance, if \(x = (x_1, x_2, \ldots, x_n) \) and \( \pi = (\pi_1, \ldots, \pi_n)\) then \( \pi x = (x_{\pi_1}, \ldots x_{\pi_n})\). Even more concretely, let \(x = (x_1, x_2, x_2, x_4) \) and let \( \pi = (1, 4, 3, 2)\). Then \( \pi x = (x_1, x_4, x_3, x_2)\). The orbit of a point \(x\) is the collection of all elements of \(\mathcal{X}\) with the same components as \(x\), but in all \(n!\) possible orders. (The orbit will contain fewer than \(n!\) distinct elements of \(\mathcal{X}\) if the values of the components of \(x\) are not all distinct.)

Transformations of probability distributions¶

Let \(\mathbb{P}\) be a probability distribution on a (measurable) set \(\mathcal{X}\) and let \(\mathcal{G}\) be a group of transformations of \(\mathcal{X}\) with the property that for every \( A \subset \mathcal{X}\) for which \(\mathbb{P}(A)\) is defined and every \( g \in \mathbb{G} \), \(\mathbb{P}(g.A)\) is also defined (i.e., \(\mathcal{G}\) preserves the measurability of sets).

A probability distribution \(\mathbb{P}\) on a (measurable) set \(\mathcal{X}\) is invariant under the group of transformations \(\mathcal{G}\) if for every \( A \subset \mathcal{X}\) for which \(\mathbb{P}(A)\) is defined and every \( g \in \mathcal{G}\), \(\mathbb{P}(g.A)\) is also defined, and \(\mathbb{P}(A) = \mathbb{P}(g.A)\).

For example, the multivariate normal distribution with IID components is invariant with respect to rotations about its mean.

A collection of random variables is exchangeable if every permutation of the variables has the same joint probability distribution, that is, if the joint distribution is invariant under the permutation group. Any collection of random variables that are independent and identically distributed (IID) is exchangeable, but IID is a stronger property than exchangeability.

Real Linear Vector Spaces¶

Let \(\mathcal{X}\) be a set (the “vectors”) with two relations, \(+\) (vector addition) and \(\cdot\) (scalar multiplication), such that for all \(x, y \in \mathcal{X}\) and all \(\alpha, \beta \in \mathbb{R}\),

\(x+y \in \mathcal{X}\) (\(\mathcal{X}\) is closed under \(+\))
\(\exists 0 \in \mathcal{X}\) such that \(\forall x \in \mathcal{X}\), \(0+x = x\) (\(0\) is the identity element for \(+\))
For each \(x \in \mathcal{X}\), \(\exists -x \in \mathcal{X}\) such that \(-x + x = 0\) (every element has an additive inverses
\(x+y = y+x\) (\(+\) is commutative)
\(x + (y+z) = (x+y) + z\) (\(+\) is associative)
\(\alpha \cdot x \in \mathcal{X}\) (closure under \(\cdot\))
\(1 \cdot x = x\) (\(1\) is the identity element for scalar multiplication)
\((\alpha + \beta)\cdot x = \alpha\cdot x + \beta \cdot x\) (scalar addition distributes over multiplication)
\((\alpha \beta) \cdot x = \alpha\cdot(\beta \cdot x)\) (scalar multiplication is associative)
\(\alpha \cdot (x+y) = \alpha \cdot x + \beta \cdot y\) (scalar multiplication distributes over vector addition)

Then \((\mathcal{X}, +, \cdot)\) is a real linear vector space.

One generally omits the \(\cdot\) notation and just writes \(\alpha x\) rather than \(\alpha \cdot x\).

Linear Independence and Algebraic Bases¶

A set of vectors \(S \subset \mathcal{X}\) of a linear vector space \(\mathcal{X}\) is linearly independent if for every finite subset \(\{s_1, \ldots, s_n \} \subset S\), and every set of scalars \(\{\alpha_1, \ldots, \alpha_n\}\) if \(\sum_{j=1}^n \alpha_js_j = 0\), then \(\alpha_1 = \cdots = \alpha_n = 0\).

An algebraic basis \(B \subset \mathcal{X}\) for a linear vector space \(\mathcal{X}\) is a linearly independent collection of vectors such that for any \(x \in \mathcal{X}\), there exists a finite subset \(\{b_j\}_{j=1}^n\) and a set of scalars \(\{\alpha_j \}_{j=1}^n\) such that \( x = \sum_{j=1}^n \alpha_j b_j\).

Real Inner Product Spaces¶

Let \(\mathcal{X}\) be a real linear vector space and let \(\langle \cdot, \cdot \rangle : \mathcal{X} \times \mathcal{X} \to \mathbb{R}\) such that for all \(x, y, z \in \mathcal{X}\) and all \(\alpha, \beta \in \mathbb{R}\),

\(\langle x, y \rangle = \langle y, x \rangle\) (symmetry)
\(\langle \alpha x, y \rangle = \alpha \langle y, x \rangle\) (scalar homogeneity)
\(\langle x+y, z \rangle = \langle x, z \rangle + \langle y, z \rangle \) (additive)
\(\langle x, x \rangle \ge 0 \) (positive semi-definite)
\(\langle x, x \rangle = 0 \) if and only if \(x = 0\) (positive definite)

Then \((\mathcal{X}, \langle, \rangle)\) is a real inner product space.

Norms¶

Let \(\mathcal{X}\) be a real linear vector space and let \(\| \cdot \|: \mathcal{X} \to \mathbb{R}^+\) such that for all \(x, y, x \in \mathcal{X}\) and all \(\alpha \in \mathbb{R}\),

\(\|x \| = 0\) if and only if \(x = 0\) (positive definite)
\(\| \alpha x \| = |\alpha| \|x\|\) (positive scalar homogeneity)
\(\| x + y \| \le \| x \| + \| y \|\) (triangle inequality)

Then \(\| \cdot \|\) is a norm on \(\mathcal{X}\), and we call \(\mathcal{X}\) a normed vector space.

Cauchy sequences¶

Suppose \(\mathcal{X}\) is a normed vector space with norm \(\| \cdot \|\), and let \(x_1, x_2, \ldots \) be a sequence of elements of \(\mathcal{X}\).

\[\begin{equation*} \| x_{j+1} - x_j \| \rightarrow 0 \end{equation*}\]

as \(j \rightarrow \infty\), then \(x_1, x_2, \ldots \) is a Cauchy sequence.

Completeness¶

A normed vector space is complete if every Cauchy sequence in \(\mathcal{X}\) has a limit in \(\mathcal{X}\). That is, if \(x_1, x_2, \ldots \) is a Cauchy sequence, then there exists \(x \in \mathcal{X}\) such that

\[\begin{equation*} \lim_{j \rightarrow \infty} \| x_j - x \| = 0. \end{equation*}\]

Balls¶

In a normed vector space \(\mathcal{X}\), the closed ball of radius \(\epsilon > 0\) centered at \(x \in \mathcal{X}\) is

\[\begin{equation*} B_\epsilon [x_0] \equiv \{ x \in \mathcal{X}: \|x - x_0\| \le \epsilon \}.\end{equation*}\]

The the open ball of radius \(\epsilon > 0\) centered at \(x \in \mathcal{X}\) is

\[\begin{equation*} B_\epsilon(x_0) \equiv \{ x \in \mathcal{X}: \|x - x_0\| < \epsilon \}.\end{equation*}\]

The (open) unit ball in \(\mathcal{X}\) is \(B_1[0]\) (\(B_1(0)\)).

Hilbert Space¶

A Hilbert space is an inner product space \((\mathcal{X}, \langle , \rangle)\), endowed with the norm \(\|x \| \equiv \sqrt{\langle x, x \rangle}\), that is complete with respect to that norm.

The Riesz Representation Theorem¶

In a Hilbert space, every bounded linear functional can be represented as the inner product with some element of the space.

That is, suppose \(L\) is a bounded linear functional on a Hilbert space \(\mathcal{H}\): \(L(\alpha x + \beta y) = \alpha L(x) + \beta L(y)\) for all \(x, y \in \mathcal{H}\) and all scalars \(\alpha\) and \(\beta\), and there exists a real number \(C \ge 0\) such that for all \(x \in \mathcal{H}\), \(|L(x)| \le C \|x\|\).

Then there exists \(\ell \in \mathcal{H}\) such that for all \(x \in \mathcal{H}\), \(L(x) = \langle \ell, x \rangle\).

Collaborative and Reproducible Data Science

Mathematical Preliminaries

Contents