Theories of Probability

We’ve been looking at the mathematical side of probability, which starts from some axioms that probability is assumed to satisfy, and draws conclusions (i.e., theorems). But what is probability? What does probability mean? What connects the mathematics to the real world?

This is a knotty philosophical problem that scholars still disagree about. We will look at several attempts to make sense of statements such as “the probability that a fair coin lands heads when you toss it is 1/2.”

To tie this to scientific questions, we will also think about how to interpret statements such as “there is a 70% chance of a magnitude 7.5 or greater earthquake in the San Francisco Bay Area in the next 25 years,” “there is a 90% chance global temperature will increase 3° in the next 50 years,” “there is a 99.99% chance that the Higgs boson exists (or that it was detected),” and similar statements.

Equally Likely Outcomes

This theory of probability is the oldest. It originated in the study of games of chance, such as dice games and card games. In the Theory of Equally Likely Outcomes, probability assignments depend on the assertion that no particular outcome is preferred over any other by Nature; generally, arguments that Nature should show no preference among the outcomes appeal to the symmetry of the system studied (such as a “fair” coin or die).

If a given experiment or trial has \(n\) possible outcomes among which—it is assumed—Nature should show no preference, this theory defines them to be equally likely. The probability of each outcome is then \(100\%/n\).

Applying this theory of probability involves arguing that no particular outcome should occur in preference to another, typically by appeal to physical symmetries or other considerations—e.g., that there is no reason one marble would be predisposed to turn up rather than another in drawing a marble from a well-stirred bowl of marbles. Laplace’s Principle of Insufficient Reason asserts that if there is no reason to think that a set of outcomes is not equally likely, they should be taken to be equally likely. This is poor logic: no evidence of a difference is not the same as evidence of no difference. The fallacy is called Appeal to Ignorance. See also Stark and Freedman, 2003.

For example, if a coin is balanced well, there is no reason for it to land heads in preference to tails when it is tossed vigorously, so according to the Theory of Equally Likely Outcomes, the probability that the coin lands heads is equal to the probability that the coin lands tails, and both are 100%/2 = 50%. (This ignores the nearly impossible outcomes that the coin does not land at all or lands balanced on its edge.) Similarly, if a die is fair (properly balanced) the chance that when it is rolled vigorously it lands with the side with one spot on top (the chance that the die shows one spot) is the same as the chance that it shows two spots or three spots or four spots or five spots or six spots: 100%/6, about 16.7%.

If an event consists of more than one possible outcome, the chance of the event is the number of ways it can occur, divided by the total number of things that could occur. For example, the chance that a die lands showing an even number of spots is the number of ways it could land showing an even number of spots (3, namely, landing showing 2, 4, or 6 spots), divided by the total number of things that could occur (6, namely, landing showing 1, 2, 3, 4, 5, or 6 spots). Since an event can include at most all \(n\) possible outcomes, the maximum possible probability of any event is \(100\% \times n/n = 100\%\). Thus, in the Theory of Equally Likely Outcomes, probabilities are between 0% and 100%, as claimed.
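
As a minimal illustration (a Python sketch, not part of the original text), the favorable-over-total rule for the die example can be computed directly:

```python
# Theory of Equally Likely Outcomes: P(event) = (# favorable) / (# possible).
outcomes = {1, 2, 3, 4, 5, 6}                  # faces of a fair die
event = {k for k in outcomes if k % 2 == 0}    # die shows an even number of spots

probability = len(event) / len(outcomes)
print(f"P(even) = {len(event)}/{len(outcomes)} = {probability:.1%}")   # 50.0%
```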

Frequency Theory

In the Frequency Theory of Probability, probability is the limit of the relative frequency with which an event occurs in repeated trials. (The trials must be independent.)

Relative frequencies are always between 0% (the event essentially never happens) and 100% (the event essentially always happens), so in this theory as well, probabilities are between 0% and 100%. According to the Frequency Theory of Probability, what it means to say that “the probability that \(A\) occurs is \(p\)%” is that if you repeat the experiment over and over again, independently and under essentially identical conditions, the percentage of the time that \(A\) occurs will converge to \(p\)%. For example, under the Frequency Theory, to say that the chance that a coin lands heads is 50% means that if you toss the coin over and over again, independently, the ratio of the number of times the coin lands heads to the total number of tosses approaches a limiting value of 50% as the number of tosses grows. Because the ratio of heads to tosses is always between 0% and 100%, when the probability exists it must be between 0% and 100%.
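
Here is a small simulation (a Python sketch that illustrates, but of course cannot prove, the convergence the Frequency Theory contemplates) of the running relative frequency of heads in repeated independent tosses of a simulated fair coin:

```python
import random

random.seed(0)                       # reproducible illustration
heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    heads += random.random() < 0.5   # one fair-coin toss; True counts as 1
    if n in checkpoints:
        print(f"{n:>7,} tosses: relative frequency of heads = {heads / n:.2%}")
```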

The Subjective Theory

In the Subjective Theory of Probability, probability measures the speaker’s “degree of belief” that the event will occur, on a scale of 0% (complete disbelief that the event will happen) to 100% (certainty that the event will happen). According to the Subjective Theory, what it means for me to say that “the probability that \(A\) occurs is 2/3” is that my belief that \(A\) will happen is twice as strong as my belief that \(A\) will not happen. The Subjective Theory is particularly useful in assigning meaning to the probability of events that in principle can occur only once. For example, how might one assign meaning to a statement like “there is a 25% chance of an earthquake on the San Andreas fault with magnitude 8 or larger before 2050”? (See Stark and Freedman, 2003, for more discussion of theories of probability and their application to earthquakes.) It is very hard to use either the Theory of Equally Likely Outcomes or the Frequency Theory to make sense of such an assertion. Can you think of other examples?
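
To make the “twice as strongly” reading concrete, a degree of belief of 2/3 is the same as odds of 2 to 1 in favor of \(A\), since

\[ \frac{P(A)}{P(\mbox{not } A)} = \frac{2/3}{1 - 2/3} = \frac{2/3}{1/3} = 2. \]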

These three theories of probability assign different meanings to the statement “the chance that \(A\) occurs is \(p\%\).” Each theory has situations in which it is most natural, and each theory has shortcomings. We will use the Frequency Theory primarily.

Model-Based Probability

[To do]

Shortcomings of the Theories

While each of the theories has attractive elements, all have shortcomings as well. The shortcomings involve hidden assumptions, limited domains of applicability, and changes of subject. (The Subjective Theory changes the subject more than the others do, as elaborated below.) Arguing for a conclusion superficially related to the desired conclusion, then pretending to have established the desired conclusion, is the Red Herring fallacy. In contrast, arguing against a hypothesis superficially like the one you wish to refute, then claiming to have refuted the latter, is the Straw Man fallacy.

The Theory of Equally Likely Outcomes

Even when there are only a few possible outcomes, it is not always clear whether they should be deemed equally likely. For example, consider tossing two coins at the same time. The possible outcomes could be {two heads, not two heads} or {two heads, one head and one tail, two tails} or {two heads, head on coin 1 and tail on coin 2, tail on coin 1 and head on coin 2, two tails}.

The last of these assigns the same probabilities as the Frequency Theory does. For instance, if the equally likely outcomes are taken to be {two heads, head on coin 1 and tail on coin 2, tail on coin 1 and head on coin 2, two tails}, both the Theory of Equally Likely Outcomes and the Frequency Theory would say that the chance of two heads is 25%. If one is using probability to bet on games of chance, long-term relative frequencies—which the Frequency Theory contemplates—are perhaps the most important consideration, because they determine how much one wins or loses in the long run. It seems rather artificial to introduce a distinction between two otherwise identical coins in order to make the probabilities calculated using the Theory of Equally Likely Outcomes agree with the probabilities calculated using the Frequency Theory.

Perhaps a more serious limitation of the Theory of Equally Likely Outcomes is that many situations do not have natural symmetries to exploit to decide which outcomes are equally likely. For example, what is the chance that a thumbtack lands with its point up when it is tossed vigorously? What is the chance that a die that has been “loaded” (modified to be unbalanced) lands showing one spot? Neither of these problems has a natural symmetry from which to argue that the outcomes are equally likely. Moreover, in many situations there are infinitely many possible outcomes; dividing 100% by infinity yields zero.

The Frequency Theory

The Frequency Theory requires an assumption about how the world works: The relative frequency with which an event occurs in repeated trials is assumed to converge to a limit. What is a limit? In the case of coin tossing, the theory says that for any positive number \(\epsilon\), no matter how small, there is some number \(M\), which can depend on \(\epsilon\), such that

\[ \left| \frac{\#\{\mbox{heads in } n \mbox{ tosses}\}}{n} - 50\% \right| < \epsilon \]

whenever the number of tosses \(n > M\). Not all sequences of heads and tails satisfy this assumption. For example, suppose the first toss gives a head. The relative frequency of heads is then 100%. Suppose the next 3 tosses give tails. The relative frequency of heads is then 25%. Suppose the next 100 tosses give heads. The relative frequency of heads is then over 97%. Suppose the next 5,000 tosses give tails. The relative frequency of heads is then about 2%. If we continue in this way, with ever longer runs of heads and of tails, the relative frequency of heads never approaches a limit.
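
The construction can be carried out mechanically. The following sketch reproduces the text’s example and extends it with two hypothetical longer runs (the last two run lengths are our invention, chosen only to keep the oscillation going):

```python
# A sequence of tosses whose relative frequency of heads never settles down:
# ever longer runs of heads and of tails keep the running fraction swinging
# between values near 100% and values near 0%.
heads = tosses = 0
runs = [("H", 1), ("T", 3), ("H", 100), ("T", 5_000),
        ("H", 100_000), ("T", 2_000_000)]      # run lengths grow quickly
for face, length in runs:
    tosses += length
    if face == "H":
        heads += length
    print(f"after {tosses:>9,} tosses: "
          f"relative frequency of heads = {heads / tosses:.1%}")
```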

The Empirical Law of Averages says this never happens: The world works in such a way that the relative frequency with which a random event occurs in repeated trials always settles down to a limit. This “law” is an assumption about how the world works. It is not a mathematical fact, and it is not an observation because no one can continue tossing coins forever to see whether the relative frequency of heads starts to vary again after, say, 100,000,000,000,000 tosses. The Empirical Law of Averages is essential to the Frequency Theory.

A second limitation of the Frequency Theory is that many events to which we might like to assign probabilities are not the outcomes of repeatable experiments. For example, what is the probability that the universe will end in a “big crunch”? What is the probability that my 2020 tax return will be audited? What is the probability that in 2020 more online textbooks than paper textbooks will be sold? What is the probability that the Dow Jones Industrial Average reaches 20,000 before the year 2020?

The Subjective Theory

The principal shortcoming of the Subjective Theory is that colloquially we think of probability as being a property of an event in the external (objective) world, not merely a reflection of our state of mind. When I say “this thumbtack has probability 66% of landing point up when I toss it,” you probably think I am talking about the tack, not about my state of mind with respect to the tack: this theory changes the subject. Similarly, under the Subjective Theory, you and I can disagree about the probability of an event and both be correct, which seems unsatisfactory in many scientific settings.

There are a variety of technical difficulties in the Subjective Theory regarding how to measure the probability of an event. One possible resolution is to study the bets you are willing to take. Would you be indifferent between a bet that a coin lands heads and a bet with the same stakes that it lands tails? If so, some theorists would conclude that your subjective probability that the coin lands heads is 50%. Some factors can complicate this approach. For example, even though you know that buying a lottery ticket is almost certainly throwing your money away, you might buy a ticket anyway, reasoning that you would not particularly miss the $1 cost of the ticket, while you would definitely notice winning $20,000,000. In this scenario, the probability of winning is less of an issue than the possibility of winning.
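
One common way to make the betting approach precise (a sketch under a zero-expected-gain indifference assumption; the function name is ours, not from the text): if you are indifferent to a bet that gains \(w\) if \(A\) occurs and loses \(s\) if it does not, the implied subjective probability of \(A\) is \(s/(w+s)\).

```python
def implied_probability(win: float, stake: float) -> float:
    """Subjective probability of A implied by indifference to a bet that
    gains `win` if A occurs and loses `stake` otherwise.

    Indifference means zero expected gain:
        p * win - (1 - p) * stake = 0,  so  p = stake / (win + stake).
    """
    return stake / (win + stake)

print(implied_probability(1, 1))   # even stakes on heads  -> 0.5
print(implied_probability(2, 1))   # risk $1 to win $2     -> 1/3
```

As the lottery example suggests, this calculation ignores the utility considerations that can decouple willingness to bet from degree of belief.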

Here is another example: I will bet you $1,000,000 against $500 that no nuclear bomb will be dropped on Berkeley, California, by the year 2020. Even if I were confident that a bomb would be dropped, the bet costs me nothing: if a bomb is dropped, I won’t be around to pay off the lost bet (I live in Berkeley), and if no bomb is dropped, I collect the $500 you would owe me. The odds I am willing to give thus say little about my degree of belief.

Another problem with the Subjective Theory has to do with scientific method. Some philosophers of science maintain that unless an hypothesis can, in principle, be shown to be false, it is not scientific. An hypothesis that in principle can be disproved is called falsifiable. In the Frequency Theory, one can collect evidence against the statement that “the probability that \(A\) occurs is \(p\)%” by repeating an experiment over and over and looking at the fraction of times the event \(A\) occurs. In the Subjective Theory, evidence against the hypothesis that “the probability that \(A\) occurs is \(p\)%” would be found by psychological testing to see whether the individual making the statement is telling the truth and is internally consistent in his assignments of probability. Repeating the real-world experiment would not be relevant.

Model-based Probability

[To Do] Calibrating/testing the model, availability of relevant data, …
