Probability, Random Variable, and Distribution

Oh, Hyunzi. (email: wisdom302@naver.com)
Korea University, Graduate School of Economics.


Main References

  • Kim, Dukpa. (2024). "Econometric Analysis" (2024 Spring) ECON 518, Department of Economics, Korea University.
  • Capinski and Kopp. (2003). "Measure, Integral and Probability" (2nd edition)
  • Hogg et al. (2013). "Introduction to Mathematical Statistics" (8th Edition)

Here, we briefly introduce more rigorous definitions from probability theory based on measure theory. The main goal of this note is to understand the concept of a probability measure intuitively and to get familiar with the jargon of measure theory. The detailed theorems and results of measure theory used in this section will be analyzed further, with mathematical proofs, in 📑Note for Probability Theory.

Probability

Probability Space

The goal of this note is to fully understand the key elements constituting the probability space.

Definition (probability space).

A probability space $(\Omega, \mathcal{F}, P)$ is a triple consisting of three elements:

  1. A sample space, $\Omega$, is the set of all possible outcomes of a random experiment. An element of $\Omega$ is $\omega$, which is called an outcome.
  2. An event space, $\mathcal{F}$, is a collection of subsets of $\Omega$, called a $\sigma$-field. An element of $\mathcal{F}$ is $A$, which is called an event.
  3. A probability function (measure), $P : \mathcal{F} \to [0, 1]$, assigns each event to a probability, which is a number between $0$ and $1$.

In the later chapter Probability Measure, we will discuss how to understand $\sigma$-fields and measures. Before that, we first review some of the key definitions we studied in Mathematical Statistics.

Definition (probability).

A set function $P$ on $\mathcal{F}$ is a probability or probability measure if it satisfies:

  1. $P(A) \geq 0$, for all $A \in \mathcal{F}$.
  2. $P(\Omega) = 1$.
  3. If $A_1, A_2, \ldots \in \mathcal{F}$ are disjoint, then we have $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$.

Example (tossing a coin).

Consider an experiment of tossing a coin. The two possible outcomes are head ($H$) and tail ($T$). Therefore, the sample space is $\Omega = \{H, T\}$ and the event space is $\mathcal{F} = \{\emptyset, \{H\}, \{T\}, \Omega\}$. By definition, we have $P(\Omega) = P(\{H\}) + P(\{T\}) = 1$, implying that $P(\{T\}) = 1 - P(\{H\})$. If the coin is a fair one, then we would also have $P(\{H\}) = P(\{T\}) = 1/2$.
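
As a quick sanity check, this finite probability space can be written out explicitly. Below is a minimal Python sketch (illustrative only; the variable names and the use of frozensets as events are my own choices) that builds $\Omega$, the event space, and the fair-coin measure, and verifies the axioms on this small example.

```python
# Minimal sketch of the fair-coin probability space (illustrative only).
omega = frozenset({"H", "T"})                                       # sample space
events = [frozenset(), frozenset({"H"}), frozenset({"T"}), omega]   # event space (sigma-field)

def P(event):
    """Fair-coin probability measure: each outcome carries mass 1/2."""
    return 0.5 * len(event)

assert P(frozenset()) == 0.0                                  # P(empty set) = 0
assert P(omega) == 1.0                                        # total mass is 1
assert P(frozenset({"H"})) + P(frozenset({"T"})) == P(omega)  # additivity on disjoint events
```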

Definition (conditional probability).

For an event $B \in \mathcal{F}$ such that $P(B) > 0$, the conditional probability of an event $A$ given $B$ is defined by $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, and this set function $P(\cdot \mid B)$ is also a probability (measure).

Random Variables

Let a probability space $(\Omega, \mathcal{F}, P)$ be given for the rest of this section.

Definition (random variable).

A random variable $X$ is a measurable function from $\Omega$ to $\mathbb{R}$, i.e., it assigns a real number $X(\omega)$ to each outcome $\omega \in \Omega$.

Remark (measurable function).

A function $X : \Omega \to \mathbb{R}$ is measurable if $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}$ for every $B \in \mathcal{B}$, where $\mathcal{B}$ is called the Borel sets.

Here, you can simply understand the Borel sets as a collection of subsets of the real line. A detailed explanation follows in Borel Measure. Now we can define a probability (measure) on $\mathbb{R}$ induced by $X$, which is done below.

Example (tossing a coin 2).

In the given sample space $\Omega = \{H, T\}$, we may define a random variable $X$ by $X(H) = 1$ and $X(T) = 0$.

A random variable is simply a mapping that maps each outcome to a real number, where we can use well-developed mathematical tools.
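
Continuing the coin example, the short Python sketch below (illustrative; the helper name `prob_X_in` is hypothetical) defines $X(H) = 1$, $X(T) = 0$ and computes probabilities of statements about $X$ by pulling them back to the sample space through the inverse image.

```python
# Random variable as a plain mapping from outcomes to real numbers.
omega = ["H", "T"]
X = {"H": 1.0, "T": 0.0}      # the random variable of the coin example
P = {"H": 0.5, "T": 0.5}      # fair-coin probabilities of the outcomes

def prob_X_in(borel_set):
    """P(X in B), computed through the inverse image X^{-1}(B)."""
    inverse_image = [w for w in omega if X[w] in borel_set]
    return sum(P[w] for w in inverse_image)

print(prob_X_in({1.0}))        # P(X = 1) = 0.5
print(prob_X_in({0.0, 1.0}))   # P(X in {0, 1}) = 1.0
```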

We now re-define some of the familiar concepts.

Distribution and Density Function

Definition (distribution).

A distribution of a random variable $X$ is the probability (measure) $P_X$ on $(\mathbb{R}, \mathcal{B})$ induced by $X$: $P_X(B) = P(\{\omega \in \Omega : X(\omega) \in B\})$ for $B \in \mathcal{B}$.

Definition (distribution function).

A (cumulative) distribution function of a random variable $X$ is defined as $F_X(x) = P(X \leq x)$. If $X$ is a continuous random variable, then $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$. If $X$ is a discrete random variable, then $F_X(x) = \sum_{t \leq x} p_X(t)$.

Proposition (properties of distribution function).
  1. non-decreasing: $x_1 \leq x_2 \implies F_X(x_1) \leq F_X(x_2)$.
  2. limit value: $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$.
  3. continuity when increasing (right-continuity): $\lim_{x \downarrow x_0} F_X(x) = F_X(x_0)$.
  4. expectation: $E[X] = \int_{-\infty}^{\infty} x \, dF_X(x)$.

Definition (probability mass function).

A probability mass function (pmf) is the probability distribution of a discrete random variable: $p_X(x) = P(X = x)$.

The density function of a continuous random variable is trickier to define.

Definition (absolutely continuous and density).

If a measure $\nu$ satisfies $\nu(A) = \int_A f \, d\mu$ for every measurable set $A$ and some integrable function $f \geq 0$, then $\nu$ is absolutely continuous (with respect to $\mu$). Here such $f$ is called a density of $\nu$ for the measure $\mu$.

Note that the absolute continuity of $\nu$ results from a property of the measure, which is itself continuous. While we will look further into the definition later on, here you can simply understand it as a characteristic of the integral.

Definition (probability density function).

A probability density function (pdf) of a continuous random variable $X$ is the function $f_X$ that satisfies $F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$, or equivalently, $f_X(x) = \dfrac{d F_X(x)}{dx}$.
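
For intuition, the relation $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$ can be checked numerically. The sketch below is only an illustration: it assumes the standard normal density as the example pdf, approximates the integral by a Riemann sum on a grid, and compares the result with the closed-form cdf.

```python
import numpy as np
from math import erf, sqrt, pi

# Standard normal pdf and its exact cdf (via the error function).
f = lambda t: np.exp(-t ** 2 / 2) / sqrt(2 * pi)
F_exact = lambda x: 0.5 * (1 + erf(x / sqrt(2)))

grid = np.linspace(-8.0, 2.0, 200_001)       # effectively (-infinity, 2]
dt = grid[1] - grid[0]
F_numeric = float(np.sum(f(grid)) * dt)      # Riemann sum for the integral of f up to 2

print(F_numeric, F_exact(2.0))               # both are approximately 0.9772
```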


Probability Measure

Sigma-Algebra

First, we follow the previous notions of the sample space and the set of all events, denoted $\Omega$ and $\mathcal{F}$, respectively. Obviously, $\mathcal{F}$ is a collection of subsets of $\Omega$, a set of sets (i.e. a family of sets). A typical example of an event space is the power set $2^{\Omega}$, which collects all the subsets of $\Omega$.

A sigma-algebra is a subset of the power set $2^{\Omega}$, and is therefore a family of sets in $\Omega$, that satisfies certain properties.

Definition (sigma-algebra and measurable space).

For a sample space $\Omega$, the set of all events $\mathcal{F}$ is a $\sigma$-algebra or $\sigma$-field on $\Omega$ if the following conditions are satisfied:

  1. $\emptyset, \Omega \in \mathcal{F}$: inclusion of the empty set and the entire set.
  2. $A \in \mathcal{F} \implies A^c \in \mathcal{F}$: closed under complements.
  3. $A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$: closed under countable unions.

For $\mathcal{F}$, which is a $\sigma$-algebra on $\Omega$, the ordered pair $(\Omega, \mathcal{F})$ is called a measurable space.

Here, note that the term 'sigma-' in mathematics usually implies 'countable union' or 'countably infinite'. Thus, a $\sigma$-algebra is closed under countable unions, i.e., it contains every countable union of its elements.

Furthermore, since a countable set is isomorphic to $\mathbb{N}$, meaning that there exists a one-to-one correspondence between the two, we can understand a $\sigma$-algebra as a set consisting of countably infinitely many subsets of $\Omega$, where we can index each of the subsets by the natural numbers.

Example (example of sigma-algebra and induced algebra).

Let $\Omega = \{a, b, c\}$. Then a $\sigma$-algebra of $\Omega$ can be $\{\emptyset, \Omega\}$, $\{\emptyset, \{a\}, \{b, c\}, \Omega\}$, the power set $2^{\Omega}$, and many others. Here, $\{\emptyset, \{a\}, \{b, c\}, \Omega\}$ is the $\sigma$-algebra of $\Omega$ induced by the set $\{a\}$, meaning that it is the smallest $\sigma$-algebra containing $\{a\}$.
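
The induced $\sigma$-algebra in this example can also be generated mechanically: start from the generating set and close it under complements and unions. The brute-force Python sketch below is illustrative only and works on the small hypothetical $\Omega = \{a, b, c\}$; pairwise unions suffice here because the family is finite.

```python
from itertools import combinations

omega = frozenset({"a", "b", "c"})

def generate_sigma_algebra(generators):
    """Close a family of subsets of a finite omega under complements and unions."""
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:                      # close under complements
            if omega - A not in family:
                family.add(omega - A)
                changed = True
        for A, B in combinations(current, 2):  # close under (finite) unions
            if A | B not in family:
                family.add(A | B)
                changed = True
    return family

print(sorted(generate_sigma_algebra([{"a"}]), key=len))
# prints the 4 events {}, {'a'}, {'b','c'}, {'a','b','c'} (as frozensets)
```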

Measure

For some $\sigma$-algebra $\mathcal{F}$ which is defined on $\Omega$, a measure is a function that assigns to each event a non-negative (extended) real number: $\mu : \mathcal{F} \to [0, \infty]$. Intuitively, this set function is a generalization and formalization of geometrical measures (length, area, and volume) and other common notions (magnitude, mass, and probability). More simply, we can understand the concept of a measure as a set function that measures the size of a given set.

Definition (measure and measure space).

Let $\Omega$ be a sample space and $\mathcal{F}$ be a $\sigma$-algebra over $\Omega$. A set function $\mu : \mathcal{F} \to [0, \infty]$ is called a measure if the following conditions hold:

  1. $\mu(A) \geq 0$, $\forall A \in \mathcal{F}$: non-negativity.
  2. $\mu(\emptyset) = 0$: empty set with measure $0$.
  3. For any disjoint countable collection $\{A_n\}_{n=1}^{\infty}$, $A_n \in \mathcal{F}$, we have $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n)$: countable additivity.

And a triple $(\Omega, \mathcal{F}, \mu)$ is called a measure space.

Here, the countable additivity is the key property that makes a measure a generalized length.

Proposition (properties of measure).

Let $\mu$ be a measure. The following properties follow directly from the definition.

  1. monotonicity: If $A, B$ are measurable sets with $A \subseteq B$, then $\mu(A) \leq \mu(B)$.
  2. continuity from below: If $A_1 \subseteq A_2 \subseteq \cdots$ are measurable sets that are increasing, then the union of the sets is measurable and $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} \mu(A_n)$ (see the sketch after this list).
  3. continuity from above: If $A_1 \supseteq A_2 \supseteq \cdots$ are measurable sets that are decreasing, then the intersection of the sets is measurable. Furthermore, if at least one of the $A_n$ has finite measure, then $\mu\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} \mu(A_n)$.
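
As an elementary illustration of continuity from below (a sketch, assuming the measure is the ordinary length of an interval), take the increasing sets $A_n = (0, 1 - 1/n)$, whose union is $(0, 1)$:

```python
# Length measure of the increasing sets A_n = (0, 1 - 1/n), whose union is (0, 1).
def length_of_A(n):
    return (1 - 1 / n) - 0      # mu(A_n) = 1 - 1/n

for n in (1, 10, 100, 10_000):
    print(n, length_of_A(n))    # tends to 1 = mu((0, 1)) as n grows
```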

From the definition of measure, by adding one more property, we finally have the definition of the probability measure.

Definition (probability measure and probability space).

Let $\Omega$ be a sample space and $\mathcal{F}$ be a $\sigma$-algebra over $\Omega$. A set function $P : \mathcal{F} \to [0, 1]$ is called a probability measure if the following conditions hold:

  1. $P(A) \geq 0$, $\forall A \in \mathcal{F}$: non-negativity.
  2. $P(\emptyset) = 0$: empty set with measure $0$.
  3. For any disjoint countable collection $\{A_n\}_{n=1}^{\infty}$, $A_n \in \mathcal{F}$, we have $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$: countable additivity.
  4. $P(\Omega) = 1$: total mass is 1.

And the triple $(\Omega, \mathcal{F}, P)$ is called a probability space.

Measurable Sets

Definition (measurable set).

Let $(\Omega, \mathcal{F})$ be a measurable space. Then a subset $A \subseteq \Omega$ is said to be ($\mathcal{F}$-)measurable if $A \in \mathcal{F}$.

Remark (null set).

A measurable set $A$ is a null set if $\mu(A) = 0$.

Definition (length and null set).

Let $I = (a, b)$ be an open interval in the real line, and define a function $\ell$ such that $\ell(I) = b - a$, and denote the function $\ell$ as length. Then, a set $A \subseteq \mathbb{R}$ is called a null set if, for every $\varepsilon > 0$, there exists a sequence of open intervals $\{I_n\}_{n=1}^{\infty}$ such that $A \subseteq \bigcup_{n=1}^{\infty} I_n$ and $\sum_{n=1}^{\infty} \ell(I_n) < \varepsilon$.

Compared to the empty set, a null set refers to a set which is practically non-existent, while the empty set denotes a set which is actually empty. Here, 'null' means meaningless, insignificant, or negligible, rather than absent.

Proposition (theorems of null set).
  1. The empty set is a null set.
  2. Any set of a single element (a singleton) is a null set.
  3. Any countable set is a null set (a short sketch of the standard covering argument follows below).
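
For item 3, the standard covering argument is short (a sketch following the definition of a null set above): enumerate the countable set and cover its $n$-th point by an interval of length $\varepsilon / 2^n$.

```latex
% Sketch: any countable set A = {x_1, x_2, ...} is null.
% Fix eps > 0 and cover x_n by the open interval I_n below.
\[
  I_n = \Bigl(x_n - \frac{\varepsilon}{2^{n+1}},\; x_n + \frac{\varepsilon}{2^{n+1}}\Bigr),
  \qquad
  A \subseteq \bigcup_{n=1}^{\infty} I_n,
  \qquad
  \sum_{n=1}^{\infty} \ell(I_n) = \sum_{n=1}^{\infty} \frac{\varepsilon}{2^{n}} = \varepsilon .
\]
```
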
Definition (almost surely).

Let $(\Omega, \mathcal{F}, P)$ be a probability space. An event $A \in \mathcal{F}$ happens almost surely if $P(A) = 1$. Equivalently, $A$ happens almost surely if the probability of $A$ not occurring is zero: $P(A^c) = 0$.

Intuitively, 'almost surely' refers to every point except a null set. This concept is similar to the integral, where $\int_{[a,b]} f \, d\mu = \int_{(a,b)} f \, d\mu$, and also to probability, where we do not care whether the end points of a set are included or not.

Definition (almost surely converges).

The sequence of random vectors $\{X_n\}$ converges almost surely to $X$, i.e. $X_n \xrightarrow{a.s.} X$, if $P\left(\left\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right\}\right) = 1$.

Note that in Definition 24 (almost surely converges), we do not care about the time at which $X_n(\omega)$ enters a neighborhood of $X(\omega)$ and remains there forever. Thus we introduce further concepts.

Definition (event eventually and infinitely often).

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of events in $\mathcal{F}$. Then events $A_n$ eventually (e.v.) denotes the event that $A_n$ happens for all sufficiently large $n$: $\{A_n \text{ e.v.}\} = \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} A_n$. Using De Morgan's law, the opposite case is infinitely often (i.o.), the event that $A_n$ happens for infinitely many $n$: $\{A_n \text{ i.o.}\} = \bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} A_n$, with $\{A_n \text{ e.v.}\}^c = \{A_n^c \text{ i.o.}\}$.

Here, the terms e.v. and i.o. emphasize that, in terms of the convergence of a sequence, the only important thing is the long-term behavior, not the behavior over any first finite horizon of time.

Remark (almost sure convergence, and ev and io).

Define a sequence of sets as $A_n(\varepsilon) = \{\omega \in \Omega : |X_n(\omega) - X(\omega)| < \varepsilon\}$. Then, $X_n \xrightarrow{a.s.} X$ if, for every $\varepsilon > 0$, the probability that the events $A_n(\varepsilon)$ happen eventually is equal to $1$: $P(A_n(\varepsilon) \text{ e.v.}) = 1$, which condition is equivalent to $P(|X_n - X| \geq \varepsilon \text{ i.o.}) = 0$, i.e. the probability that $|X_n - X| \geq \varepsilon$ happens infinitely often is equal to $0$.

Usually, Remark 26 (almost sure convergence, and ev and io) is another common way to define Definition 24 (almost surely converges). For a brief understanding, suppose $X_n(\omega) \to X(\omega)$ pointwise and let $\varepsilon > 0$. By the definition of the limit, there exists $N$ such that $|X_n(\omega) - X(\omega)| < \varepsilon$, i.e. $\omega \in A_n(\varepsilon)$, for all $n \geq N$.
Thus we have $\omega \in \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} A_n(\varepsilon) = \{A_n(\varepsilon) \text{ e.v.}\}$. Alternatively, taking probabilities, $P(A_n(\varepsilon) \text{ e.v.}) \geq P(\lim_{n \to \infty} X_n = X) = 1$, resulting in the definition of almost sure convergence. For the infinitely often part, it can easily be derived using De Morgan's law.
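
A small simulation can make the 'eventually' idea concrete. The Python sketch below is illustrative only: it uses the sample mean of iid uniforms, which converges almost surely to $1/2$ by the strong law of large numbers, and records, for one simulated path, the last time $|X_n - 1/2| \geq \varepsilon$ occurs.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01
u = rng.uniform(size=100_000)
X = np.cumsum(u) / np.arange(1, u.size + 1)      # X_n = sample mean of the first n draws

outside = np.where(np.abs(X - 0.5) >= eps)[0]    # indices n with |X_n - 1/2| >= eps
last_exit = int(outside[-1]) + 1 if outside.size else 0
print(last_exit)   # after this (path-dependent) time the path stays in the eps-band, i.e. A_n(eps) e.v.
```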


Random Variable

Measurable Functions

Intuitively, a measurable function is a function between two measurable spaces that preserves the structure of the spaces; under such a function, the inverse image of any measurable set is measurable.

Definition (measurable function).

Let $(\Omega, \mathcal{F})$ be a measurable space, and for a function $f : \Omega \to \mathbb{R}$, define the set $f^{-1}((a, \infty)) = \{\omega \in \Omega : f(\omega) > a\}$ for $a \in \mathbb{R}$. Then, $f$ is measurable if for all $a \in \mathbb{R}$, $f^{-1}((a, \infty)) \in \mathcal{F}$.

It can be shown that the following conditions are equivalent, so proving any one of them proves that the given function is measurable.

Proposition (equivalent conditions to measurable function).

For a function $f : \Omega \to \mathbb{R}$, the following conditions are equivalent:

  1. $\{\omega : f(\omega) > a\} \in \mathcal{F}$ for all $a \in \mathbb{R}$,
  2. $\{\omega : f(\omega) \leq a\} \in \mathcal{F}$ for all $a \in \mathbb{R}$,
  3. $\{\omega : f(\omega) \geq a\} \in \mathcal{F}$ for all $a \in \mathbb{R}$,
  4. $\{\omega : f(\omega) < a\} \in \mathcal{F}$ for all $a \in \mathbb{R}$.

Proof. As 1 and 2 are complements of each other and, similarly, 3 and 4 are complements of each other, by Definition 14 (sigma-algebra and measurable space), 1 and 2, and 3 and 4, are equivalent conditions (since the complement of an element of a $\sigma$-algebra is also an element). Thus it is sufficient to show that 1 and 3 are equivalent, since 1 is the given definition from Definition 27 (measurable function). Below, we first show that 1 implies 3, and then we show the converse.

($1 \Rightarrow 3$) Assume that for a function $f$ we have $\{\omega : f(\omega) > a\} \in \mathcal{F}$ for all $a \in \mathbb{R}$. Then we have $\{\omega : f(\omega) \geq a\} = \bigcap_{n=1}^{\infty} \{\omega : f(\omega) > a - \tfrac{1}{n}\}$, since $f(\omega) \geq a$ exactly when $f(\omega) > a - \tfrac{1}{n}$ for every $n$, and 1 holds for every $a - \tfrac{1}{n}$. Since a $\sigma$-field is closed under countable intersections, we have $\{\omega : f(\omega) \geq a\} \in \mathcal{F}$.

($3 \Rightarrow 1$) Assume that for a function $f$ we have $\{\omega : f(\omega) \geq a\} \in \mathcal{F}$ for all $a \in \mathbb{R}$. Then we have $\{\omega : f(\omega) > a\} = \bigcup_{n=1}^{\infty} \{\omega : f(\omega) \geq a + \tfrac{1}{n}\} \in \mathcal{F}$, as a $\sigma$-field is closed under countable unions.

Remark that Definition 27 (measurable function) is analogous to Continuous Function > Definition 5 (continuous function between topological spaces), which gives a sense in which measurable functions are structure-preserving.

For your information, a measurable function can be defined in a more fundamental way, not directly from its inverse image. In that case, we can also derive the same property as in Definition 27 (measurable function).

Furthermore, a measurable function can alternatively be obtained as the pointwise limit of a sequence of simple functions.

Theorem 48 (approximation by simple functions).

Let $(\Omega, \mathcal{F})$ be a measurable space, and let $f : \Omega \to [0, \infty)$ be a non-negative $\mathcal{F}$-measurable function. Then, there exists an increasing sequence of $\mathcal{F}$-measurable simple functions $\{s_n\}_{n=1}^{\infty}$ such that $s_n \to f$ pointwise as $n \to \infty$.
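
The standard construction behind this theorem truncates and discretizes $f$ on dyadic levels, $s_n = \min(\lfloor 2^n f \rfloor / 2^n,\ n)$. The Python sketch below is an illustration of that construction with $f(x) = x^2$ chosen as an assumed example.

```python
import math

f = lambda x: x * x                      # a non-negative measurable function (example choice)

def s(n, x):
    """n-th simple-function approximation: dyadic floor of f, truncated at level n."""
    return min(math.floor(2 ** n * f(x)) / 2 ** n, n)

x0 = 1.7
for n in (1, 2, 4, 8, 16):
    print(n, s(n, x0), f(x0))            # s_n(x0) increases to f(x0) = 2.89
```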

Borel Sets

On the real line, the Borel algebra is the intersection of all sigma-algebras that include the open sets in $\mathbb{R}$. In other words, it is the smallest sigma-algebra that can be defined on $\mathbb{R}$ while including all the open sets (intervals) of $\mathbb{R}$. It is useful in the sense that it contains only the elements necessary to define a measure on the open sets, which makes it possible to define a probability measure.

Definition (Borel sets on Euclidean space).

Consider the Euclidean space $\mathbb{R}$. Then the sigma-algebra generated by the set of all intervals, $\mathcal{B} = \sigma(\{(a, b) : a < b\})$, is said to be the Borel sigma-algebra of $\mathbb{R}$, and each element $B \in \mathcal{B}$ is called a Borel set.

While we defined the Borel algebra on Euclidean space, it is more general to start from an arbitrary set. However, in probability theory, it is sufficient to define it on $\mathbb{R}$, as it consists of all open and closed sets that can be defined on $\mathbb{R}$.

Example (example of Borel set).

Based on the closure properties of the $\sigma$-field, most of the familiar sets in $\mathbb{R}$ belong to $\mathcal{B}$:

  • By construction, all intervals in $\mathbb{R}$ belong to $\mathcal{B}$.
  • Since $\mathcal{B}$ is a $\sigma$-field, and as all open sets are countable unions of intervals, every open set in $\mathbb{R}$ belongs to $\mathcal{B}$.
  • Since each countable set is a countable union of closed intervals (singletons) in $\mathbb{R}$, every countable set belongs to $\mathcal{B}$. In particular, $\mathbb{N}$ and $\mathbb{Q}$ are Borel sets.
  • As a $\sigma$-field, $\mathcal{B}$ includes the complement of every Borel set, which means the set of irrational numbers and the finite sets are also Borel sets.

Random Variable

Definition (random variable).

Let $(\Omega, \mathcal{F}, P)$ be a probability space. A function $X : \Omega \to \mathbb{R}$ is a random variable if $\{\omega \in \Omega : X(\omega) \leq x\} \in \mathcal{F}$ for all $x \in \mathbb{R}$. Or, equivalently, we can define $X$ as a function such that $X^{-1}(B) \in \mathcal{F}$ for all $B \in \mathcal{B}$, where $\mathcal{B}$ denotes the Borel algebra.

Note that the equivalence in Definition 31 (random variable) holds by the properties of Definition 29 (Borel sets on Euclidean space). A random variable is a mapping that maps each element of the sample space into the real space, which makes it possible to use inequalities. Also, it restricts the sigma-field to the inverse images of the Borel sets, restraining the excessive abstractness of the sample space.

Remark that, by the definition, $X$ is an ($\mathcal{F}$-)measurable function. Moreover, if $(\Omega, \mathcal{F}) = (\mathbb{R}, \mathcal{B})$, then $X$ becomes a Borel function, since we have $X^{-1}(B) \in \mathcal{B}$ for every $B \in \mathcal{B}$.

For your information, to generalize the definition to a multivariate random variable (random vector), we can simply define $X : \Omega \to \mathbb{R}^k$ such that $X^{-1}(B) \in \mathcal{F}$ for all $B \in \mathcal{B}(\mathbb{R}^k)$.

Definition (sigma-fields generated by random variable).

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X$ be a random variable. Then the $\sigma$-field defined as $\sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}\}$ is said to be the $\sigma$-field generated by $X$.

Probability Distribution

Probability Distribution

Definition (probability distribution).

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $X$ be a random variable. Then a measure $P_X$ on $(\mathbb{R}, \mathcal{B})$ is said to be the probability distribution of $X$ if $P_X(B) = P(X^{-1}(B))$ for all $B \in \mathcal{B}$.

Proposition (countably additive of probability distribution).

The set function $P_X$ is countably additive.

Proof. Let $B_1, B_2, \ldots \in \mathcal{B}$ be pairwise disjoint Borel sets; then their inverse images $X^{-1}(B_1), X^{-1}(B_2), \ldots$ are also pairwise disjoint. We have $X^{-1}\left(\bigcup_{n=1}^{\infty} B_n\right) = \bigcup_{n=1}^{\infty} X^{-1}(B_n)$. Thus by Definition 33 (probability distribution), we have $P_X\left(\bigcup_{n=1}^{\infty} B_n\right) = P\left(X^{-1}\left(\bigcup_{n=1}^{\infty} B_n\right)\right) = P\left(\bigcup_{n=1}^{\infty} X^{-1}(B_n)\right) = \sum_{n=1}^{\infty} P(X^{-1}(B_n)) = \sum_{n=1}^{\infty} P_X(B_n)$, where the third equality holds by Definition 16 (measure and measure space), as $P$ denotes a probability measure.

Thus, $(\mathbb{R}, \mathcal{B}, P_X)$ is also a probability space. Note that $P_X(\mathbb{R}) = P(X^{-1}(\mathbb{R})) = P(\Omega) = 1$.
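
The proposition can be checked directly on a small finite space. The sketch below is illustrative only: a fair die with $X(\omega) = \omega \bmod 2$ is an assumed example, and $P_X$ is computed as the pushforward of $P$.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]                    # fair die
P = {w: Fraction(1, 6) for w in omega}
X = lambda w: w % 2                           # random variable: 1 if odd, 0 if even

def P_X(borel_set):
    """Pushforward measure: P_X(B) = P(X^{-1}(B))."""
    return sum(P[w] for w in omega if X(w) in borel_set)

B1, B2 = {0}, {1}                             # disjoint Borel sets
assert P_X(B1 | B2) == P_X(B1) + P_X(B2)      # additivity of P_X on disjoint sets
print(P_X(B1), P_X(B2), P_X(B1 | B2))         # 1/2, 1/2, 1
```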

Using Definition 33 (probability distribution), we can re-define the independence of random variables using measure theory.

Definition (independence of random variable).

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $X, Y$ be random variables. We say $X$ and $Y$ are independent if we have $P(\{X \in B_1\} \cap \{Y \in B_2\}) = P(X \in B_1)\,P(Y \in B_2)$ for all $B_1, B_2 \in \mathcal{B}$.

Note that the independence between $X$ and $Y$ is equivalent to the following conditions.

Theorem (equivalent of independence).

The following conditions are equivalent.

  • The random variables $X$ and $Y$ are independent, i.e. $P(X \in B_1, Y \in B_2) = P(X \in B_1)\,P(Y \in B_2)$ for all $B_1, B_2 \in \mathcal{B}$.
  • For all Borel functions $g$ and $h$ (such that the expectations exist), we have $E[g(X)h(Y)] = E[g(X)]\,E[h(Y)]$.
  • For the joint distribution function $F_{X,Y}$, we have $F_{X,Y}(x, y) = F_X(x)\,F_Y(y)$ for all $x, y \in \mathbb{R}$.

Dirac Measure and Discrete Distribution

Definition (Dirac measure).

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Assume a random variable $X$ is a constant function: $X(\omega) = c$ for all $\omega \in \Omega$. Then its probability distribution is said to be the Dirac measure $\delta_c$: $P_X(B) = \delta_c(B) = 1$ if $c \in B$ and $0$ if $c \notin B$, for $B \in \mathcal{B}$.

Note that this probability distribution only takes into account whether the given $c$ is included in the Borel set or not. This approach is fundamentally different from the one based on the distinction between discrete and continuous distributions.

Example (multiple value discrete variable).

Consider the case when $X$ takes finitely many values $c_1, \ldots, c_k$ with $P(X = c_j) = p_j$ and $\sum_{j=1}^{k} p_j = 1$. Then using Definition 37 (Dirac measure), we can express its probability distribution as $P_X = \sum_{j=1}^{k} p_j \delta_{c_j}$, which means $P_X(B) = \sum_{j=1}^{k} p_j \delta_{c_j}(B)$ for every $B \in \mathcal{B}$.

Thus we can now define a general form of discrete probability distribution.

Definition (discrete probability distribution).

Let $p_n \geq 0$ for all $n \in \mathbb{N}$, and $\sum_{n=1}^{\infty} p_n = 1$. Then we can express the probability distribution of a discrete random variable $X$ taking values $\{c_n\}$ with $P(X = c_n) = p_n$ as $P_X = \sum_{n=1}^{\infty} p_n \delta_{c_n}$.

Classical examples are:

  • Geometric distribution: $p_n = (1 - p)^{n-1} p$, $n = 1, 2, \ldots$, for some $p \in (0, 1)$.
  • Poisson distribution: $p_n = e^{-\lambda} \lambda^n / n!$, $n = 0, 1, 2, \ldots$, for some $\lambda > 0$.
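
The representation $P_X = \sum_n p_n \delta_{c_n}$ translates directly into code. The Python sketch below is illustrative: it truncates the Poisson($\lambda$) weights at a large $N$ (an assumption made only to keep the sum finite) and evaluates the resulting measure on a few Borel sets described by indicator predicates.

```python
from math import exp

lam, N = 2.0, 100                      # Poisson rate and truncation level (illustrative)
p = [exp(-lam)]
for n in range(1, N):
    p.append(p[-1] * lam / n)          # p_n = e^{-lam} * lam^n / n!, computed recursively
c = list(range(N))                     # atoms c_n = n

def dirac(c_n, B):
    """Dirac measure delta_{c_n}(B): 1 if c_n lies in B, else 0."""
    return 1.0 if B(c_n) else 0.0

def P_X(B):
    """Discrete distribution as a weighted sum of Dirac measures."""
    return sum(p_n * dirac(c_n, B) for p_n, c_n in zip(p, c))

print(P_X(lambda x: x <= 3))           # P(X <= 3), approximately 0.857
print(P_X(lambda x: x % 2 == 0))       # P(X is even)
print(P_X(lambda x: True))             # total mass, approximately 1
```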

Note that we can also define a random variable that is neither discrete nor continuous.

Example (distance between Car and B).

Suppose a car leaves city A at a random time between 1 pm and 2 pm. It travels at 100 km/h towards B, which is 50 km from A. What is the probability distribution of the distance between the car and B at 2 pm?

Proof. If the car starts traveling before 1.30 pm, then it arrives at B by 2 pm, so the distance is $0$. However, if it starts after 1.30 pm, then its distance from B at 2 pm follows a uniform distribution on $(0, 50)$. Thus we have $D = \max\left\{0,\; 50 - \frac{100}{60}(60 - T)\right\}$, where $T$ denotes the minute when the car starts after 1 pm (i.e. if $T = 30$, then it starts at 1.30 pm).
Then, its probability distribution can be expressed as $P_D = \frac{1}{2}\,\delta_0 + \frac{1}{2}\, U_{(0,50)}$, where $U_{(0,50)}$ denotes the uniform measure on $(0, 50)$.
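
A quick Monte Carlo check of this mixed distribution (a sketch under the stated assumptions: departure time uniform between 1 pm and 2 pm, speed 100 km/h, distance 50 km) estimates the point mass at $0$ and the conditional uniformity on $(0, 50)$.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.uniform(0.0, 60.0, size=1_000_000)                 # departure time, minutes after 1 pm
D = np.maximum(0.0, 50.0 - 100.0 * (60.0 - T) / 60.0)      # distance (km) from B at 2 pm

print((D == 0).mean())                   # about 0.5 : the Dirac (point-mass) part at 0
positive = D[D > 0]
print(positive.mean(), positive.max())   # about 25 and 50 : uniform on (0, 50) given D > 0
```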