Basic Definitions in Probability


Probability Space

Definition 16 (measure).

A set function $\mu \colon \mathcal{F} \to [0, \infty]$ is called a measure if for any countable collection of disjoint measurable sets $A_1, A_2, \ldots \in \mathcal{F}$,
$$\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i).$$
Additionally, we may add another condition for the lower bound: $\mu(A) \ge \mu(\emptyset) = 0$ for all $A \in \mathcal{F}$. Furthermore, if $\mu(\Omega) = 1$, then $\mu$ is a probability measure, denoted by $P$.

Definition 17 (measure space).

Let $\Omega$ be a given set, $\mathcal{F}$ be a $\sigma$-field on $\Omega$, and $\mu$ be a measure on $(\Omega, \mathcal{F})$. The triple $(\Omega, \mathcal{F}, \mu)$ is called a measure space. If $\mu = P$ is a probability measure, then $(\Omega, \mathcal{F}, P)$ is called a probability space.

Random Variables

Definition 30 (measurable function).

Let $(\Omega, \mathcal{F})$ be a measurable space, and $S$ be a topological space with Borel $\sigma$-field $\mathcal{B}(S)$. Then $f \colon \Omega \to S$ is $\mathcal{F}$-measurable if
$$f^{-1}(B) = \{\omega \in \Omega : f(\omega) \in B\} \in \mathcal{F} \quad \text{for every } B \in \mathcal{B}(S).$$
In particular, if $\mathcal{F} = \mathcal{B}(\Omega)$, then $f$ is Borel measurable.

Definition (random variable).

Let $X \colon \Omega \to \mathbb{R}^d$ be an $\mathcal{F}$-measurable function.

  • if $d = 1$, then $X$ is a random variable.
  • if $d > 1$, then $X$ is a random vector.
Theorem (measurability of random vector).

If $X_1, \ldots, X_n$ are $\mathcal{F}$-measurable and $f \colon \mathbb{R}^n \to \mathbb{R}$ is Borel-measurable, then $f(X_1, \ldots, X_n)$ is $\mathcal{F}$-measurable.

Remark that from Measure Theoretic Preliminaries > Theorem 10 (Borel sigma-field on euclidean space), $(X_1, \ldots, X_n)$ is $\mathcal{F}$-measurable if $\{(X_1, \ldots, X_n) \in A\} \in \mathcal{F}$ for all $A$ in a class generating $\mathcal{B}(\mathbb{R}^n)$, e.g. the open sets, or the measurable rectangles. Now let $A_1, \ldots, A_n$ be Borel sets in $\mathbb{R}$. Then we have
$$\{(X_1, \ldots, X_n) \in A_1 \times \cdots \times A_n\} = \bigcap_{i=1}^{n} \{X_i \in A_i\} \in \mathcal{F},$$
since $\mathcal{F}$ is closed under countable intersection.

Therefore $(X_1, \ldots, X_n)$ is $\mathcal{F}$-measurable, thus $f(X_1, \ldots, X_n)$, being the composition of a Borel-measurable function with a measurable map, is $\mathcal{F}$-measurable.

Distribution Function

Definition (distribution).

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $X \colon \Omega \to \mathbb{R}$ be an $\mathcal{F}$-measurable function. Then for any $A \in \mathcal{B}(\mathbb{R})$, define
$$\mu(A) := P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\}).$$
Then $\mu$ is called a distribution of $X$.

Proposition (distribution is measure).

Let $\mu$ be a distribution of $X$. Then $\mu$ is a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

Proof. Trivially, we have $\mu(\mathbb{R}) = P(X \in \mathbb{R}) = P(\Omega) = 1$ and $\mu(A) = P(X \in A) \ge 0$. Now, consider a sequence of disjoint sets $A_1, A_2, \ldots \in \mathcal{B}(\mathbb{R})$, then we have
$$\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = P\Big(\bigcup_{i=1}^{\infty}\{X \in A_i\}\Big) = \sum_{i=1}^{\infty} P(X \in A_i) = \sum_{i=1}^{\infty} \mu(A_i),$$
since the events $\{X \in A_i\}$ are disjoint. Therefore, $\mu$ is a probability measure.

Definition (identical distribution).

Let $X$ and $Y$ be both measurable functions. We say $X$ and $Y$ are equal in distribution and denote $X \overset{d}{=} Y$ if their distributions are identical, i.e., $P(X \in A) = P(Y \in A)$ for all $A \in \mathcal{B}(\mathbb{R})$.

Definition (distribution function).

Let $X$ be a measurable function, and let $\mu(A) = P(X \in A)$ for $A \in \mathcal{B}(\mathbb{R})$ be the distribution of $X$. Then the function
$$F(x) := \mu\big((-\infty, x]\big) = P(X \le x), \quad x \in \mathbb{R},$$
is called a distribution function.

Theorem (properties of distribution function).

Let $F$ be any distribution function. Then, we have the following properties:

  1. $F$ is non-decreasing.
  2. $\lim_{x \to \infty} F(x) = 1$, and $\lim_{x \to -\infty} F(x) = 0$.
  3. $F$ is right-continuous, i.e., $\lim_{y \downarrow x} F(y) = F(x)$.
  4. If $F(x-) := \lim_{y \uparrow x} F(y)$, then $F(x-) = P(X < x)$.
  5. $P(X = x) = F(x) - F(x-)$.

Proof.

  1. Let $x \le y$, then by Definition 6 (distribution function), $F(x) = P(X \le x)$ and $F(y) = P(X \le y)$. Since $\{X \le x\} \subset \{X \le y\}$, by the monotonicity of the measure in Measure Theoretic Preliminaries > Theorem 18 (properties of measure), we have $F(x) \le F(y)$.

  2. Note that we have $\bigcup_{n}\{X \le n\} = \Omega$ and $\bigcap_{n}\{X \le -n\} = \emptyset$. Let $x_n$ be a sequence such that $x_n \uparrow \infty$. Then, we have $\{X \le x_n\} \uparrow \Omega$.
    Thus, $\lim_{n \to \infty} F(x_n) = P(\Omega) = 1$ by continuity from below. Similarly, by letting $x_n \downarrow -\infty$, we have $\{X \le x_n\} \downarrow \emptyset$. Then, $\lim_{n \to \infty} F(x_n) = P(\emptyset) = 0$.

  3. Let $y_n \downarrow x$, then $\{X \le y_n\} \downarrow \{X \le x\}$. Then, by continuity from above, $\lim_{n \to \infty} F(y_n) = P(X \le x) = F(x)$.

  4. Let $y_n \uparrow x$ with $y_n < x$, then $\{X \le y_n\} \uparrow \{X < x\}$, thus $F(x-) = \lim_{n \to \infty} F(y_n) = P(X < x)$.

  5. Notice that $\{X = x\} = \{X \le x\} \setminus \{X < x\}$, then we have $P(X = x) = P(X \le x) - P(X < x)$. Then, by (3) and (4), we have $P(X = x) = F(x) - F(x-)$. This completes the proof.

Remark (sufficient condition for continuous distribution function).

Note that from Theorem 7 (properties of distribution function), we can conclude that $F$ is continuous at $x$ if and only if $P(X = x) = 0$.

Theorem (sufficient condition for distribution function).

If a function $F \colon \mathbb{R} \to [0, 1]$ satisfies the following conditions,

  1. $F$ is non-decreasing,
  2. $\lim_{x \to \infty} F(x) = 1$, $\lim_{x \to -\infty} F(x) = 0$,
  3. $F$ is right continuous,

then $F$ is the distribution function of some random variable.

Proof. Here, we need to construct a probability space $(\Omega, \mathcal{F}, P)$ and a random variable $X$ such that $P(X \le x) = F(x)$ for all $x \in \mathbb{R}$. Without loss of generality, let $\Omega = (0, 1)$, $\mathcal{F} = \mathcal{B}((0, 1))$, and $P$ be a Lebesgue measure. Then, since $P((0, 1)) = 1$, $(\Omega, \mathcal{F}, P)$ is a probability space.

Now define
$$X(\omega) := \sup\{y : F(y) < \omega\}.$$
We claim: If
$$\{\omega : X(\omega) \le x\} = \{\omega : \omega \le F(x)\}$$
holds, then
$$P(X \le x) = P\big(\{\omega \in (0, 1) : \omega \le F(x)\}\big) = F(x),$$
which completes the proof.

Proof of claim:
($\supset$) Let $\omega \le F(x)$. Then for any $y$ with $F(y) < \omega \le F(x)$, we have $y \le x$ since $F$ is non-decreasing. Thus, $x$ is an upper bound of $\{y : F(y) < \omega\}$. As we have defined $X(\omega)$ as the supremum of this set, we therefore have $X(\omega) \le x$.

($\subset$) We show the inverse direction by contraposition: if $\omega > F(x)$, then $X(\omega) > x$.
First, let $\omega > F(x)$, then since $F$ is right continuous, there exists $\varepsilon > 0$ such that $F(x + \varepsilon) < \omega$. This gives us $X(\omega) \ge x + \varepsilon > x$, as $X(\omega)$ is a supremum of a set containing $x + \varepsilon$.
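To make the construction in the proof concrete, here is a small numerical sketch (my own, not part of the notes): it evaluates $X(\omega) = \sup\{y : F(y) < \omega\}$ by bisection and checks, for the exponential distribution function, that the resulting random variable on $((0,1), \mathcal{B}, \text{Lebesgue})$ has distribution function $F$. The bisection bounds, tolerance, and sample size are arbitrary choices.

```python
import numpy as np

def generalized_inverse(F, omega, lo=-1e6, hi=1e6, tol=1e-9):
    """X(omega) = sup{y : F(y) < omega}, located by bisection.

    Assumes F is non-decreasing with limits 0 at -inf and 1 at +inf,
    and that the supremum lies inside [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < omega:
            lo = mid   # mid still belongs to {y : F(y) < omega}
        else:
            hi = mid
    return hi

# F(x) = 1 - e^{-x} for x >= 0: non-decreasing, right-continuous, correct limits.
F = lambda x: 1.0 - np.exp(-x) if x >= 0 else 0.0

rng = np.random.default_rng(0)
omegas = rng.uniform(size=20_000)                  # omega ~ Lebesgue measure on (0, 1)
X = np.array([generalized_inverse(F, w) for w in omegas])

for x in (0.5, 1.0, 2.0):
    print(f"P(X <= {x}) ~ {(X <= x).mean():.3f},  F({x}) = {F(x):.3f}")
```

This is exactly the inverse-transform construction used in the proof: the empirical frequencies should be close to $F(x)$.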

Proposition (distribution function is almost surely continuous).

Let $F$ be a distribution function. Then $F$ is discontinuous at at most countably many points.

Proof. Note that by Remark 8 (sufficient condition for continuous distribution function), if $F$ is discontinuous at $x$ then $P(X = x) = F(x) - F(x-) > 0$. Thus for any discontinuity point $x$ there exists some $n \in \mathbb{N}$ such that $F(x) - F(x-) > 1/n$. Then, if there were more than $n$ discontinuity points with jump larger than $1/n$, the total jump would exceed $1$, which contradicts Theorem 7 (properties of distribution function). Thus $F$ has at most $n$ discontinuity points with jump larger than $1/n$, and taking the union over $n$, the set of discontinuities is at most countable.

Remark (distribution function always exists).

For a given random variable, its distribution function (or cumulative distribution function) always exists by Measure Theoretic Preliminaries > Theorem 18 (properties of measure) and Theorem 9 (sufficient condition for distribution function).

Density Function

Definition (probability density function).

Let $F$ be a distribution function of $X$. Suppose there exists $f \ge 0$ which is Borel-measurable on $\mathbb{R}$, such that
$$F(x) = \int_{-\infty}^{x} f(y)\,dy,$$
then such $f$ is called a probability density function of $X$.

Example (uniform distribution).

Uniform distribution on $(0, 1)$:
$$f(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
The distribution function is
$$F(x) = \begin{cases} 0 & x \le 0 \\ x & 0 < x < 1 \\ 1 & x \ge 1. \end{cases}$$

Example (exponential distribution).

Exponential distribution with rate $\lambda > 0$:
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0. \end{cases}$$
The distribution function is
$$F(x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0. \end{cases}$$

Remark (memoryless).

Note that a random variable $X \ge 0$ is said to have the memoryless property if
$$P(X > s + t \mid X > t) = P(X > s) \quad \text{for all } s, t \ge 0.$$
A positive, continuous random variable $X$ has the memoryless property if and only if $X$ has an exponential distribution.
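As a quick illustration of the remark (my own numerical check, not part of the notes), the following estimates $P(X > s + t \mid X > t)$ and $P(X > s)$ from exponential samples; the rate, the values of $s, t$, and the sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, s, t = 0.5, 1.0, 2.0
X = rng.exponential(scale=1 / lam, size=1_000_000)   # exponential with rate lam

cond = (X > s + t).sum() / (X > t).sum()   # estimate of P(X > s + t | X > t)
print(cond, (X > s).mean(), np.exp(-lam * s))        # all three should be close
```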

Example (standard normal distribution).

Standard normal distribution:
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$
Normal distribution with mean $\mu$ and variance $\sigma^2$:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2/(2\sigma^2)}.$$

Expected Values

Definition (expected value).

Let $X \ge 0$ be a random variable on $(\Omega, \mathcal{F}, P)$. Then the expected value of $X$ is defined as
$$E[X] := \int_\Omega X\,dP = \sup\Big\{\int_\Omega \varphi\,dP : \varphi \le X,\ \varphi \in \mathcal{S}^+\Big\},$$
where $\mathcal{S}^+$ is the set of non-negative simple functions.
If $X$ is any random variable, for $X = X^+ - X^-$ as in Measure Theoretic Preliminaries > Definition 42 (positive and negative parts of function), its expected value is
$$E[X] := E[X^+] - E[X^-],$$
provided that $E[X^+] < \infty$ or $E[X^-] < \infty$.

Remark (absolute value and integrability).

Let $X$ be a random variable and let the absolute value of the random variable be $|X| = X^+ + X^-$. Then
$$E|X| = E[X^+] + E[X^-],$$
so $E[X]$ is finite if and only if $E|X| < \infty$ (i.e., $X$ is integrable).

Theorem (properties of expectation).

Let $X, Y$ be random variables on $(\Omega, \mathcal{F}, P)$, and suppose $X, Y \ge 0$ or $E|X|, E|Y| < \infty$. Then we have

  1. $E[X + Y] = E[X] + E[Y]$.
  2. $E[aX + b] = aE[X] + b$ for any $a, b \in \mathbb{R}$.
  3. If $X \le Y$, then $E[X] \le E[Y]$.

By Basic Definitions in Probability > Remark 18 (absolute value and integrability), we have $E|X + Y| \le E|X| + E|Y| < \infty$. Now put $Z := X + Y$; by rearranging the terms in $Z^+ - Z^- = X^+ - X^- + Y^+ - Y^-$, we have
$$Z^+ + X^- + Y^- = Z^- + X^+ + Y^+,$$
and as each term is non-negative, using the result in (1.1), we have
$$E[Z^+] + E[X^-] + E[Y^-] = E[Z^-] + E[X^+] + E[Y^+],$$
thus $E[X + Y] = E[Z^+] - E[Z^-] = E[X] + E[Y]$.

If or , then use the proof similar to (2.2).

(2.2) Assume $a \neq 0$, then we have $E|aX + b| \le |a|E|X| + |b| < \infty$.

If $a > 0$ and $X(\omega) \ge 0$, then $(aX)^+(\omega) = aX^+(\omega)$ and $(aX)^-(\omega) = 0 = aX^-(\omega)$. If $a > 0$ and $X(\omega) < 0$, then $(aX)^+(\omega) = 0 = aX^+(\omega)$ and $(aX)^-(\omega) = aX^-(\omega)$. If $a < 0$ and $X(\omega) \ge 0$, then $(aX)^+(\omega) = 0 = -aX^-(\omega)$ and $(aX)^-(\omega) = -aX^+(\omega)$. Otherwise, if $a < 0$ and $X(\omega) < 0$, then $(aX)^+(\omega) = -aX^-(\omega)$ and $(aX)^-(\omega) = 0 = -aX^+(\omega)$. Thus in either case, we have
$$E[aX] = E[(aX)^+] - E[(aX)^-] = a\big(E[X^+] - E[X^-]\big) = aE[X].$$
Then using the result in (1.2), we have $E[aX + b] = E[aX] + E[b] = aE[X] + b$.

(3.1) Assume $0 \le X \le Y$. We use the monotonicity in Measure Theoretic Preliminaries > Theorem 54 (properties of integral 1). Let $\varphi$ be any simple function such that $0 \le \varphi \le X$, then $0 \le \varphi \le Y$. Thus we have $\int \varphi\,dP \le E[Y]$, meaning that $E[Y]$ is an upper bound of the set $\{\int \varphi\,dP : 0 \le \varphi \le X,\ \varphi \in \mathcal{S}^+\}$. Thus by Definition 17 (expected value), we have $E[X] \le E[Y]$. (3.2) Assume $X \le Y$ where $E|X|, E|Y| < \infty$. Then $X^+ \le Y^+$ and $Y^- \le X^-$, where all terms are non-negative. Thus applying (3.1) and (1.1), we have $E[X^+] \le E[Y^+]$ and $E[Y^-] \le E[X^-]$, which gives us $E[X] = E[X^+] - E[X^-] \le E[Y^+] - E[Y^-] = E[Y]$.

Definition (variance).

Let $X$ be a random element defined on $(\Omega, \mathcal{F}, P)$. Then the variance of $X$ is defined as
$$\mathrm{Var}(X) := E\big[(X - E[X])^2\big].$$

Remark (computing variance).

If $E[X^2] < \infty$, then the variance of $X$ can be computed as
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2.$$

Proof. From Definition 20 (variance), we have
$$\mathrm{Var}(X) = E\big[X^2 - 2XE[X] + (E[X])^2\big] = E[X^2] - 2(E[X])^2 + (E[X])^2 = E[X^2] - (E[X])^2,$$
where the linearity of expectation of Theorem 19 (properties of expectation) can be applied since $E[X^2] < \infty$.

Corollary (linear transformation in variance).

For any $a, b \in \mathbb{R}$, we have $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.

Proof. As we already have $E[aX + b] = aE[X] + b$ by Theorem 19 (properties of expectation), we have
$$\mathrm{Var}(aX + b) = E\big[(aX + b - aE[X] - b)^2\big] = E\big[a^2(X - E[X])^2\big] = a^2\,\mathrm{Var}(X).$$
This completes the proof.

Inequalities

Remark (facts on convex function).

Let $\varphi \colon \mathbb{R} \to \mathbb{R}$ be a convex function. Then the following results are given as facts:

  1. $\varphi$ is continuous
  2. $\varphi$ is almost everywhere differentiable
  3. for every $x_0 \in \mathbb{R}$, there exist $a, b \in \mathbb{R}$ such that $ax_0 + b = \varphi(x_0)$ and $ax + b \le \varphi(x)$ for all $x$ (a supporting line at $x_0$).
Proposition (Jensen's inequality).

Suppose $\varphi$ is convex, i.e.
$$\varphi\big(\lambda x + (1 - \lambda)y\big) \le \lambda\varphi(x) + (1 - \lambda)\varphi(y) \quad \text{for all } \lambda \in (0, 1),\ x, y \in \mathbb{R}.$$
If $E|X| < \infty$ and $E|\varphi(X)| < \infty$, then
$$\varphi(E[X]) \le E[\varphi(X)].$$

Proof. As $\varphi(x) \ge ax + b$ for some $a, b$ with $aE[X] + b = \varphi(E[X])$ (a supporting line at $E[X]$, by the facts above), we have
$$E[\varphi(X)] \ge E[aX + b] = aE[X] + b = \varphi(E[X]),$$
which completes the proof.

Remark (applications of Jensen's inequality).

By letting $\varphi(x) = |x|$, we have $|E[X]| \le E|X|$, and similarly, for $\varphi(x) = |x|^p$ for $p \ge 1$, we have $|E[X]|^p \le E|X|^p$, or, applying it to $|X|$, $(E|X|)^p \le E|X|^p$.
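A tiny numerical illustration of these applications (my own sketch, not from the notes; the sampling distribution is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=0.3, scale=1.5, size=500_000)

print(abs(X.mean()), np.abs(X).mean())      # |E X|   <= E|X|
print(X.mean() ** 2, (X ** 2).mean())       # (E X)^2 <= E[X^2]
```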

Proposition (Chebyshev's inequality).

Suppose $\varphi \colon \mathbb{R} \to \mathbb{R}$ such that $\varphi \ge 0$. For $A \in \mathcal{B}(\mathbb{R})$ and $i_A := \inf\{\varphi(y) : y \in A\}$, we have
$$i_A\,P(X \in A) \le E\big[\varphi(X)\mathbf{1}_{\{X \in A\}}\big] \le E[\varphi(X)].$$

Proof. From the definition of $i_A$, we have
$$i_A\,\mathbf{1}_{\{X \in A\}} \le \varphi(X)\,\mathbf{1}_{\{X \in A\}} \le \varphi(X),$$
and by taking expectations, by the third part of Theorem 19 (properties of expectation), we have the desired result.

Remark (common form of Chebyshev).

By letting $\varphi(x) = x^2$ and $A = \{x : |x| \ge a\}$ in Proposition 26 (Chebyshev's inequality), for $a > 0$, we have
$$P(|X| \ge a) \le \frac{E[X^2]}{a^2}.$$

Proof. Let $\varphi(x) = x^2$ and $A = \{x : |x| \ge a\}$, so that $i_A = a^2$ and $a^2\,\mathbf{1}_{\{|X| \ge a\}} \le X^2$. By integrating both sides, since $E[\mathbf{1}_{\{|X| \ge a\}}] = P(|X| \ge a)$, meaning the left side integrates to $a^2 P(|X| \ge a)$, we have $a^2 P(|X| \ge a) \le E[X^2]$, which completes the proof.
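A Monte Carlo sanity check of the common form $P(|X| \ge a) \le E[X^2]/a^2$ (my own sketch; the Student-$t$ distribution and the thresholds are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_t(df=5, size=1_000_000)    # heavy-ish tails; E[X^2] = 5/3

second_moment = (X ** 2).mean()
for a in (1.0, 2.0, 4.0):
    lhs = (np.abs(X) >= a).mean()
    print(f"a={a}: P(|X|>=a) ~ {lhs:.4f} <= E[X^2]/a^2 ~ {second_moment / a**2:.4f}")
```

The bound is typically far from tight, but it never fails.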

Change of Variable Formula

Theorem (change of the variable formula).

Let $X$ be a random element of $(S, \mathcal{S})$ and $f \colon (S, \mathcal{S}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ be a measurable function. Denote $\mu$ as a distribution of $X$ on $(S, \mathcal{S})$, i.e. $\mu(A) = P(X \in A)$. If $f \ge 0$ or $E|f(X)| < \infty$, then
$$E[f(X)] = \int_S f(y)\,\mu(dy).$$

Proof. (case 1) First we consider the case of an indicator function, where $f = \mathbf{1}_B$ for some $B \in \mathcal{S}$. Then recalling Definition 3 (distribution), we have
$$E[f(X)] = E[\mathbf{1}_B(X)] = P(X \in B) = \mu(B) = \int_S \mathbf{1}_B(y)\,\mu(dy).$$

(case 2) Now we consider a simple function, where $f = \sum_{i=1}^{n} c_i \mathbf{1}_{B_i}$ with $c_i \ge 0$ and $B_i \in \mathcal{S}$. Then, by the case 1 and the linearity of Theorem 19 (properties of expectation),
$$E[f(X)] = \sum_{i=1}^{n} c_i\,P(X \in B_i) = \sum_{i=1}^{n} c_i\,\mu(B_i) = \int_S f(y)\,\mu(dy).$$

(case 3) Next, let $f \ge 0$ be a non-negative measurable function. Now take measurable simple functions $f_n$ such that $f_n \uparrow f$. Then, by the case 2, we have $E[f_n(X)] = \int_S f_n(y)\,\mu(dy)$, and by Measure Theoretic Preliminaries > Theorem 57 (monotone convergence theorem, MCT), we have $E[f(X)] = \int_S f(y)\,\mu(dy)$. (case 4) Lastly, consider an integrable function $f$, such that $E|f(X)| < \infty$. Then, by Remark 18 (absolute value and integrability), for $f = f^+ - f^-$, we have $E[f^+(X)], E[f^-(X)] < \infty$. Then by the case 3 and the linearity of Theorem 19 (properties of expectation), we have
$$E[f(X)] = E[f^+(X)] - E[f^-(X)] = \int_S f^+(y)\,\mu(dy) - \int_S f^-(y)\,\mu(dy) = \int_S f(y)\,\mu(dy).$$
This completes the proof.

Corollary (expectation of composition).

Let $X$ and $g \colon \mathbb{R} \to \mathbb{R}$ be measurable functions, and let the probability measure of $X$ be $\mu$ with density function $f$, meaning $\mu(A) = \int_A f\,d\lambda$ where $\lambda$ is a Lebesgue measure. If $g \ge 0$ or $E|g(X)| < \infty$, then we have
$$E[g(X)] = \int_{\mathbb{R}} g(x)f(x)\,dx.$$

Proof. First, let $g = \mathbf{1}_B$ for some $B \in \mathcal{B}(\mathbb{R})$. Then by Theorem 28 (change of the variable formula) and Definition 12 (probability density function), we have
$$E[g(X)] = P(X \in B) = \mu(B) = \int_B f(x)\,dx = \int_{\mathbb{R}} \mathbf{1}_B(x)f(x)\,dx.$$

Then, for a simple function $g = \sum_{i=1}^{n} c_i \mathbf{1}_{B_i}$, from the previous case and the linearity of Theorem 19 (properties of expectation), we have
$$E[g(X)] = \sum_{i=1}^{n} c_i \int_{\mathbb{R}} \mathbf{1}_{B_i}(x)f(x)\,dx = \int_{\mathbb{R}} g(x)f(x)\,dx.$$

Next, for a non-negative measurable function $g$, consider measurable simple functions $g_n$ such that $g_n \uparrow g$. Then, by Measure Theoretic Preliminaries > Theorem 57 (monotone convergence theorem, MCT),
$$E[g(X)] = \lim_{n \to \infty} E[g_n(X)] = \lim_{n \to \infty} \int_{\mathbb{R}} g_n(x)f(x)\,dx = \int_{\mathbb{R}} g(x)f(x)\,dx.$$
Lastly, for an integrable function $g$ where $E|g(X)| < \infty$, the result follows by writing $g = g^+ - g^-$ and applying Measure Theoretic Preliminaries > Theorem 67 (properties of integrability).

Computing Expected Values

Remark (computing expected value).

By Corollary 29 (expectation of composition), the expected value of $X$ with density $f$ is computed by
$$E[X] = \int_{\mathbb{R}} x f(x)\,dx,$$
and the $k$th moment of $X$ is computed as
$$E[X^k] = \int_{\mathbb{R}} x^k f(x)\,dx.$$
Moreover, the variance can also be computed as
$$\mathrm{Var}(X) = \int_{\mathbb{R}} (x - E[X])^2 f(x)\,dx = E[X^2] - (E[X])^2.$$
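These integrals can be checked numerically; the sketch below (my own, not from the notes) approximates $\int x^k f(x)\,dx$ for the exponential density by a simple Riemann sum on a truncated grid, so nothing beyond numpy is assumed.

```python
import numpy as np

lam = 2.0
x = np.linspace(0.0, 50.0, 1_000_001)        # truncate the tail at x = 50
dx = x[1] - x[0]
f = lam * np.exp(-lam * x)                   # exponential density with rate lam

def moment(k):
    """Riemann-sum approximation of E[X^k] = int x^k f(x) dx."""
    return np.sum(x ** k * f) * dx

EX, EX2 = moment(1), moment(2)
print(EX, 1 / lam)                           # E[X]   = 1/lambda
print(EX2 - EX ** 2, 1 / lam ** 2)           # Var(X) = 1/lambda^2
```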

Example (moments of exponential distribution).

Let $X$ have an exponential distribution with rate $\lambda > 0$, where its pdf is
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0.$$
Then, the $k$th moment of $X$ is
$$E[X^k] = \frac{k!}{\lambda^k}.$$

Proof. First assume $\lambda = 1$. Using Remark 30 (computing expected value), we have
$$E[X^k] = \int_0^\infty x^k e^{-x}\,dx = \Big[-x^k e^{-x}\Big]_0^\infty + k\int_0^\infty x^{k-1}e^{-x}\,dx = k\,E[X^{k-1}],$$
so by induction $E[X^k] = k!$. Thus $E[X] = 1$, $E[X^2] = 2$, and $\mathrm{Var}(X) = 2 - 1^2 = 1$.

Now consider the general case when $\lambda > 0$. Then, with the substitution $u = \lambda x$, we have
$$E[X^k] = \int_0^\infty x^k\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda^k}\int_0^\infty u^k e^{-u}\,du = \frac{k!}{\lambda^k}.$$
Thus $E[X] = 1/\lambda$, $E[X^2] = 2/\lambda^2$, and $\mathrm{Var}(X) = 1/\lambda^2$.
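A Monte Carlo check of $E[X^k] = k!/\lambda^k$ (my own sketch; rate and sample size are arbitrary):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(4)
lam = 3.0
X = rng.exponential(scale=1 / lam, size=2_000_000)   # exponential with rate lam

for k in (1, 2, 3):
    print(k, (X ** k).mean(), factorial(k) / lam ** k)   # empirical vs k!/lam^k
```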

Example (moments of standard normal distribution).

Let $X$ have a standard normal distribution where its pdf is
$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}.$$
Then it has $E[X] = 0$ and $\mathrm{Var}(X) = E[X^2] = 1$. Furthermore, if $Y = \mu + \sigma X$, then we have $E[Y] = \mu$ and $\mathrm{Var}(Y) = \sigma^2$.

Proof. We have
$$E[X] = \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = 0,$$
since the integrand is an odd integrable function, and
$$E[X^2] = \int_{-\infty}^{\infty} x^2\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = \Big[-x\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\Big]_{-\infty}^{\infty} + \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = 1,$$
by integration by parts. The case of $Y = \mu + \sigma X$ can be derived straightforwardly from Theorem 19 (properties of expectation) and Remark 21 (computing variance).

Example (moments of poisson distribution).

Let $X$ have a Poisson distribution with parameter $\lambda > 0$, where
$$P(X = k) = e^{-\lambda}\frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
Then we have $E[X] = \lambda$ and $\mathrm{Var}(X) = \lambda$.

Proof. Remark that $\sum_{k=0}^{\infty} e^{-\lambda}\frac{\lambda^k}{k!} = 1$. Now we first obtain the first moment:
$$E[X] = \sum_{k=0}^{\infty} k\,e^{-\lambda}\frac{\lambda^k}{k!} = \lambda\sum_{k=1}^{\infty} e^{-\lambda}\frac{\lambda^{k-1}}{(k-1)!} = \lambda.$$
Next, consider the following:
$$E[X(X-1)] = \sum_{k=0}^{\infty} k(k-1)\,e^{-\lambda}\frac{\lambda^k}{k!} = \lambda^2\sum_{k=2}^{\infty} e^{-\lambda}\frac{\lambda^{k-2}}{(k-2)!} = \lambda^2.$$
Then, we get the variance
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \lambda^2 + \lambda - \lambda^2 = \lambda,$$
where the second moment is $E[X^2] = E[X(X-1)] + E[X] = \lambda^2 + \lambda$.
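Numerically, the Poisson mean and variance can be checked as follows (my own sketch; the parameter and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
lam = 4.5
X = rng.poisson(lam=lam, size=2_000_000)

print(X.mean(), lam)            # E[X]   = lambda
print(X.var(), lam)             # Var(X) = lambda
```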

Independence

Definition (independence between pairwise elements).

Let $(\Omega, \mathcal{F}, P)$ be a probability space.

  1. Two events $A, B \in \mathcal{F}$ are independent if $P(A \cap B) = P(A)P(B)$.
  2. Two random variables $X, Y$ are independent if for any $B_1, B_2 \in \mathcal{B}(\mathbb{R})$, $P(X \in B_1, Y \in B_2) = P(X \in B_1)P(Y \in B_2)$.
  3. Two $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2 \subset \mathcal{F}$ are independent if for all $A \in \mathcal{F}_1$ and $B \in \mathcal{F}_2$, the events $A$ and $B$ are independent.
Exercise (independency of random variables and sigma-fields).

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Assume $X, Y$ be random variables and $\mathcal{F}_1, \mathcal{F}_2$ be $\sigma$-fields on $\Omega$.

  1. $X, Y$ are independent if and only if $\sigma(X), \sigma(Y)$ are independent, where $\sigma(X), \sigma(Y)$ are the generated $\sigma$-fields of Measure Theoretic Preliminaries > Definition 34 (function-generated sigma-field).
  2. If $\mathcal{F}_1, \mathcal{F}_2$ are independent and $X$ is $\mathcal{F}_1$-measurable, $Y$ is $\mathcal{F}_2$-measurable, then $X, Y$ are independent.

Proof. (1) This comes directly from the definition. (2) Let $\mathcal{F}_1, \mathcal{F}_2$ be independent. Then $P(A \cap B) = P(A)P(B)$ for all $A \in \mathcal{F}_1$, $B \in \mathcal{F}_2$. Now denote $B_1, B_2$ be any Borel sets. Then by Measure Theoretic Preliminaries > Definition 30 (measurable function), we have $\{X \in B_1\} \in \mathcal{F}_1$ and $\{Y \in B_2\} \in \mathcal{F}_2$. Thus
$$P(X \in B_1, Y \in B_2) = P\big(\{X \in B_1\} \cap \{Y \in B_2\}\big) = P(X \in B_1)P(Y \in B_2).$$
This completes the proof.

Definition (independence between finite elements).

Let $(\Omega, \mathcal{F}, P)$ be a probability space.

  1. $\sigma$-fields $\mathcal{F}_1, \ldots, \mathcal{F}_n$ are independent if for all $A_i \in \mathcal{F}_i$, $i = 1, \ldots, n$,
$$P\Big(\bigcap_{i=1}^{n} A_i\Big) = \prod_{i=1}^{n} P(A_i).$$
  2. Random variables $X_1, \ldots, X_n$ are independent if the generated $\sigma$-fields $\sigma(X_1), \ldots, \sigma(X_n)$ are independent.
  3. Events $A_1, \ldots, A_n$ are independent if for every $I \subset \{1, \ldots, n\}$,
$$P\Big(\bigcap_{i \in I} A_i\Big) = \prod_{i \in I} P(A_i).$$
Example (pairwise and all independence).

Let $X_1, X_2$ be independent random variables with $P(X_i = 0) = P(X_i = 1) = 1/2$, and let $X_3 = (X_1 + X_2) \bmod 2$. Then for $i \neq j$, $X_i$ and $X_j$ are pairwise independent, while $X_1, X_2, X_3$ are not independent for all three.

Proof. We first show that for $i \neq j$ and $k, l \in \{0, 1\}$, $P(X_i = k, X_j = l) = 1/4$: for $(X_1, X_2)$ this is the assumed independence, and for a pair involving $X_3$, e.g. $(X_1, X_3)$,
$$P(X_1 = k, X_3 = l) = P\big(X_1 = k,\ X_2 = (k + l) \bmod 2\big) = \frac{1}{4},$$
and $P(X_i = k) = 1/2$ for all $i, k$. We thus have $P(X_i = k, X_j = l) = P(X_i = k)P(X_j = l)$, meaning that $X_1, X_2, X_3$ are pairwise independent.

However, $X_1, X_2, X_3$ are not independent since
$$P(X_1 = 1, X_2 = 1, X_3 = 1) = 0 \neq \frac{1}{8} = P(X_1 = 1)P(X_2 = 1)P(X_3 = 1).$$
This completes the proof.
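The example can be simulated directly. The sketch below (my own illustration of this standard construction with $X_3 = X_1 \oplus X_2$) estimates the pairwise joint probabilities and the triple joint probability:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
X1 = rng.integers(0, 2, size=n)
X2 = rng.integers(0, 2, size=n)
X3 = (X1 + X2) % 2                       # X3 = X1 XOR X2

# Pairwise: each joint probability should be ~ 1/4 = (1/2)*(1/2).
print(((X1 == 1) & (X3 == 1)).mean(), ((X2 == 1) & (X3 == 1)).mean())

# Triple: P(X1 = X2 = X3 = 1) = 0, while the product of the marginals is 1/8.
print(((X1 == 1) & (X2 == 1) & (X3 == 1)).mean())
```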

Sufficient Condition for Independence

Definition (independent collection of sets).

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Then collections of sets $\mathcal{A}_1, \ldots, \mathcal{A}_n \subset \mathcal{F}$ are independent if whenever $A_i \in \mathcal{A}_i$ and $I \subset \{1, \ldots, n\}$, we have
$$P\Big(\bigcap_{i \in I} A_i\Big) = \prod_{i \in I} P(A_i).$$

Remark (implication of independency).
  1. Suppose $\Omega \in \mathcal{A}_i$, $i = 1, \ldots, n$. Then $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent if and only if $P\big(\bigcap_{i=1}^{n} A_i\big) = \prod_{i=1}^{n} P(A_i)$ whenever $A_i \in \mathcal{A}_i$, i.e., it suffices to check $I = \{1, \ldots, n\}$.
  2. If $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent, then so are $\mathcal{A}_1 \cup \{\Omega\}, \ldots, \mathcal{A}_n \cup \{\Omega\}$.

Proof. (1) Suppose $P\big(\bigcap_{i=1}^{n} A_i\big) = \prod_{i=1}^{n} P(A_i)$ for all $A_i \in \mathcal{A}_i$, and let $I \subset \{1, \ldots, n\}$. As we can let $A_i = \Omega$ for $i \notin I$, we have $P\big(\bigcap_{i \in I} A_i\big) = \prod_{i \in I} P(A_i)$. This method can also be applied to the reverse direction.

(2) This can be shown using the same logic as (1).

Recall the theorem:
Definition 11 (pi and lambda system).

Let $\Omega$ be an arbitrary set and $\mathcal{A}$ be a collection of subsets of $\Omega$.

  1. $\mathcal{A}$ is called a $\pi$-system if it is closed under intersection: $A, B \in \mathcal{A} \implies A \cap B \in \mathcal{A}$.
  2. $\mathcal{A}$ is called a $\lambda$-system if the following hold:
    1. $\Omega \in \mathcal{A}$.
    2. for any $A, B \in \mathcal{A}$ such that $A \subset B$, we have $B \setminus A \in \mathcal{A}$.
    3. for any sequence $A_n \in \mathcal{A}$ such that $A_n \subset A_{n+1}$ for all $n$, we have $\bigcup_{n} A_n \in \mathcal{A}$.
Theorem 15 (Dynkin's pi-lambda theorem).

Let $\mathcal{P}$ be a $\pi$-system on $\Omega$ and $\mathcal{L}$ be a $\lambda$-system on $\Omega$ that contains $\mathcal{P}$. Then $\sigma(\mathcal{P}) \subset \mathcal{L}$.

Theorem (independence of generated sigma-field).

Suppose $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent and each $\mathcal{A}_i$ is a $\pi$-system. Then $\sigma(\mathcal{A}_1), \ldots, \sigma(\mathcal{A}_n)$ are independent.

Proof. Our goal is to show that $\sigma(\mathcal{A}_1), \ldots, \sigma(\mathcal{A}_n)$ are independent. We first show that $\sigma(\mathcal{A}_1), \mathcal{A}_2, \ldots, \mathcal{A}_n$ are independent. Fix $A_2 \in \mathcal{A}_2, \ldots, A_n \in \mathcal{A}_n$ (by the remark above we may assume $\Omega \in \mathcal{A}_i$, so choosing $A_i = \Omega$ recovers intersections over any subset of indices), and let $F := A_2 \cap \cdots \cap A_n$. As $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent, we have
$$P(A_1 \cap F) = P(A_1)P(A_2)\cdots P(A_n) = P(A_1)P(F) \quad \text{for every } A_1 \in \mathcal{A}_1.$$
Define
$$\mathcal{L} := \{A \in \mathcal{F} : P(A \cap F) = P(A)P(F)\},$$
implying that $\mathcal{L}$ collects the sets which are independent of $F$. We want to show that $\sigma(\mathcal{A}_1) \subset \mathcal{L}$.

Note that $\mathcal{A}_1 \subset \mathcal{L}$, as every $A_1 \in \mathcal{A}_1$ is independent of $F$. As $\mathcal{A}_1$ is a $\pi$-system, by Theorem 15 (Dynkin's pi-lambda theorem), it suffices to show that $\mathcal{L}$ is a $\lambda$-system, which will imply $\sigma(\mathcal{A}_1) \subset \mathcal{L}$.

Now check the three properties of a $\lambda$-system:

  1. $\Omega \in \mathcal{L}$ since $P(\Omega \cap F) = P(F) = P(\Omega)P(F)$, since $P(\Omega) = 1$.
  2. Assume $A, B \in \mathcal{L}$ and $A \subset B$. Then for $B \setminus A$, we have
$$P\big((B \setminus A) \cap F\big) = P(B \cap F) - P(A \cap F) = \big(P(B) - P(A)\big)P(F) = P(B \setminus A)P(F),$$
thus $B \setminus A \in \mathcal{L}$.
  3. Assume $B_k \uparrow B$ where $B_k \in \mathcal{L}$. Then since $B_k \cap F \uparrow B \cap F$, we have
$$P(B \cap F) = \lim_{k \to \infty} P(B_k \cap F) = \lim_{k \to \infty} P(B_k)P(F) = P(B)P(F),$$
thus $B \in \mathcal{L}$.

Thus $\mathcal{L}$ is a $\lambda$-system, and therefore $\sigma(\mathcal{A}_1) \subset \mathcal{L}$. Therefore,
$$P(A \cap A_2 \cap \cdots \cap A_n) = P(A)P(A_2)\cdots P(A_n) \quad \text{for all } A \in \sigma(\mathcal{A}_1),\ A_i \in \mathcal{A}_i,$$
implying that $\sigma(\mathcal{A}_1), \mathcal{A}_2, \ldots, \mathcal{A}_n$ are independent.

Now, fix $A_1 \in \sigma(\mathcal{A}_1), A_3 \in \mathcal{A}_3, \ldots, A_n \in \mathcal{A}_n$ and repeat the same argument for $\mathcal{A}_2$. Then, by the previous argument, $\sigma(\mathcal{A}_1), \sigma(\mathcal{A}_2), \mathcal{A}_3, \ldots, \mathcal{A}_n$ are independent.

By repeating this $n$ times, we have the result that $\sigma(\mathcal{A}_1), \ldots, \sigma(\mathcal{A}_n)$ are independent.

Corollary (sufficient condition for independence).

A sufficient condition for the independence of the random variables $X_1, \ldots, X_n$ is that
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{i=1}^{n} P(X_i \le x_i) \quad \text{for all } x_1, \ldots, x_n \in (-\infty, \infty].$$

Proof. Let the sets $\mathcal{A}_i := \{\{X_i \le x_i\} : x_i \in (-\infty, \infty]\}$ for all $i$. Then $\mathcal{A}_i$ is a $\pi$-system since for any $x_i, y_i$,
$$\{X_i \le x_i\} \cap \{X_i \le y_i\} = \{X_i \le x_i \wedge y_i\} \in \mathcal{A}_i.$$
By assumption, $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent (taking $x_i = \infty$ gives $\Omega \in \mathcal{A}_i$, so it suffices to check $I = \{1, \ldots, n\}$), and by Theorem 40 (independence of generated sigma-field), $\sigma(\mathcal{A}_1), \ldots, \sigma(\mathcal{A}_n)$ are independent.

Then, it is left to show that $\sigma(\mathcal{A}_i) = \sigma(X_i)$: since $\{X_i \le x\} = X_i^{-1}\big((-\infty, x]\big)$ and the intervals $(-\infty, x]$ generate $\mathcal{B}(\mathbb{R})$, we have $\sigma(\mathcal{A}_i) = \sigma(X_i)$, which completes the proof.

Exercise (4.4 independent condition from density function).

Suppose the random variables $X_1, \ldots, X_n$ have a joint density $f$, that is,
$$P\big((X_1, \ldots, X_n) \in A\big) = \int_A f(x_1, \ldots, x_n)\,dx_1\cdots dx_n, \quad A \in \mathcal{B}(\mathbb{R}^n).$$
Then, $X_1, \ldots, X_n$ are independent if
$$f(x_1, \ldots, x_n) = g_1(x_1)\cdots g_n(x_n),$$
where $g_1, \ldots, g_n \ge 0$ are measurable.

Proof. Put $c_i := \int_{\mathbb{R}} g_i(y)\,dy$. Then
$$\prod_{i=1}^{n} c_i = \int_{\mathbb{R}^n} f(x)\,dx = 1.$$
Thus we have
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_{-\infty}^{x_1}\!\!\cdots\int_{-\infty}^{x_n} \prod_{i=1}^{n} g_i(y_i)\,dy_n\cdots dy_1 = \prod_{i=1}^{n} \int_{-\infty}^{x_i} g_i(y)\,dy,$$
while $P(X_i \le x_i) = \int_{-\infty}^{x_i} g_i(y)\,dy \cdot \prod_{j \neq i} c_j$, so that, since $\prod_j c_j = 1$,
$$\prod_{i=1}^{n} P(X_i \le x_i) = \prod_{i=1}^{n} \int_{-\infty}^{x_i} g_i(y)\,dy = P(X_1 \le x_1, \ldots, X_n \le x_n),$$
and by Corollary 41 (sufficient condition for independence), $X_1, \ldots, X_n$ are independent.

Exercise (4.5 sufficient condition for independence 2).

Suppose $X_1, \ldots, X_n$ are random variables that take values in countable sets $S_1, \ldots, S_n$. Then in order for $X_1, \ldots, X_n$ to be independent it is sufficient that whenever $x_i \in S_i$,
$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i).$$

Proof.

Corollary (independent function from independent random variable).

If $X_{i,j}$ are independent for $1 \le i \le n$, $1 \le j \le m(i)$, then for measurable functions $f_i \colon \mathbb{R}^{m(i)} \to \mathbb{R}$, the random variables $f_i(X_{i,1}, \ldots, X_{i,m(i)})$, $1 \le i \le n$, are independent.

Proof. First, we show that (1) the random vectors $Y_i := (X_{i,1}, \ldots, X_{i,m(i)})$ are independent, and then show that (2) the measurable functions $f_i(Y_i)$ are independent.

By assumption, the $X_{i,j}$ are all independent. (1) It is sufficient to show that $\sigma(Y_1), \ldots, \sigma(Y_n)$ are independent. Here, the random vectors are mappings $Y_i \colon \Omega \to \mathbb{R}^{m(i)}$. Since the $X_{i,j}$ are $\mathcal{F}$-measurable and a $\sigma$-field is closed under intersections, we have
$$\{Y_i \in B_1 \times \cdots \times B_{m(i)}\} = \bigcap_{j=1}^{m(i)}\{X_{i,j} \in B_j\} \in \mathcal{F}, \quad B_j \in \mathcal{B}(\mathbb{R}).$$

Now consider the collection of sets
$$\mathcal{A}_i := \Big\{\bigcap_{j=1}^{m(i)}\{X_{i,j} \in B_j\} : B_1, \ldots, B_{m(i)} \in \mathcal{B}(\mathbb{R})\Big\}.$$
Then it is a $\pi$-system since, for $A = \bigcap_j \{X_{i,j} \in B_j\}$ and $A' = \bigcap_j \{X_{i,j} \in B_j'\}$ in $\mathcal{A}_i$,
$$A \cap A' = \bigcap_{j=1}^{m(i)}\{X_{i,j} \in B_j \cap B_j'\} \in \mathcal{A}_i,$$
because $\mathcal{B}(\mathbb{R})$ is closed under intersection.

Now fix $A_2 \in \mathcal{A}_2, \ldots, A_n \in \mathcal{A}_n$, put $F := A_2 \cap \cdots \cap A_n$, and define
$$\mathcal{L} := \{A \in \mathcal{F} : P(A \cap F) = P(A)P(F)\}.$$
Then $\mathcal{A}_1 \subset \mathcal{L}$, since for $A_1 \in \mathcal{A}_1$:
$$P(A_1 \cap F) = P(A_1)P(A_2)\cdots P(A_n) = P(A_1)P(F).$$
(Note that $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent since the $X_{i,j}$ are independent, by letting $B_j = \mathbb{R}$ for all coordinates not involved, from the definition.)

Finally, it is left to show that $\mathcal{L}$ is a $\lambda$-system.

  1. $\Omega \in \mathcal{L}$ since: for $A = \Omega$, $P(\Omega \cap F) = P(F) = P(\Omega)P(F)$.

  2. Let $A, B \in \mathcal{L}$ and $A \subset B$, for fixed $F$. We have $P(A \cap F) = P(A)P(F)$ and $P(B \cap F) = P(B)P(F)$. Thus, we have
$$P\big((B \setminus A) \cap F\big) = P(B \cap F) - P(A \cap F) = \big(P(B) - P(A)\big)P(F) = P(B \setminus A)P(F),$$
meaning that $B \setminus A \in \mathcal{L}$.

  3. Let $C_k \in \mathcal{L}$ s.t. $C_k \uparrow C$: i.e., $P(C_k \cap F) = P(C_k)P(F)$ for fixed $F$. Then,
$$P(C \cap F) = \lim_{k \to \infty} P(C_k \cap F) = \lim_{k \to \infty} P(C_k)P(F) = P(C)P(F),$$
thus, $C \in \mathcal{L}$.

Here, as $\mathcal{A}_1 \subset \mathcal{L}$ and $\mathcal{A}_1$ is a $\pi$-system, $\sigma(\mathcal{A}_1) = \sigma(Y_1) \subset \mathcal{L}$ by Theorem 15 (Dynkin's pi-lambda theorem), so $\sigma(Y_1)$ is independent of $\mathcal{A}_2, \ldots, \mathcal{A}_n$ by definition of $\mathcal{L}$. By iterating this for $n$ times, $\sigma(Y_1), \ldots, \sigma(Y_n)$ are independent, i.e., $Y_1, \ldots, Y_n$ are independent.

(2) Since $Y_1, \ldots, Y_n$ are independent random vectors, and since the $f_i$ are measurable, $f_i^{-1}(B_i) \in \mathcal{B}(\mathbb{R}^{m(i)})$ for any $B_i \in \mathcal{B}(\mathbb{R})$ and all $i$.
Thus, we have
$$P\big(f_1(Y_1) \in B_1, \ldots, f_n(Y_n) \in B_n\big) = P\big(Y_1 \in f_1^{-1}(B_1), \ldots, Y_n \in f_n^{-1}(B_n)\big) = \prod_{i=1}^{n} P\big(Y_i \in f_i^{-1}(B_i)\big) = \prod_{i=1}^{n} P\big(f_i(Y_i) \in B_i\big),$$
meaning that $f_1(Y_1), \ldots, f_n(Y_n)$ are independent.

Random Vector of Independent Variables

Theorem (distribution of random vector).

Suppose $X_1, \ldots, X_n$ are independent random variables with distributions $\mu_1, \ldots, \mu_n$ (i.e. $\mu_i(B) = P(X_i \in B)$, $B \in \mathcal{B}(\mathbb{R})$). Then the random vector $(X_1, \ldots, X_n)$ has distribution $\mu_1 \times \cdots \times \mu_n$, i.e. $P\big((X_1, \ldots, X_n) \in A\big) = (\mu_1 \times \cdots \times \mu_n)(A)$, $A \in \mathcal{B}(\mathbb{R}^n)$.

Proof. For $A = B_1 \times \cdots \times B_n$ with $B_i \in \mathcal{B}(\mathbb{R})$, we have
$$P\big((X_1, \ldots, X_n) \in A\big) = P(X_1 \in B_1, \ldots, X_n \in B_n) = \prod_{i=1}^{n} \mu_i(B_i) = (\mu_1 \times \cdots \times \mu_n)(A),$$
by the independence of $X_1, \ldots, X_n$.

Let $\mathcal{P} := \{B_1 \times \cdots \times B_n : B_i \in \mathcal{B}(\mathbb{R})\}$, then the above shows that
$$\mathcal{P} \subset \mathcal{L} := \big\{A \in \mathcal{B}(\mathbb{R}^n) : P\big((X_1, \ldots, X_n) \in A\big) = (\mu_1 \times \cdots \times \mu_n)(A)\big\},$$
where $\mathcal{P}$ is a $\pi$-system.

Now, we show that $\mathcal{L}$ is a $\lambda$-system:

  1. $\mathbb{R}^n \in \mathcal{L}$, since $P\big((X_1, \ldots, X_n) \in \mathbb{R}^n\big) = 1 = (\mu_1 \times \cdots \times \mu_n)(\mathbb{R}^n)$.
  2. Let $A \subset B$ where $A, B \in \mathcal{L}$. Then $B \setminus A \in \mathcal{L}$ as
$$P\big((X_1, \ldots, X_n) \in B \setminus A\big) = P\big((X_1, \ldots, X_n) \in B\big) - P\big((X_1, \ldots, X_n) \in A\big) = (\mu_1 \times \cdots \times \mu_n)(B) - (\mu_1 \times \cdots \times \mu_n)(A) = (\mu_1 \times \cdots \times \mu_n)(B \setminus A).$$
  3. Let $A_k \uparrow A$ where $A_k \in \mathcal{L}$. Then from $P\big((X_1, \ldots, X_n) \in A_k\big) = (\mu_1 \times \cdots \times \mu_n)(A_k)$ for all $k$, by taking the limit on both sides, we have $A \in \mathcal{L}$.

Thus $\mathcal{L}$ is a $\lambda$-system containing the $\pi$-system $\mathcal{P}$, thus we have $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{P}) \subset \mathcal{L}$ by Measure Theoretic Preliminaries > Theorem 15 (Dynkin's pi-lambda theorem). Thus the identity holds for every measurable set in $\mathcal{B}(\mathbb{R}^n)$, which completes the proof.

Theorem (expectation of random vector).

Let $X, Y$ be independent random variables having distributions $\mu$ and $\nu$. If $h \colon \mathbb{R}^2 \to \mathbb{R}$ is a measurable function with $h \ge 0$ or $E|h(X, Y)| < \infty$, then we have
$$E[h(X, Y)] = \iint h(x, y)\,\mu(dx)\,\nu(dy).$$

Corollary (product of two random variables).

From Theorem 46 (expectation of random vector), let $h(x, y) = f(x)g(y)$ where $f, g \colon \mathbb{R} \to \mathbb{R}$ are measurable functions. If $f, g \ge 0$ or $E|f(X)|, E|g(Y)| < \infty$, then we have
$$E[f(X)g(Y)] = E[f(X)]\,E[g(Y)].$$

Proof. From the result of Theorem 46 (expectation of random vector), we have
$$E[f(X)g(Y)] = \iint f(x)g(y)\,\mu(dx)\,\nu(dy) = \int g(y)\Big(\int f(x)\,\mu(dx)\Big)\nu(dy) = E[f(X)]\,E[g(Y)],$$
which completes the proof.

Corollary (product of finite random variables).

If $X_1, \ldots, X_n$ are independent and have $X_i \ge 0$ for all $i$, or $E|X_i| < \infty$ for all $i$, then we have
$$E\Big[\prod_{i=1}^{n} X_i\Big] = \prod_{i=1}^{n} E[X_i].$$

Proof. Let $X = \prod_{i=1}^{n-1} X_i$ and $Y = X_n$. Since $X_1, \ldots, X_n$ are independent, $X$ and $Y$ are independent by Corollary 44 (independent function from independent random variable). Thus by letting $f(x) = x$ and $g(y) = y$ in Corollary 47, we have $E[XY] = E[X]E[Y]$. Then by induction, we have
$$E\Big[\prod_{i=1}^{n} X_i\Big] = \prod_{i=1}^{n} E[X_i].$$
If $X_i \ge 0$ for all $i$, then $\prod_{i=1}^{n-1} X_i \ge 0$, thus the previous result proves this.
Otherwise, if $E|X_i| < \infty$ for all $i$, then by Corollary 47 (product of two random variables) applied to $|X_1|, \ldots, |X_n|$, we have $E\big|\prod_{i=1}^{n-1} X_i\big| = \prod_{i=1}^{n-1} E|X_i| < \infty$ by induction, so Corollary 47 applies at each step.
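A quick Monte Carlo illustration of $E[XYZ] = E[X]E[Y]E[Z]$ for independent factors (my own sketch; the three distributions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
X = rng.exponential(scale=2.0, size=n)   # E[X] = 2
Y = rng.uniform(0.0, 1.0, size=n)        # E[Y] = 1/2
Z = rng.poisson(lam=3.0, size=n)         # E[Z] = 3

print((X * Y * Z).mean())                # ~ 2 * 0.5 * 3 = 3
print(X.mean() * Y.mean() * Z.mean())
```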

Convolution and Independence

Theorem (convolution).

Let $X$ and $Y$ be independent random variables and their distributions be $\mu$ and $\nu$. Then the convolution of $\mu$ and $\nu$, denoted $\mu * \nu$, is the distribution of $X + Y$ and is derived as
$$(\mu * \nu)(A) = P(X + Y \in A) = \int \mu(A - y)\,\nu(dy), \quad A \in \mathcal{B}(\mathbb{R}),$$
where $A - y := \{a - y : a \in A\}$.

Proof. Apply Theorem 46 (expectation of random vector) to $h(x, y) = \mathbf{1}_A(x + y)$:
$$P(X + Y \in A) = E\big[\mathbf{1}_A(X + Y)\big] = \iint \mathbf{1}_A(x + y)\,\mu(dx)\,\nu(dy).$$
Then for the fixed $y$, we have $\int \mathbf{1}_A(x + y)\,\mu(dx) = \mu(A - y)$, so $P(X + Y \in A) = \int \mu(A - y)\,\nu(dy)$, which completes the proof.

Remark (convolution is probability measure).

A convolution is a probability measure.

Proof. We check the following: $(\mu * \nu)(\mathbb{R}) = \int \mu(\mathbb{R} - y)\,\nu(dy) = \int 1\,\nu(dy) = 1$, and $(\mu * \nu)(A) \ge 0$ for all $A$. For disjoint $A_1, A_2, \ldots \in \mathcal{B}(\mathbb{R})$,
$$(\mu * \nu)\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \int \mu\Big(\bigcup_{i=1}^{\infty} A_i - y\Big)\nu(dy) = \int \sum_{i=1}^{\infty} \mu(A_i - y)\,\nu(dy) = \sum_{i=1}^{\infty} (\mu * \nu)(A_i),$$
where the interchange of sum and integral holds by MCT.

Thus $\mu * \nu$ is a probability measure.

Definition (random walk and convolution).

Let $X_1, X_2, \ldots$ be independent random variables with distributions $\mu_1, \mu_2, \ldots$. Then the random walk is defined as
$$S_n := X_1 + \cdots + X_n,$$
with distribution $\mu_1 * \mu_2 * \cdots * \mu_n$.

Theorem (convolution distribution function).

Let $X, Y$ be independent random variables with distributions $\mu, \nu$, and distribution functions $F(x) = P(X \le x)$, $G(y) = P(Y \le y)$. Then the distribution function of $X + Y$ is
$$H(z) = P(X + Y \le z) = \int F(z - y)\,\nu(dy) = \int F(z - y)\,dG(y).$$

Proof. Let $A = (-\infty, z]$ in Theorem 49 (convolution), then we have
$$P(X + Y \le z) = (\mu * \nu)\big((-\infty, z]\big) = \int \mu\big((-\infty, z - y]\big)\,\nu(dy) = \int F(z - y)\,\nu(dy),$$
which is the distribution function of $X + Y$.

Theorem (convolution and density function).

Let $X, Y$ be independent random variables with distributions $\mu, \nu$, and suppose that $X$ has density function $f$. Then, $X + Y$ has probability density function
$$h(z) = \int f(z - y)\,\nu(dy).$$
In particular, if $Y$ also has density $g$, then $h(z) = \int f(z - y)g(y)\,dy$.

Proof. From Theorem 52 (convolution distribution function), we have
$$P(X + Y \le z) = \int F(z - y)\,\nu(dy) = \int\int_{-\infty}^{z - y} f(x)\,dx\,\nu(dy) = \int\int_{-\infty}^{z} f(u - y)\,du\,\nu(dy) = \int_{-\infty}^{z}\int f(u - y)\,\nu(dy)\,du,$$
where the last equation holds by Measure Theoretic Preliminaries > Theorem 75 (Fubini theorem), as $f$ is non-negative and measurable by Definition 12 (probability density function). Thus, by letting $h(u) = \int f(u - y)\,\nu(dy)$, $h$ is the probability density function of $X + Y$.
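The density formula $h(z) = \int f(z - y)g(y)\,dy$ can be evaluated numerically. The sketch below (my own, on a truncated grid) convolves two exponential(1) densities by a Riemann sum and compares the result with the gamma(2, 1) density $z e^{-z}$ obtained in the next example; the grid range and spacing are arbitrary.

```python
import numpy as np

z = np.linspace(0.0, 30.0, 3_001)
dz = z[1] - z[0]
f = np.exp(-z)                 # density of X ~ exponential(1) on the grid
g = np.exp(-z)                 # density of Y ~ exponential(1)

# h(z_k) ~ sum_j f(z_k - y_j) g(y_j) dz, with f(x) = 0 for x < 0.
h = np.convolve(f, g)[: len(z)] * dz

for zk in (0.5, 1.0, 3.0):
    i = int(round(zk / dz))
    print(zk, h[i], zk * np.exp(-zk))   # numerical convolution vs z e^{-z}
```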

Example (convolution of gamma density).

The gamma density with parameters $\alpha > 0$ and $\lambda > 0$ is defined as
$$f_{\alpha, \lambda}(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\,x^{\alpha - 1}e^{-\lambda x}, \quad x \ge 0,$$
where $\Gamma(\alpha) := \int_0^\infty x^{\alpha - 1}e^{-x}\,dx$. Show that if $X \sim \mathrm{gamma}(\alpha, \lambda)$ and $Y \sim \mathrm{gamma}(\beta, \lambda)$ are independent, then $X + Y$ is $\mathrm{gamma}(\alpha + \beta, \lambda)$.

Proof. Using Theorem 53 (convolution and density function), we have
$$f_{X+Y}(z) = \int_0^z f_{\alpha,\lambda}(z - y)\,f_{\beta,\lambda}(y)\,dy = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\,e^{-\lambda z}\int_0^z (z - y)^{\alpha-1}y^{\beta-1}\,dy,$$
and using Theorem 28 (change of the variable formula), for $y = zu$ and $dy = z\,du$, we have
$$\int_0^z (z - y)^{\alpha-1}y^{\beta-1}\,dy = z^{\alpha+\beta-1}\int_0^1 (1 - u)^{\alpha-1}u^{\beta-1}\,du.$$
Now by multiplying the constants and integrating from $0$ to $\infty$, we get
$$1 = \int_0^\infty f_{X+Y}(z)\,dz = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1 (1 - u)^{\alpha-1}u^{\beta-1}\,du\,\int_0^\infty z^{\alpha+\beta-1}e^{-\lambda z}\,dz,$$
and $\int_0^\infty z^{\alpha+\beta-1}e^{-\lambda z}\,dz = \Gamma(\alpha+\beta)/\lambda^{\alpha+\beta}$. Thus we have
$$\int_0^1 (1 - u)^{\alpha-1}u^{\beta-1}\,du = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)},$$
implying that
$$f_{X+Y}(z) = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha+\beta)}\,z^{\alpha+\beta-1}e^{-\lambda z}.$$
Thus $f_{X+Y} = f_{\alpha+\beta,\lambda}$, which shows $X + Y \sim \mathrm{gamma}(\alpha + \beta, \lambda)$.
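A Monte Carlo check of the gamma convolution (my own sketch; note numpy's gamma sampler is parameterized by shape $\alpha$ and scale $1/\lambda$, and the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, beta, lam = 2.0, 3.5, 1.5
n = 1_000_000

X = rng.gamma(shape=alpha, scale=1 / lam, size=n)
Y = rng.gamma(shape=beta, scale=1 / lam, size=n)
S = X + Y
T = rng.gamma(shape=alpha + beta, scale=1 / lam, size=n)  # gamma(alpha + beta, lam)

# Empirical distribution functions of X + Y and gamma(alpha + beta, lam) should agree.
for q in (1.0, 3.0, 6.0):
    print(q, (S <= q).mean(), (T <= q).mean())
```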