Basic Definitions in Probability


Probability Space

Definition 16 (measure).

A set function $\mu \colon \mathcal{F} \to [0, \infty]$ is called a measure if for any countable collection of disjoint measurable sets $A_1, A_2, \ldots \in \mathcal{F}$,
$$\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i).$$
Additionally, we may add another condition for the lower bound: $\mu(A) \ge \mu(\emptyset) = 0$ for all $A \in \mathcal{F}$. Furthermore, if $\mu(\Omega) = 1$, then $\mu$ is a probability measure, denoted by $P$.

Definition 17 (measure space).

Let $\Omega$ be a given set, $\mathcal{F}$ be a $\sigma$-field on $\Omega$, and $\mu$ be a measure on $(\Omega, \mathcal{F})$. The triple $(\Omega, \mathcal{F}, \mu)$ is called a measure space. If $\mu = P$ is a probability measure, then $(\Omega, \mathcal{F}, P)$ is called a probability space.

Random Variables

Definition 30 (measurable function).

Let $(\Omega, \mathcal{F})$ be a measurable space, and $S$ be a topological space with Borel $\sigma$-field $\mathcal{B}(S)$. Then $f \colon \Omega \to S$ is $\mathcal{F}$-measurable if
$$f^{-1}(B) = \{\omega \in \Omega : f(\omega) \in B\} \in \mathcal{F} \quad \text{for every } B \in \mathcal{B}(S).$$
In particular, if $\mathcal{F} = \mathcal{B}(\Omega)$, then $f$ is Borel measurable.

Definition (random variable).

Let $X \colon \Omega \to \mathbb{R}^d$ be an $\mathcal{F}$-measurable function.

  • if $d = 1$, then $X$ is a random variable.
  • if $d > 1$, then $X$ is a random vector.
Theorem (measurability of random vector).

If $X_1, \ldots, X_n$ are $\mathcal{F}$-measurable and $f \colon \mathbb{R}^n \to \mathbb{R}$ is Borel-measurable, then $f(X_1, \ldots, X_n)$ is $\mathcal{F}$-measurable.

Remark that from Measure Theoretic Preliminaries > Theorem 10 (Borel sigma-field on euclidean space), $(X_1, \ldots, X_n)$ is $\mathcal{F}$-measurable if $\{(X_1, \ldots, X_n) \in A\} \in \mathcal{F}$ for all $A$ in a class generating $\mathcal{B}(\mathbb{R}^n)$, e.g. the open sets, or the measurable rectangles. Now let $A_1, \ldots, A_n$ be Borel sets in $\mathbb{R}$. Then we have
$$\{(X_1, \ldots, X_n) \in A_1 \times \cdots \times A_n\} = \bigcap_{i=1}^{n} \{X_i \in A_i\} \in \mathcal{F},$$
since $\mathcal{F}$ is closed under countable intersection.

Therefore $(X_1, \ldots, X_n)$ is $\mathcal{F}$-measurable, thus $f(X_1, \ldots, X_n)$, being the composition of a Borel-measurable function with a measurable map, is $\mathcal{F}$-measurable.

Distribution Function

Definition (distribution).

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $X \colon \Omega \to \mathbb{R}$ be an $\mathcal{F}$-measurable function. Then for any $A \in \mathcal{B}(\mathbb{R})$, define
$$\mu(A) := P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\}).$$
Then $\mu$ is called a distribution of $X$.

Proposition (distribution is measure).

Let $\mu$ be a distribution of $X$. Then $\mu$ is a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

Proof. Trivially, we have $\mu(\mathbb{R}) = P(X \in \mathbb{R}) = P(\Omega) = 1$ and $\mu(A) = P(X \in A) \ge 0$. Now, consider a sequence of disjoint sets $A_1, A_2, \ldots \in \mathcal{B}(\mathbb{R})$, then we have
$$\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = P\Big(\bigcup_{i=1}^{\infty}\{X \in A_i\}\Big) = \sum_{i=1}^{\infty} P(X \in A_i) = \sum_{i=1}^{\infty} \mu(A_i),$$
since the events $\{X \in A_i\}$ are disjoint. Therefore, $\mu$ is a probability measure.

Definition (identical distribution).

Let $X$ and $Y$ be both measurable functions. We say $X$ and $Y$ are equal in distribution and denote $X \overset{d}{=} Y$ if their distributions are identical, i.e., $P(X \in A) = P(Y \in A)$ for all $A \in \mathcal{B}(\mathbb{R})$.

Definition (distribution function).

Let $X$ be a measurable function, and let $\mu(A) = P(X \in A)$ for $A \in \mathcal{B}(\mathbb{R})$ be the distribution of $X$. Then the function
$$F(x) := \mu\big((-\infty, x]\big) = P(X \le x), \quad x \in \mathbb{R},$$
is called a distribution function.

Theorem (properties of distribution function).

Let $F$ be any distribution function. Then, we have the following properties:

  1. $F$ is non-decreasing.
  2. $\lim_{x \to \infty} F(x) = 1$, and $\lim_{x \to -\infty} F(x) = 0$.
  3. $F$ is right-continuous, i.e., $\lim_{y \downarrow x} F(y) = F(x)$.
  4. If $F(x-) := \lim_{y \uparrow x} F(y)$, then $F(x-) = P(X < x)$.
  5. $P(X = x) = F(x) - F(x-)$.

Proof.

  1. Let $x \le y$, then by Definition 6 (distribution function), $F(x) = P(X \le x)$ and $F(y) = P(X \le y)$. Since $\{X \le x\} \subset \{X \le y\}$, by the monotonicity of the measure in Measure Theoretic Preliminaries > Theorem 18 (properties of measure), we have $F(x) \le F(y)$.

  2. Note that we have $\bigcup_{n}\{X \le n\} = \Omega$ and $\bigcap_{n}\{X \le -n\} = \emptyset$. Let $x_n$ be a sequence such that $x_n \uparrow \infty$. Then, we have $\{X \le x_n\} \uparrow \Omega$.
    Thus, $\lim_{n \to \infty} F(x_n) = P(\Omega) = 1$ by continuity from below. Similarly, by letting $x_n \downarrow -\infty$, we have $\{X \le x_n\} \downarrow \emptyset$. Then, $\lim_{n \to \infty} F(x_n) = P(\emptyset) = 0$.

  3. Let $y_n \downarrow x$, then $\{X \le y_n\} \downarrow \{X \le x\}$. Then, by continuity from above, $\lim_{n \to \infty} F(y_n) = P(X \le x) = F(x)$.

  4. Let $y_n \uparrow x$ with $y_n < x$, then $\{X \le y_n\} \uparrow \{X < x\}$, thus $F(x-) = \lim_{n \to \infty} F(y_n) = P(X < x)$.

  5. Notice that $\{X = x\} = \{X \le x\} \setminus \{X < x\}$, then we have $P(X = x) = P(X \le x) - P(X < x)$. Then, by (3) and (4), we have $P(X = x) = F(x) - F(x-)$. This completes the proof.

Remark (sufficient condition for continuous distribution function).

Note that from Theorem 7 (properties of distribution function), we can conclude that $F$ is continuous at $x$ if and only if $P(X = x) = 0$.

Theorem (sufficient condition for distribution function).

If a function $F \colon \mathbb{R} \to [0, 1]$ satisfies the following conditions,

  1. $F$ is non-decreasing,
  2. $\lim_{x \to \infty} F(x) = 1$, $\lim_{x \to -\infty} F(x) = 0$,
  3. $F$ is right continuous,

then $F$ is the distribution function of some random variable.

Proof. Here, we need to construct a probability space $(\Omega, \mathcal{F}, P)$ and a random variable $X$ such that $P(X \le x) = F(x)$ for all $x \in \mathbb{R}$. Without loss of generality, let $\Omega = (0, 1)$, $\mathcal{F} = \mathcal{B}((0, 1))$, and $P$ be a Lebesgue measure. Then, since $P((0, 1)) = 1$, $(\Omega, \mathcal{F}, P)$ is a probability space.

Now define
$$X(\omega) := \sup\{y : F(y) < \omega\}.$$
We claim: If
$$\{\omega : X(\omega) \le x\} = \{\omega : \omega \le F(x)\}$$
holds, then
$$P(X \le x) = P\big(\{\omega \in (0, 1) : \omega \le F(x)\}\big) = F(x),$$
which completes the proof.

Proof of claim:
($\supset$) Let $\omega \le F(x)$. Then for any $y$ with $F(y) < \omega \le F(x)$, we have $y \le x$ since $F$ is non-decreasing. Thus, $x$ is an upper bound of $\{y : F(y) < \omega\}$. As we have defined $X(\omega)$ as the supremum of this set, we therefore have $X(\omega) \le x$.

($\subset$) We show the inverse direction by contraposition: if $\omega > F(x)$, then $X(\omega) > x$.
First, let $\omega > F(x)$, then since $F$ is right continuous, there exists $\varepsilon > 0$ such that $F(x + \varepsilon) < \omega$. This gives us $X(\omega) \ge x + \varepsilon > x$, as $X(\omega)$ is a supremum of a set containing $x + \varepsilon$.
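To make the construction in the proof concrete, here is a small numerical sketch (my own, not part of the notes): it evaluates $X(\omega) = \sup\{y : F(y) < \omega\}$ by bisection and checks, for the exponential distribution function, that the resulting random variable on $((0,1), \mathcal{B}, \text{Lebesgue})$ has distribution function $F$. The bisection bounds, tolerance, and sample size are arbitrary choices.

```python
import numpy as np

def generalized_inverse(F, omega, lo=-1e6, hi=1e6, tol=1e-9):
    """X(omega) = sup{y : F(y) < omega}, located by bisection.

    Assumes F is non-decreasing with limits 0 at -inf and 1 at +inf,
    and that the supremum lies inside [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < omega:
            lo = mid   # mid still belongs to {y : F(y) < omega}
        else:
            hi = mid
    return hi

# F(x) = 1 - e^{-x} for x >= 0: non-decreasing, right-continuous, correct limits.
F = lambda x: 1.0 - np.exp(-x) if x >= 0 else 0.0

rng = np.random.default_rng(0)
omegas = rng.uniform(size=20_000)                  # omega ~ Lebesgue measure on (0, 1)
X = np.array([generalized_inverse(F, w) for w in omegas])

for x in (0.5, 1.0, 2.0):
    print(f"P(X <= {x}) ~ {(X <= x).mean():.3f},  F({x}) = {F(x):.3f}")
```

This is exactly the inverse-transform construction used in the proof: the empirical frequencies should be close to $F(x)$.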

Proposition (distribution function is almost surely continuous).

Let $F$ be a distribution function. Then $F$ is discontinuous at at most countably many points.

Proof. Note that by Remark 8 (sufficient condition for continuous distribution function), if $F$ is discontinuous at $x$ then $P(X = x) = F(x) - F(x-) > 0$. Thus for any discontinuity point $x$ there exists some $n \in \mathbb{N}$ such that $F(x) - F(x-) > 1/n$. Then, if there were more than $n$ discontinuity points with jump larger than $1/n$, the total jump would exceed $1$, which contradicts Theorem 7 (properties of distribution function). Thus $F$ has at most $n$ discontinuity points with jump larger than $1/n$, and taking the union over $n$, the set of discontinuities is at most countable.

Remark (distribution function always exists).

For a given random variable, its distribution function (or cumulative distribution function) always exists by Measure Theoretic Preliminaries > Theorem 18 (properties of measure) and Theorem 9 (sufficient condition for distribution function).

Density Function

Definition (probability density function).

Let $F$ be a distribution function of $X$. Suppose there exists $f \ge 0$ which is Borel-measurable on $\mathbb{R}$, such that
$$F(x) = \int_{-\infty}^{x} f(y)\,dy,$$
then such $f$ is called a probability density function of $X$.

Example (uniform distribution).

Uniform distribution on $(0, 1)$:
$$f(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
The distribution function is
$$F(x) = \begin{cases} 0 & x \le 0 \\ x & 0 < x < 1 \\ 1 & x \ge 1. \end{cases}$$

Example (exponential distribution).

Exponential distribution with rate $\lambda > 0$:
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0. \end{cases}$$
The distribution function is
$$F(x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0. \end{cases}$$

Remark (memoryless).

Note that a random variable $X \ge 0$ is said to have the memoryless property if
$$P(X > s + t \mid X > t) = P(X > s) \quad \text{for all } s, t \ge 0.$$
A positive, continuous random variable $X$ has the memoryless property if and only if $X$ has an exponential distribution.
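As a quick illustration of the remark (my own numerical check, not part of the notes), the following estimates $P(X > s + t \mid X > t)$ and $P(X > s)$ from exponential samples; the rate, the values of $s, t$, and the sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, s, t = 0.5, 1.0, 2.0
X = rng.exponential(scale=1 / lam, size=1_000_000)   # exponential with rate lam

cond = (X > s + t).sum() / (X > t).sum()   # estimate of P(X > s + t | X > t)
print(cond, (X > s).mean(), np.exp(-lam * s))        # all three should be close
```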

Example (standard normal distribution).

Standard normal distribution:
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$
Normal distribution with mean $\mu$ and variance $\sigma^2$:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2/(2\sigma^2)}.$$

Expected Values

Definition (expected value).

Let $X \ge 0$ be a random variable on $(\Omega, \mathcal{F}, P)$. Then the expected value of $X$ is defined as
$$E[X] := \int_\Omega X\,dP = \sup\Big\{\int_\Omega \varphi\,dP : \varphi \le X,\ \varphi \in \mathcal{S}^+\Big\},$$
where $\mathcal{S}^+$ is the set of non-negative simple functions.
If $X$ is any random variable, for $X = X^+ - X^-$ as in Measure Theoretic Preliminaries > Definition 42 (positive and negative parts of function), its expected value is
$$E[X] := E[X^+] - E[X^-],$$
provided that $E[X^+] < \infty$ or $E[X^-] < \infty$.

Remark (absolute value and integrability).

Let $X$ be a random variable and let the absolute value of the random variable be $|X| = X^+ + X^-$. Then
$$E|X| = E[X^+] + E[X^-],$$
so $E[X]$ is finite if and only if $E|X| < \infty$ (i.e., $X$ is integrable).

Theorem (properties of expectation).

Let $X, Y$ be random variables on $(\Omega, \mathcal{F}, P)$, and suppose $X, Y \ge 0$ or $E|X|, E|Y| < \infty$. Then we have

  1. $E[X + Y] = E[X] + E[Y]$.
  2. $E[aX + b] = aE[X] + b$ for any $a, b \in \mathbb{R}$.
  3. If $X \le Y$, then $E[X] \le E[Y]$.

By Basic Definitions in Probability > Remark 18 (absolute value and integrability), we have $E|X + Y| \le E|X| + E|Y| < \infty$. Now put $Z := X + Y$; by rearranging the terms in $Z^+ - Z^- = X^+ - X^- + Y^+ - Y^-$, we have
$$Z^+ + X^- + Y^- = Z^- + X^+ + Y^+,$$
and as each term is non-negative, using the result in (1.1), we have
$$E[Z^+] + E[X^-] + E[Y^-] = E[Z^-] + E[X^+] + E[Y^+],$$
thus $E[X + Y] = E[Z^+] - E[Z^-] = E[X] + E[Y]$.

If or , then use the proof similar to (2.2).

(2.2) Assume $a \neq 0$, then we have $E|aX + b| \le |a|E|X| + |b| < \infty$.

If $a > 0$ and $X(\omega) \ge 0$, then $(aX)^+(\omega) = aX^+(\omega)$ and $(aX)^-(\omega) = 0 = aX^-(\omega)$. If $a > 0$ and $X(\omega) < 0$, then $(aX)^+(\omega) = 0 = aX^+(\omega)$ and $(aX)^-(\omega) = aX^-(\omega)$. If $a < 0$ and $X(\omega) \ge 0$, then $(aX)^+(\omega) = 0 = -aX^-(\omega)$ and $(aX)^-(\omega) = -aX^+(\omega)$. Otherwise, if $a < 0$ and $X(\omega) < 0$, then $(aX)^+(\omega) = -aX^-(\omega)$ and $(aX)^-(\omega) = 0 = -aX^+(\omega)$. Thus in either case, we have
$$E[aX] = E[(aX)^+] - E[(aX)^-] = a\big(E[X^+] - E[X^-]\big) = aE[X].$$
Then using the result in (1.2), we have $E[aX + b] = E[aX] + E[b] = aE[X] + b$.

(3.1) Assume $0 \le X \le Y$. We use the monotonicity in Measure Theoretic Preliminaries > Theorem 54 (properties of integral 1). Let $\varphi$ be any simple function such that $0 \le \varphi \le X$, then $0 \le \varphi \le Y$. Thus we have $\int \varphi\,dP \le E[Y]$, meaning that $E[Y]$ is an upper bound of the set $\{\int \varphi\,dP : 0 \le \varphi \le X,\ \varphi \in \mathcal{S}^+\}$. Thus by Definition 17 (expected value), we have $E[X] \le E[Y]$. (3.2) Assume $X \le Y$ where $E|X|, E|Y| < \infty$. Then $X^+ \le Y^+$ and $Y^- \le X^-$, where all terms are non-negative. Thus applying (3.1) and (1.1), we have $E[X^+] \le E[Y^+]$ and $E[Y^-] \le E[X^-]$, which gives us $E[X] = E[X^+] - E[X^-] \le E[Y^+] - E[Y^-] = E[Y]$.

Definition (variance).

Let $X$ be a random element defined on $(\Omega, \mathcal{F}, P)$. Then the variance of $X$ is defined as
$$\mathrm{Var}(X) := E\big[(X - E[X])^2\big].$$

Remark (computing variance).

If $E[X^2] < \infty$, then the variance of $X$ can be computed as
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2.$$

Proof. From Definition 20 (variance), we have
$$\mathrm{Var}(X) = E\big[X^2 - 2XE[X] + (E[X])^2\big] = E[X^2] - 2(E[X])^2 + (E[X])^2 = E[X^2] - (E[X])^2,$$
where the linearity of expectation of Theorem 19 (properties of expectation) can be applied since $E[X^2] < \infty$.

Corollary (linear transformation in variance).

For any $a, b \in \mathbb{R}$, we have $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.

Proof. As we already have $E[aX + b] = aE[X] + b$ by Theorem 19 (properties of expectation), we have
$$\mathrm{Var}(aX + b) = E\big[(aX + b - aE[X] - b)^2\big] = E\big[a^2(X - E[X])^2\big] = a^2\,\mathrm{Var}(X).$$
This completes the proof.

Inequalities

Remark (facts on convex function).

Let $\varphi \colon \mathbb{R} \to \mathbb{R}$ be a convex function. Then the following results are given as facts:

  1. $\varphi$ is continuous
  2. $\varphi$ is almost everywhere differentiable
  3. for every $x_0 \in \mathbb{R}$, there exist $a, b \in \mathbb{R}$ such that $ax_0 + b = \varphi(x_0)$ and $ax + b \le \varphi(x)$ for all $x$ (a supporting line at $x_0$).
Proposition (Jensen's inequality).

Suppose $\varphi$ is convex, i.e.
$$\varphi\big(\lambda x + (1 - \lambda)y\big) \le \lambda\varphi(x) + (1 - \lambda)\varphi(y) \quad \text{for all } \lambda \in (0, 1),\ x, y \in \mathbb{R}.$$
If $E|X| < \infty$ and $E|\varphi(X)| < \infty$, then
$$\varphi(E[X]) \le E[\varphi(X)].$$

Proof. As $\varphi(x) \ge ax + b$ for some $a, b$ with $aE[X] + b = \varphi(E[X])$ (a supporting line at $E[X]$, by the facts above), we have
$$E[\varphi(X)] \ge E[aX + b] = aE[X] + b = \varphi(E[X]),$$
which completes the proof.

Remark (applications of Jensen's inequality).

By letting $\varphi(x) = |x|$, we have $|E[X]| \le E|X|$, and similarly, for $\varphi(x) = |x|^p$ for $p \ge 1$, we have $|E[X]|^p \le E|X|^p$, or, applying it to $|X|$, $(E|X|)^p \le E|X|^p$.
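A tiny numerical illustration of these applications (my own sketch, not from the notes; the sampling distribution is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=0.3, scale=1.5, size=500_000)

print(abs(X.mean()), np.abs(X).mean())      # |E X|   <= E|X|
print(X.mean() ** 2, (X ** 2).mean())       # (E X)^2 <= E[X^2]
```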

Proposition (Chebyshev's inequality).

Suppose $\varphi \colon \mathbb{R} \to \mathbb{R}$ such that $\varphi \ge 0$. For $A \in \mathcal{B}(\mathbb{R})$ and $i_A := \inf\{\varphi(y) : y \in A\}$, we have
$$i_A\,P(X \in A) \le E\big[\varphi(X)\mathbf{1}_{\{X \in A\}}\big] \le E[\varphi(X)].$$

Proof. From the definition of $i_A$, we have
$$i_A\,\mathbf{1}_{\{X \in A\}} \le \varphi(X)\,\mathbf{1}_{\{X \in A\}} \le \varphi(X),$$
and by taking expectations, by the third part of Theorem 19 (properties of expectation), we have the desired result.

Remark (common form of Chebyshev).

By letting $\varphi(x) = x^2$ and $A = \{x : |x| \ge a\}$ in Proposition 26 (Chebyshev's inequality), for $a > 0$, we have
$$P(|X| \ge a) \le \frac{E[X^2]}{a^2}.$$

Proof. Let $\varphi(x) = x^2$ and $A = \{x : |x| \ge a\}$, so that $i_A = a^2$ and $a^2\,\mathbf{1}_{\{|X| \ge a\}} \le X^2$. By integrating both sides, since $E[\mathbf{1}_{\{|X| \ge a\}}] = P(|X| \ge a)$, meaning the left side integrates to $a^2 P(|X| \ge a)$, we have $a^2 P(|X| \ge a) \le E[X^2]$, which completes the proof.
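A Monte Carlo sanity check of the common form $P(|X| \ge a) \le E[X^2]/a^2$ (my own sketch; the Student-$t$ distribution and the thresholds are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_t(df=5, size=1_000_000)    # heavy-ish tails; E[X^2] = 5/3

second_moment = (X ** 2).mean()
for a in (1.0, 2.0, 4.0):
    lhs = (np.abs(X) >= a).mean()
    print(f"a={a}: P(|X|>=a) ~ {lhs:.4f} <= E[X^2]/a^2 ~ {second_moment / a**2:.4f}")
```

The bound is typically far from tight, but it never fails.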

Change of Variable Formula

Theorem (change of the variable formula).

Let $X$ be a random element of $(S, \mathcal{S})$ and $f \colon (S, \mathcal{S}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ be a measurable function. Denote $\mu$ as a distribution of $X$ on $(S, \mathcal{S})$, i.e. $\mu(A) = P(X \in A)$. If $f \ge 0$ or $E|f(X)| < \infty$, then
$$E[f(X)] = \int_S f(y)\,\mu(dy).$$

Proof. (case 1) First we consider the case of an indicator function, where $f = \mathbf{1}_B$ for some $B \in \mathcal{S}$. Then recalling Definition 3 (distribution), we have
$$E[f(X)] = E[\mathbf{1}_B(X)] = P(X \in B) = \mu(B) = \int_S \mathbf{1}_B(y)\,\mu(dy).$$

(case 2) Now we consider a simple function, where $f = \sum_{i=1}^{n} c_i \mathbf{1}_{B_i}$ with $c_i \ge 0$ and $B_i \in \mathcal{S}$. Then, by the case 1 and the linearity of Theorem 19 (properties of expectation),
$$E[f(X)] = \sum_{i=1}^{n} c_i\,P(X \in B_i) = \sum_{i=1}^{n} c_i\,\mu(B_i) = \int_S f(y)\,\mu(dy).$$

(case 3) Next, let $f \ge 0$ be a non-negative measurable function. Now take measurable simple functions $f_n$ such that $f_n \uparrow f$. Then, by the case 2, we have $E[f_n(X)] = \int_S f_n(y)\,\mu(dy)$, and by Measure Theoretic Preliminaries > Theorem 57 (monotone convergence theorem, MCT), we have $E[f(X)] = \int_S f(y)\,\mu(dy)$. (case 4) Lastly, consider an integrable function $f$, such that $E|f(X)| < \infty$. Then, by Remark 18 (absolute value and integrability), for $f = f^+ - f^-$, we have $E[f^+(X)], E[f^-(X)] < \infty$. Then by the case 3 and the linearity of Theorem 19 (properties of expectation), we have
$$E[f(X)] = E[f^+(X)] - E[f^-(X)] = \int_S f^+(y)\,\mu(dy) - \int_S f^-(y)\,\mu(dy) = \int_S f(y)\,\mu(dy).$$
This completes the proof.

Corollary (expectation of composition).

Let $X$ and $g \colon \mathbb{R} \to \mathbb{R}$ be measurable functions, and let the probability measure of $X$ be $\mu$ with density function $f$, meaning $\mu(A) = \int_A f\,d\lambda$ where $\lambda$ is a Lebesgue measure. If $g \ge 0$ or $E|g(X)| < \infty$, then we have
$$E[g(X)] = \int_{\mathbb{R}} g(x)f(x)\,dx.$$

Proof. First, let $g = \mathbf{1}_B$ for some $B \in \mathcal{B}(\mathbb{R})$. Then by Theorem 28 (change of the variable formula) and Definition 12 (probability density function), we have
$$E[g(X)] = P(X \in B) = \mu(B) = \int_B f(x)\,dx = \int_{\mathbb{R}} \mathbf{1}_B(x)f(x)\,dx.$$

Then, for a simple function $g = \sum_{i=1}^{n} c_i \mathbf{1}_{B_i}$, from the previous case and the linearity of Theorem 19 (properties of expectation), we have
$$E[g(X)] = \sum_{i=1}^{n} c_i \int_{\mathbb{R}} \mathbf{1}_{B_i}(x)f(x)\,dx = \int_{\mathbb{R}} g(x)f(x)\,dx.$$

Next, for a non-negative measurable function $g$, consider measurable simple functions $g_n$ such that $g_n \uparrow g$. Then, by Measure Theoretic Preliminaries > Theorem 57 (monotone convergence theorem, MCT),
$$E[g(X)] = \lim_{n \to \infty} E[g_n(X)] = \lim_{n \to \infty} \int_{\mathbb{R}} g_n(x)f(x)\,dx = \int_{\mathbb{R}} g(x)f(x)\,dx.$$
Lastly, for an integrable function $g$ where $E|g(X)| < \infty$, the result follows by writing $g = g^+ - g^-$ and applying Measure Theoretic Preliminaries > Theorem 67 (properties of integrability).

Computing Expected Values

Remark (computing expected value).

By Corollary 29 (expectation of composition), the expected value of $X$ with density $f$ is computed by
$$E[X] = \int_{\mathbb{R}} x f(x)\,dx,$$
and the $k$th moment of $X$ is computed as
$$E[X^k] = \int_{\mathbb{R}} x^k f(x)\,dx.$$
Moreover, the variance can also be computed as
$$\mathrm{Var}(X) = \int_{\mathbb{R}} (x - E[X])^2 f(x)\,dx = E[X^2] - (E[X])^2.$$
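These integrals can be checked numerically; the sketch below (my own, not from the notes) approximates $\int x^k f(x)\,dx$ for the exponential density by a simple Riemann sum on a truncated grid, so nothing beyond numpy is assumed.

```python
import numpy as np

lam = 2.0
x = np.linspace(0.0, 50.0, 1_000_001)        # truncate the tail at x = 50
dx = x[1] - x[0]
f = lam * np.exp(-lam * x)                   # exponential density with rate lam

def moment(k):
    """Riemann-sum approximation of E[X^k] = int x^k f(x) dx."""
    return np.sum(x ** k * f) * dx

EX, EX2 = moment(1), moment(2)
print(EX, 1 / lam)                           # E[X]   = 1/lambda
print(EX2 - EX ** 2, 1 / lam ** 2)           # Var(X) = 1/lambda^2
```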

Example (moments of exponential distribution).

Let $X$ have an exponential distribution with rate $\lambda > 0$, where its pdf is
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0.$$
Then, the $k$th moment of $X$ is
$$E[X^k] = \frac{k!}{\lambda^k}.$$

Proof. First assume $\lambda = 1$. Using Remark 30 (computing expected value), we have
$$E[X^k] = \int_0^\infty x^k e^{-x}\,dx = \Big[-x^k e^{-x}\Big]_0^\infty + k\int_0^\infty x^{k-1}e^{-x}\,dx = k\,E[X^{k-1}],$$
so by induction $E[X^k] = k!$. Thus $E[X] = 1$, $E[X^2] = 2$, and $\mathrm{Var}(X) = 2 - 1^2 = 1$.

Now consider the general case when $\lambda > 0$. Then, with the substitution $u = \lambda x$, we have
$$E[X^k] = \int_0^\infty x^k\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda^k}\int_0^\infty u^k e^{-u}\,du = \frac{k!}{\lambda^k}.$$
Thus $E[X] = 1/\lambda$, $E[X^2] = 2/\lambda^2$, and $\mathrm{Var}(X) = 1/\lambda^2$.
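A Monte Carlo check of $E[X^k] = k!/\lambda^k$ (my own sketch; rate and sample size are arbitrary):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(4)
lam = 3.0
X = rng.exponential(scale=1 / lam, size=2_000_000)   # exponential with rate lam

for k in (1, 2, 3):
    print(k, (X ** k).mean(), factorial(k) / lam ** k)   # empirical vs k!/lam^k
```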

Example (moments of standard normal distribution).

Let $X$ have a standard normal distribution where its pdf is
$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}.$$
Then it has $E[X] = 0$ and $\mathrm{Var}(X) = E[X^2] = 1$. Furthermore, if $Y = \mu + \sigma X$, then we have $E[Y] = \mu$ and $\mathrm{Var}(Y) = \sigma^2$.

Proof. We have
$$E[X] = \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = 0,$$
since the integrand is an odd integrable function, and
$$E[X^2] = \int_{-\infty}^{\infty} x^2\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = \Big[-x\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\Big]_{-\infty}^{\infty} + \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = 1,$$
by integration by parts. The case of $Y = \mu + \sigma X$ can be derived straightforwardly from Theorem 19 (properties of expectation) and Remark 21 (computing variance).

Example (moments of poisson distribution).

Let $X$ have a Poisson distribution with parameter $\lambda > 0$, where
$$P(X = k) = e^{-\lambda}\frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
Then we have $E[X] = \lambda$ and $\mathrm{Var}(X) = \lambda$.

Proof. Remark that $\sum_{k=0}^{\infty} e^{-\lambda}\frac{\lambda^k}{k!} = 1$. Now we first obtain the first moment:
$$E[X] = \sum_{k=0}^{\infty} k\,e^{-\lambda}\frac{\lambda^k}{k!} = \lambda\sum_{k=1}^{\infty} e^{-\lambda}\frac{\lambda^{k-1}}{(k-1)!} = \lambda.$$
Next, consider the following:
$$E[X(X-1)] = \sum_{k=0}^{\infty} k(k-1)\,e^{-\lambda}\frac{\lambda^k}{k!} = \lambda^2\sum_{k=2}^{\infty} e^{-\lambda}\frac{\lambda^{k-2}}{(k-2)!} = \lambda^2.$$
Then, we get the variance
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \lambda^2 + \lambda - \lambda^2 = \lambda,$$
where the second moment is $E[X^2] = E[X(X-1)] + E[X] = \lambda^2 + \lambda$.
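Numerically, the Poisson mean and variance can be checked as follows (my own sketch; the parameter and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
lam = 4.5
X = rng.poisson(lam=lam, size=2_000_000)

print(X.mean(), lam)            # E[X]   = lambda
print(X.var(), lam)             # Var(X) = lambda
```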

Independence

Definition (independence between pairwise elements).

Let $(\Omega, \mathcal{F}, P)$ be a probability space.

  1. Two events $A, B \in \mathcal{F}$ are independent if $P(A \cap B) = P(A)P(B)$.
  2. Two random variables $X, Y$ are independent if for any $B_1, B_2 \in \mathcal{B}(\mathbb{R})$, $P(X \in B_1, Y \in B_2) = P(X \in B_1)P(Y \in B_2)$.
  3. Two $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2 \subset \mathcal{F}$ are independent if for all $A \in \mathcal{F}_1$ and $B \in \mathcal{F}_2$, the events $A$ and $B$ are independent.
Exercise (independency of random variables and sigma-fields).

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Assume $X, Y$ be random variables and $\mathcal{F}_1, \mathcal{F}_2$ be $\sigma$-fields on $\Omega$.

  1. $X, Y$ are independent if and only if $\sigma(X), \sigma(Y)$ are independent, where $\sigma(X), \sigma(Y)$ are the generated $\sigma$-fields of Measure Theoretic Preliminaries > Definition 34 (function-generated sigma-field).
  2. If $\mathcal{F}_1, \mathcal{F}_2$ are independent and $X$ is $\mathcal{F}_1$-measurable, $Y$ is $\mathcal{F}_2$-measurable, then $X, Y$ are independent.

Proof. (1) This comes directly from the definition. (2) Let $\mathcal{F}_1, \mathcal{F}_2$ be independent. Then $P(A \cap B) = P(A)P(B)$ for all $A \in \mathcal{F}_1$, $B \in \mathcal{F}_2$. Now denote $B_1, B_2$ be any Borel sets. Then by Measure Theoretic Preliminaries > Definition 30 (measurable function), we have $\{X \in B_1\} \in \mathcal{F}_1$ and $\{Y \in B_2\} \in \mathcal{F}_2$. Thus
$$P(X \in B_1, Y \in B_2) = P\big(\{X \in B_1\} \cap \{Y \in B_2\}\big) = P(X \in B_1)P(Y \in B_2).$$
This completes the proof.

Definition (independence between finite elements).

Let $(\Omega, \mathcal{F}, P)$ be a probability space.

  1. $\sigma$-fields $\mathcal{F}_1, \ldots, \mathcal{F}_n$ are independent if for all $A_i \in \mathcal{F}_i$, $i = 1, \ldots, n$,
$$P\Big(\bigcap_{i=1}^{n} A_i\Big) = \prod_{i=1}^{n} P(A_i).$$
  2. Random variables $X_1, \ldots, X_n$ are independent if the generated $\sigma$-fields $\sigma(X_1), \ldots, \sigma(X_n)$ are independent.
  3. Events $A_1, \ldots, A_n$ are independent if for every $I \subset \{1, \ldots, n\}$,
$$P\Big(\bigcap_{i \in I} A_i\Big) = \prod_{i \in I} P(A_i).$$
Example (pairwise and all independence).

Let $X_1, X_2$ be independent random variables with $P(X_i = 0) = P(X_i = 1) = 1/2$, and let $X_3 = (X_1 + X_2) \bmod 2$. Then for $i \neq j$, $X_i$ and $X_j$ are pairwise independent, while $X_1, X_2, X_3$ are not independent for all three.

Proof. We first show that for $i \neq j$ and $k, l \in \{0, 1\}$, $P(X_i = k, X_j = l) = 1/4$: for $(X_1, X_2)$ this is the assumed independence, and for a pair involving $X_3$, e.g. $(X_1, X_3)$,
$$P(X_1 = k, X_3 = l) = P\big(X_1 = k,\ X_2 = (k + l) \bmod 2\big) = \frac{1}{4},$$
and $P(X_i = k) = 1/2$ for all $i, k$. We thus have $P(X_i = k, X_j = l) = P(X_i = k)P(X_j = l)$, meaning that $X_1, X_2, X_3$ are pairwise independent.

However, $X_1, X_2, X_3$ are not independent since
$$P(X_1 = 1, X_2 = 1, X_3 = 1) = 0 \neq \frac{1}{8} = P(X_1 = 1)P(X_2 = 1)P(X_3 = 1).$$
This completes the proof.
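The example can be simulated directly. The sketch below (my own illustration of this standard construction with $X_3 = X_1 \oplus X_2$) estimates the pairwise joint probabilities and the triple joint probability:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
X1 = rng.integers(0, 2, size=n)
X2 = rng.integers(0, 2, size=n)
X3 = (X1 + X2) % 2                       # X3 = X1 XOR X2

# Pairwise: each joint probability should be ~ 1/4 = (1/2)*(1/2).
print(((X1 == 1) & (X3 == 1)).mean(), ((X2 == 1) & (X3 == 1)).mean())

# Triple: P(X1 = X2 = X3 = 1) = 0, while the product of the marginals is 1/8.
print(((X1 == 1) & (X2 == 1) & (X3 == 1)).mean())
```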

Sufficient Condition for Independence

Definition (independent collection of sets).

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Then collections of sets $\mathcal{A}_1, \ldots, \mathcal{A}_n \subset \mathcal{F}$ are independent if whenever $A_i \in \mathcal{A}_i$ and $I \subset \{1, \ldots, n\}$, we have
$$P\Big(\bigcap_{i \in I} A_i\Big) = \prod_{i \in I} P(A_i).$$

Remark (implication of independency).
  1. Suppose $\Omega \in \mathcal{A}_i$, $i = 1, \ldots, n$. Then $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent if and only if $P\big(\bigcap_{i=1}^{n} A_i\big) = \prod_{i=1}^{n} P(A_i)$ whenever $A_i \in \mathcal{A}_i$, i.e., it suffices to check $I = \{1, \ldots, n\}$.
  2. If $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent, then so are $\mathcal{A}_1 \cup \{\Omega\}, \ldots, \mathcal{A}_n \cup \{\Omega\}$.

Proof. (1) Suppose $P\big(\bigcap_{i=1}^{n} A_i\big) = \prod_{i=1}^{n} P(A_i)$ for all $A_i \in \mathcal{A}_i$, and let $I \subset \{1, \ldots, n\}$. As we can let $A_i = \Omega$ for $i \notin I$, we have $P\big(\bigcap_{i \in I} A_i\big) = \prod_{i \in I} P(A_i)$. This method can also be applied to the reverse direction.

(2) This can be shown using the same logic as (1).

Recall the theorem:
Definition 11 (pi and lambda system).

Let $\Omega$ be an arbitrary set and $\mathcal{A}$ be a collection of subsets of $\Omega$.

  1. $\mathcal{A}$ is called a $\pi$-system if it is closed under intersection: $A, B \in \mathcal{A} \implies A \cap B \in \mathcal{A}$.
  2. $\mathcal{A}$ is called a $\lambda$-system if the following hold:
    1. $\Omega \in \mathcal{A}$.
    2. for any $A, B \in \mathcal{A}$ such that $A \subset B$, we have $B \setminus A \in \mathcal{A}$.
    3. for any sequence $A_n \in \mathcal{A}$ such that $A_n \subset A_{n+1}$ for all $n$, we have $\bigcup_{n} A_n \in \mathcal{A}$.
Theorem 15 (Dynkin's pi-lambda theorem).

Let $\mathcal{P}$ be a $\pi$-system on $\Omega$ and $\mathcal{L}$ be a $\lambda$-system on $\Omega$ that contains $\mathcal{P}$. Then $\sigma(\mathcal{P}) \subset \mathcal{L}$.

Theorem (independence of generated sigma-field).

Suppose $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent and each $\mathcal{A}_i$ is a $\pi$-system. Then $\sigma(\mathcal{A}_1), \ldots, \sigma(\mathcal{A}_n)$ are independent.

Proof. Our goal is to show that $\sigma(\mathcal{A}_1), \ldots, \sigma(\mathcal{A}_n)$ are independent. We first show that $\sigma(\mathcal{A}_1), \mathcal{A}_2, \ldots, \mathcal{A}_n$ are independent. Fix $A_2 \in \mathcal{A}_2, \ldots, A_n \in \mathcal{A}_n$ (by the remark above we may assume $\Omega \in \mathcal{A}_i$, so choosing $A_i = \Omega$ recovers intersections over any subset of indices), and let $F := A_2 \cap \cdots \cap A_n$. As $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent, we have
$$P(A_1 \cap F) = P(A_1)P(A_2)\cdots P(A_n) = P(A_1)P(F) \quad \text{for every } A_1 \in \mathcal{A}_1.$$
Define
$$\mathcal{L} := \{A \in \mathcal{F} : P(A \cap F) = P(A)P(F)\},$$
implying that $\mathcal{L}$ collects the sets which are independent of $F$. We want to show that $\sigma(\mathcal{A}_1) \subset \mathcal{L}$.

Note that $\mathcal{A}_1 \subset \mathcal{L}$, as every $A_1 \in \mathcal{A}_1$ is independent of $F$. As $\mathcal{A}_1$ is a $\pi$-system, by Theorem 15 (Dynkin's pi-lambda theorem), it suffices to show that $\mathcal{L}$ is a $\lambda$-system, which will imply $\sigma(\mathcal{A}_1) \subset \mathcal{L}$.

Now check the three properties of a $\lambda$-system:

  1. $\Omega \in \mathcal{L}$ since $P(\Omega \cap F) = P(F) = P(\Omega)P(F)$, since $P(\Omega) = 1$.
  2. Assume $A, B \in \mathcal{L}$ and $A \subset B$. Then for $B \setminus A$, we have
$$P\big((B \setminus A) \cap F\big) = P(B \cap F) - P(A \cap F) = \big(P(B) - P(A)\big)P(F) = P(B \setminus A)P(F),$$
thus $B \setminus A \in \mathcal{L}$.
  3. Assume $B_k \uparrow B$ where $B_k \in \mathcal{L}$. Then since $B_k \cap F \uparrow B \cap F$, we have
$$P(B \cap F) = \lim_{k \to \infty} P(B_k \cap F) = \lim_{k \to \infty} P(B_k)P(F) = P(B)P(F),$$
thus $B \in \mathcal{L}$.

Thus $\mathcal{L}$ is a $\lambda$-system, and therefore $\sigma(\mathcal{A}_1) \subset \mathcal{L}$. Therefore,
$$P(A \cap A_2 \cap \cdots \cap A_n) = P(A)P(A_2)\cdots P(A_n) \quad \text{for all } A \in \sigma(\mathcal{A}_1),\ A_i \in \mathcal{A}_i,$$
implying that $\sigma(\mathcal{A}_1), \mathcal{A}_2, \ldots, \mathcal{A}_n$ are independent.

Now, fix $A_1 \in \sigma(\mathcal{A}_1), A_3 \in \mathcal{A}_3, \ldots, A_n \in \mathcal{A}_n$ and repeat the same argument for $\mathcal{A}_2$. Then, by the previous argument, $\sigma(\mathcal{A}_1), \sigma(\mathcal{A}_2), \mathcal{A}_3, \ldots, \mathcal{A}_n$ are independent.

By repeating this $n$ times, we have the result that $\sigma(\mathcal{A}_1), \ldots, \sigma(\mathcal{A}_n)$ are independent.

Corollary (sufficient condition for independence).

A sufficient condition for the independence of the random variables $X_1, \ldots, X_n$ is that
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{i=1}^{n} P(X_i \le x_i) \quad \text{for all } x_1, \ldots, x_n \in (-\infty, \infty].$$

Proof. Let the sets $\mathcal{A}_i := \{\{X_i \le x_i\} : x_i \in (-\infty, \infty]\}$ for all $i$. Then $\mathcal{A}_i$ is a $\pi$-system since for any $x_i, y_i$,
$$\{X_i \le x_i\} \cap \{X_i \le y_i\} = \{X_i \le x_i \wedge y_i\} \in \mathcal{A}_i.$$
By assumption, $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent (taking $x_i = \infty$ gives $\Omega \in \mathcal{A}_i$, so it suffices to check $I = \{1, \ldots, n\}$), and by Theorem 40 (independence of generated sigma-field), $\sigma(\mathcal{A}_1), \ldots, \sigma(\mathcal{A}_n)$ are independent.

Then, it is left to show that $\sigma(\mathcal{A}_i) = \sigma(X_i)$: since $\{X_i \le x\} = X_i^{-1}\big((-\infty, x]\big)$ and the intervals $(-\infty, x]$ generate $\mathcal{B}(\mathbb{R})$, we have $\sigma(\mathcal{A}_i) = \sigma(X_i)$, which completes the proof.

Exercise (4.4 independent condition from density function).

Suppose the random variables $X_1, \ldots, X_n$ have a joint density $f$, that is,
$$P\big((X_1, \ldots, X_n) \in A\big) = \int_A f(x_1, \ldots, x_n)\,dx_1\cdots dx_n, \quad A \in \mathcal{B}(\mathbb{R}^n).$$
Then, $X_1, \ldots, X_n$ are independent if
$$f(x_1, \ldots, x_n) = g_1(x_1)\cdots g_n(x_n),$$
where $g_1, \ldots, g_n \ge 0$ are measurable.

Proof. Put $c_i := \int_{\mathbb{R}} g_i(y)\,dy$. Then
$$\prod_{i=1}^{n} c_i = \int_{\mathbb{R}^n} f(x)\,dx = 1.$$
Thus we have
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \int_{-\infty}^{x_1}\!\!\cdots\int_{-\infty}^{x_n} \prod_{i=1}^{n} g_i(y_i)\,dy_n\cdots dy_1 = \prod_{i=1}^{n} \int_{-\infty}^{x_i} g_i(y)\,dy,$$
while $P(X_i \le x_i) = \int_{-\infty}^{x_i} g_i(y)\,dy \cdot \prod_{j \neq i} c_j$, so that, since $\prod_j c_j = 1$,
$$\prod_{i=1}^{n} P(X_i \le x_i) = \prod_{i=1}^{n} \int_{-\infty}^{x_i} g_i(y)\,dy = P(X_1 \le x_1, \ldots, X_n \le x_n),$$
and by Corollary 41 (sufficient condition for independence), $X_1, \ldots, X_n$ are independent.

Exercise (4.5 sufficient condition for independence 2).

Suppose $X_1, \ldots, X_n$ are random variables that take values in countable sets $S_1, \ldots, S_n$. Then in order for $X_1, \ldots, X_n$ to be independent it is sufficient that whenever $x_i \in S_i$,
$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i).$$

Proof.

Corollary (independent function from independent random variable).

If $X_{i,j}$ are independent for $1 \le i \le n$, $1 \le j \le m(i)$, then for measurable functions $f_i \colon \mathbb{R}^{m(i)} \to \mathbb{R}$, the random variables $f_i(X_{i,1}, \ldots, X_{i,m(i)})$, $1 \le i \le n$, are independent.

Proof. First, we show that (1) the random vectors $Y_i := (X_{i,1}, \ldots, X_{i,m(i)})$ are independent, and then show that (2) the measurable functions $f_i(Y_i)$ are independent.

By assumption, the $X_{i,j}$ are all independent. (1) It is sufficient to show that $\sigma(Y_1), \ldots, \sigma(Y_n)$ are independent. Here, the random vectors are mappings $Y_i \colon \Omega \to \mathbb{R}^{m(i)}$. Since the $X_{i,j}$ are $\mathcal{F}$-measurable and a $\sigma$-field is closed under intersections, we have
$$\{Y_i \in B_1 \times \cdots \times B_{m(i)}\} = \bigcap_{j=1}^{m(i)}\{X_{i,j} \in B_j\} \in \mathcal{F}, \quad B_j \in \mathcal{B}(\mathbb{R}).$$

Now consider the collection of sets
$$\mathcal{A}_i := \Big\{\bigcap_{j=1}^{m(i)}\{X_{i,j} \in B_j\} : B_1, \ldots, B_{m(i)} \in \mathcal{B}(\mathbb{R})\Big\}.$$
Then it is a $\pi$-system since, for $A = \bigcap_j \{X_{i,j} \in B_j\}$ and $A' = \bigcap_j \{X_{i,j} \in B_j'\}$ in $\mathcal{A}_i$,
$$A \cap A' = \bigcap_{j=1}^{m(i)}\{X_{i,j} \in B_j \cap B_j'\} \in \mathcal{A}_i,$$
because $\mathcal{B}(\mathbb{R})$ is closed under intersection.

Now fix $A_2 \in \mathcal{A}_2, \ldots, A_n \in \mathcal{A}_n$, put $F := A_2 \cap \cdots \cap A_n$, and define
$$\mathcal{L} := \{A \in \mathcal{F} : P(A \cap F) = P(A)P(F)\}.$$
Then $\mathcal{A}_1 \subset \mathcal{L}$, since for $A_1 \in \mathcal{A}_1$:
$$P(A_1 \cap F) = P(A_1)P(A_2)\cdots P(A_n) = P(A_1)P(F).$$
(Note that $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent since the $X_{i,j}$ are independent, by letting $B_j = \mathbb{R}$ for all coordinates not involved, from the definition.)

Finally, it is left to show that $\mathcal{L}$ is a $\lambda$-system.

  1. $\Omega \in \mathcal{L}$ since: for $A = \Omega$, $P(\Omega \cap F) = P(F) = P(\Omega)P(F)$.

  2. Let $A, B \in \mathcal{L}$ and $A \subset B$, for fixed $F$. We have $P(A \cap F) = P(A)P(F)$ and $P(B \cap F) = P(B)P(F)$. Thus, we have
$$P\big((B \setminus A) \cap F\big) = P(B \cap F) - P(A \cap F) = \big(P(B) - P(A)\big)P(F) = P(B \setminus A)P(F),$$
meaning that $B \setminus A \in \mathcal{L}$.

  3. Let $C_k \in \mathcal{L}$ s.t. $C_k \uparrow C$: i.e., $P(C_k \cap F) = P(C_k)P(F)$ for fixed $F$. Then,
$$P(C \cap F) = \lim_{k \to \infty} P(C_k \cap F) = \lim_{k \to \infty} P(C_k)P(F) = P(C)P(F),$$
thus, $C \in \mathcal{L}$.

Here, as $\mathcal{A}_1 \subset \mathcal{L}$ and $\mathcal{A}_1$ is a $\pi$-system, $\sigma(\mathcal{A}_1) = \sigma(Y_1) \subset \mathcal{L}$ by Theorem 15 (Dynkin's pi-lambda theorem), so $\sigma(Y_1)$ is independent of $\mathcal{A}_2, \ldots, \mathcal{A}_n$ by definition of $\mathcal{L}$. By iterating this for $n$ times, $\sigma(Y_1), \ldots, \sigma(Y_n)$ are independent, i.e., $Y_1, \ldots, Y_n$ are independent.

(2) Since $Y_1, \ldots, Y_n$ are independent random vectors, and since the $f_i$ are measurable, $f_i^{-1}(B_i) \in \mathcal{B}(\mathbb{R}^{m(i)})$ for any $B_i \in \mathcal{B}(\mathbb{R})$ and all $i$.
Thus, we have
$$P\big(f_1(Y_1) \in B_1, \ldots, f_n(Y_n) \in B_n\big) = P\big(Y_1 \in f_1^{-1}(B_1), \ldots, Y_n \in f_n^{-1}(B_n)\big) = \prod_{i=1}^{n} P\big(Y_i \in f_i^{-1}(B_i)\big) = \prod_{i=1}^{n} P\big(f_i(Y_i) \in B_i\big),$$
meaning that $f_1(Y_1), \ldots, f_n(Y_n)$ are independent.

Random Vector of Independent Variables

Theorem (distribution of random vector).

Suppose $X_1, \ldots, X_n$ are independent random variables with distributions $\mu_1, \ldots, \mu_n$ (i.e. $\mu_i(B) = P(X_i \in B)$, $B \in \mathcal{B}(\mathbb{R})$). Then the random vector $(X_1, \ldots, X_n)$ has distribution $\mu_1 \times \cdots \times \mu_n$, i.e. $P\big((X_1, \ldots, X_n) \in A\big) = (\mu_1 \times \cdots \times \mu_n)(A)$, $A \in \mathcal{B}(\mathbb{R}^n)$.

Proof. For $A = B_1 \times \cdots \times B_n$ with $B_i \in \mathcal{B}(\mathbb{R})$, we have
$$P\big((X_1, \ldots, X_n) \in A\big) = P(X_1 \in B_1, \ldots, X_n \in B_n) = \prod_{i=1}^{n} \mu_i(B_i) = (\mu_1 \times \cdots \times \mu_n)(A),$$
by the independence of $X_1, \ldots, X_n$.

Let $\mathcal{P} := \{B_1 \times \cdots \times B_n : B_i \in \mathcal{B}(\mathbb{R})\}$, then the above shows that
$$\mathcal{P} \subset \mathcal{L} := \big\{A \in \mathcal{B}(\mathbb{R}^n) : P\big((X_1, \ldots, X_n) \in A\big) = (\mu_1 \times \cdots \times \mu_n)(A)\big\},$$
where $\mathcal{P}$ is a $\pi$-system.

Now, we show that $\mathcal{L}$ is a $\lambda$-system:

  1. $\mathbb{R}^n \in \mathcal{L}$, since $P\big((X_1, \ldots, X_n) \in \mathbb{R}^n\big) = 1 = (\mu_1 \times \cdots \times \mu_n)(\mathbb{R}^n)$.
  2. Let $A \subset B$ where $A, B \in \mathcal{L}$. Then $B \setminus A \in \mathcal{L}$ as
$$P\big((X_1, \ldots, X_n) \in B \setminus A\big) = P\big((X_1, \ldots, X_n) \in B\big) - P\big((X_1, \ldots, X_n) \in A\big) = (\mu_1 \times \cdots \times \mu_n)(B) - (\mu_1 \times \cdots \times \mu_n)(A) = (\mu_1 \times \cdots \times \mu_n)(B \setminus A).$$
  3. Let $A_k \uparrow A$ where $A_k \in \mathcal{L}$. Then from $P\big((X_1, \ldots, X_n) \in A_k\big) = (\mu_1 \times \cdots \times \mu_n)(A_k)$ for all $k$, by taking the limit on both sides, we have $A \in \mathcal{L}$.

Thus $\mathcal{L}$ is a $\lambda$-system containing the $\pi$-system $\mathcal{P}$, thus we have $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{P}) \subset \mathcal{L}$ by Measure Theoretic Preliminaries > Theorem 15 (Dynkin's pi-lambda theorem). Thus the identity holds for every measurable set in $\mathcal{B}(\mathbb{R}^n)$, which completes the proof.

Theorem (expectation of random vector).

Let $X, Y$ be independent random variables having distributions $\mu$ and $\nu$. If $h \colon \mathbb{R}^2 \to \mathbb{R}$ is a measurable function with $h \ge 0$ or $E|h(X, Y)| < \infty$, then we have
$$E[h(X, Y)] = \iint h(x, y)\,\mu(dx)\,\nu(dy).$$

Corollary (product of two random variables).

From Theorem 46 (expectation of random vector), let $h(x, y) = f(x)g(y)$ where $f, g \colon \mathbb{R} \to \mathbb{R}$ are measurable functions. If $f, g \ge 0$ or $E|f(X)|, E|g(Y)| < \infty$, then we have
$$E[f(X)g(Y)] = E[f(X)]\,E[g(Y)].$$

Proof. From the result of Theorem 46 (expectation of random vector), we have
$$E[f(X)g(Y)] = \iint f(x)g(y)\,\mu(dx)\,\nu(dy) = \int g(y)\Big(\int f(x)\,\mu(dx)\Big)\nu(dy) = E[f(X)]\,E[g(Y)],$$
which completes the proof.

Corollary (product of finite random variables).

If $X_1, \ldots, X_n$ are independent and have $X_i \ge 0$ for all $i$, or $E|X_i| < \infty$ for all $i$, then we have
$$E\Big[\prod_{i=1}^{n} X_i\Big] = \prod_{i=1}^{n} E[X_i].$$

Proof. Let $X = \prod_{i=1}^{n-1} X_i$ and $Y = X_n$. Since $X_1, \ldots, X_n$ are independent, $X$ and $Y$ are independent by Corollary 44 (independent function from independent random variable). Thus by letting $f(x) = x$ and $g(y) = y$ in Corollary 47, we have $E[XY] = E[X]E[Y]$. Then by induction, we have
$$E\Big[\prod_{i=1}^{n} X_i\Big] = \prod_{i=1}^{n} E[X_i].$$
If $X_i \ge 0$ for all $i$, then $\prod_{i=1}^{n-1} X_i \ge 0$, thus the previous result proves this.
Otherwise, if $E|X_i| < \infty$ for all $i$, then by Corollary 47 (product of two random variables) applied to $|X_1|, \ldots, |X_n|$, we have $E\big|\prod_{i=1}^{n-1} X_i\big| = \prod_{i=1}^{n-1} E|X_i| < \infty$ by induction, so Corollary 47 applies at each step.
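A quick Monte Carlo illustration of $E[XYZ] = E[X]E[Y]E[Z]$ for independent factors (my own sketch; the three distributions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
X = rng.exponential(scale=2.0, size=n)   # E[X] = 2
Y = rng.uniform(0.0, 1.0, size=n)        # E[Y] = 1/2
Z = rng.poisson(lam=3.0, size=n)         # E[Z] = 3

print((X * Y * Z).mean())                # ~ 2 * 0.5 * 3 = 3
print(X.mean() * Y.mean() * Z.mean())
```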

Convolution and Independence

Theorem (convolution).

Let $X$ and $Y$ be independent random variables and their distributions be $\mu$ and $\nu$. Then the convolution of $\mu$ and $\nu$, denoted $\mu * \nu$, is the distribution of $X + Y$ and is derived as
$$(\mu * \nu)(A) = P(X + Y \in A) = \int \mu(A - y)\,\nu(dy), \quad A \in \mathcal{B}(\mathbb{R}),$$
where $A - y := \{a - y : a \in A\}$.

Proof. Apply Theorem 46 (expectation of random vector) to $h(x, y) = \mathbf{1}_A(x + y)$:
$$P(X + Y \in A) = E\big[\mathbf{1}_A(X + Y)\big] = \iint \mathbf{1}_A(x + y)\,\mu(dx)\,\nu(dy).$$
Then for the fixed $y$, we have $\int \mathbf{1}_A(x + y)\,\mu(dx) = \mu(A - y)$, so $P(X + Y \in A) = \int \mu(A - y)\,\nu(dy)$, which completes the proof.

Remark (convolution is probability measure).

A convolution is a probability measure.

Proof. We check the following: $(\mu * \nu)(\mathbb{R}) = \int \mu(\mathbb{R} - y)\,\nu(dy) = \int 1\,\nu(dy) = 1$, and $(\mu * \nu)(A) \ge 0$ for all $A$. For disjoint $A_1, A_2, \ldots \in \mathcal{B}(\mathbb{R})$,
$$(\mu * \nu)\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \int \mu\Big(\bigcup_{i=1}^{\infty} A_i - y\Big)\nu(dy) = \int \sum_{i=1}^{\infty} \mu(A_i - y)\,\nu(dy) = \sum_{i=1}^{\infty} (\mu * \nu)(A_i),$$
where the interchange of sum and integral holds by MCT.

Thus $\mu * \nu$ is a probability measure.

Definition (random walk and convolution).

Let $X_1, X_2, \ldots$ be independent random variables with distributions $\mu_1, \mu_2, \ldots$. Then the random walk is defined as
$$S_n := X_1 + \cdots + X_n,$$
with distribution $\mu_1 * \mu_2 * \cdots * \mu_n$.

Theorem (convolution distribution function).

Let $X, Y$ be independent random variables with distributions $\mu, \nu$, and distribution functions $F(x) = P(X \le x)$, $G(y) = P(Y \le y)$. Then the distribution function of $X + Y$ is
$$H(z) = P(X + Y \le z) = \int F(z - y)\,\nu(dy) = \int F(z - y)\,dG(y).$$

Proof. Let $A = (-\infty, z]$ in Theorem 49 (convolution), then we have
$$P(X + Y \le z) = (\mu * \nu)\big((-\infty, z]\big) = \int \mu\big((-\infty, z - y]\big)\,\nu(dy) = \int F(z - y)\,\nu(dy),$$
which is the distribution function of $X + Y$.

Theorem (convolution and density function).

Let $X, Y$ be independent random variables with distributions $\mu, \nu$, and suppose that $X$ has density function $f$. Then, $X + Y$ has probability density function
$$h(z) = \int f(z - y)\,\nu(dy).$$
In particular, if $Y$ also has density $g$, then $h(z) = \int f(z - y)g(y)\,dy$.

Proof. From Theorem 52 (convolution distribution function), we have
$$P(X + Y \le z) = \int F(z - y)\,\nu(dy) = \int\int_{-\infty}^{z - y} f(x)\,dx\,\nu(dy) = \int\int_{-\infty}^{z} f(u - y)\,du\,\nu(dy) = \int_{-\infty}^{z}\int f(u - y)\,\nu(dy)\,du,$$
where the last equation holds by Measure Theoretic Preliminaries > Theorem 75 (Fubini theorem), as $f$ is non-negative and measurable by Definition 12 (probability density function). Thus, by letting $h(u) = \int f(u - y)\,\nu(dy)$, $h$ is the probability density function of $X + Y$.
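The density formula $h(z) = \int f(z - y)g(y)\,dy$ can be evaluated numerically. The sketch below (my own, on a truncated grid) convolves two exponential(1) densities by a Riemann sum and compares the result with the gamma(2, 1) density $z e^{-z}$ obtained in the next example; the grid range and spacing are arbitrary.

```python
import numpy as np

z = np.linspace(0.0, 30.0, 3_001)
dz = z[1] - z[0]
f = np.exp(-z)                 # density of X ~ exponential(1) on the grid
g = np.exp(-z)                 # density of Y ~ exponential(1)

# h(z_k) ~ sum_j f(z_k - y_j) g(y_j) dz, with f(x) = 0 for x < 0.
h = np.convolve(f, g)[: len(z)] * dz

for zk in (0.5, 1.0, 3.0):
    i = int(round(zk / dz))
    print(zk, h[i], zk * np.exp(-zk))   # numerical convolution vs z e^{-z}
```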

Example (convolution of gamma density).

The gamma density with parameters $\alpha > 0$ and $\lambda > 0$ is defined as
$$f_{\alpha, \lambda}(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\,x^{\alpha - 1}e^{-\lambda x}, \quad x \ge 0,$$
where $\Gamma(\alpha) := \int_0^\infty x^{\alpha - 1}e^{-x}\,dx$. Show that if $X \sim \mathrm{gamma}(\alpha, \lambda)$ and $Y \sim \mathrm{gamma}(\beta, \lambda)$ are independent, then $X + Y$ is $\mathrm{gamma}(\alpha + \beta, \lambda)$.

Proof. Using Theorem 53 (convolution and density function), we have
$$f_{X+Y}(z) = \int_0^z f_{\alpha,\lambda}(z - y)\,f_{\beta,\lambda}(y)\,dy = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\,e^{-\lambda z}\int_0^z (z - y)^{\alpha-1}y^{\beta-1}\,dy,$$
and using Theorem 28 (change of the variable formula), for $y = zu$ and $dy = z\,du$, we have
$$\int_0^z (z - y)^{\alpha-1}y^{\beta-1}\,dy = z^{\alpha+\beta-1}\int_0^1 (1 - u)^{\alpha-1}u^{\beta-1}\,du.$$
Now by multiplying the constants and integrating from $0$ to $\infty$, we get
$$1 = \int_0^\infty f_{X+Y}(z)\,dz = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1 (1 - u)^{\alpha-1}u^{\beta-1}\,du\,\int_0^\infty z^{\alpha+\beta-1}e^{-\lambda z}\,dz,$$
and $\int_0^\infty z^{\alpha+\beta-1}e^{-\lambda z}\,dz = \Gamma(\alpha+\beta)/\lambda^{\alpha+\beta}$. Thus we have
$$\int_0^1 (1 - u)^{\alpha-1}u^{\beta-1}\,du = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)},$$
implying that
$$f_{X+Y}(z) = \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha+\beta)}\,z^{\alpha+\beta-1}e^{-\lambda z}.$$
Thus $f_{X+Y} = f_{\alpha+\beta,\lambda}$, which shows $X + Y \sim \mathrm{gamma}(\alpha + \beta, \lambda)$.
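A Monte Carlo check of the gamma convolution (my own sketch; note numpy's gamma sampler is parameterized by shape $\alpha$ and scale $1/\lambda$, and the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, beta, lam = 2.0, 3.5, 1.5
n = 1_000_000

X = rng.gamma(shape=alpha, scale=1 / lam, size=n)
Y = rng.gamma(shape=beta, scale=1 / lam, size=n)
S = X + Y
T = rng.gamma(shape=alpha + beta, scale=1 / lam, size=n)  # gamma(alpha + beta, lam)

# Empirical distribution functions of X + Y and gamma(alpha + beta, lam) should agree.
for q in (1.0, 3.0, 6.0):
    print(q, (S <= q).mean(), (T <= q).mean())
```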