Inferences in Linear Regression

#econometrics #economics

Oh, Hyunzi. (email: wisdom302@naver.com)
Korea University, Graduate School of Economics.
2024 Spring, instructed by prof. Kim, Dukpa.


Main References

  • Kim, Dukpa. (2024). "Econometric Analysis" (2024 Spring) ECON 518, Department of Economics, Korea University.
  • Davidson and MacKinnon. (2021). "Econometric Theory and Methods", Oxford University Press, New York.

Normality

Assumption (assumption-Normality).
  • A-N) the errors are normally distributed, i.e. $\varepsilon_i \mid X \sim N(0, \sigma^2)$ independently across $i$, and hence $\varepsilon \mid X \sim N(0, \sigma^2 I_n)$.
Theorem (distribution of fitted value and residual).

Under A1~A5 and A-N, we have the following results:

  1. $\hat{y} \mid X \sim N(X\beta,\ \sigma^2 P_X)$, where $P_X = X(X'X)^{-1}X'$.
  2. $\hat{\varepsilon} \mid X \sim N(0,\ \sigma^2 M_X)$, where $M_X = I_n - P_X$.
  3. $\hat{y}$ and $\hat{\varepsilon}$ are independent.

Proof. Remark that $\hat{y} = P_X y = X\beta + P_X\varepsilon$ and $\hat{\varepsilon} = M_X y = M_X\varepsilon$. Also, by Assumption 1 (assumption-Normality), we have $\varepsilon \mid X \sim N(0, \sigma^2 I_n)$.

  1. As $\hat{y} = X\beta + P_X\varepsilon$ and $\mathrm{Var}(P_X\varepsilon \mid X) = \sigma^2 P_X P_X' = \sigma^2 P_X$, we have $\hat{y} \mid X \sim N(X\beta, \sigma^2 P_X)$.
  2. As $\hat{\varepsilon} = M_X\varepsilon$ and $\mathrm{Var}(M_X\varepsilon \mid X) = \sigma^2 M_X M_X' = \sigma^2 M_X$, we have $\hat{\varepsilon} \mid X \sim N(0, \sigma^2 M_X)$.
  3. Since $\mathrm{Cov}(P_X\varepsilon, M_X\varepsilon \mid X) = \sigma^2 P_X M_X = 0$, we use Normal Distribution Theory > Lemma 5 (independent between matrices normal) for the proof.

This completes the proof.
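
As a quick numerical illustration (my own sketch, not part of the lecture notes), the snippet below verifies the algebraic fact driving part 3, namely $P_X M_X = 0$, for a randomly generated design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.standard_normal((n, k))

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix P_X onto col(X)
M = np.eye(n) - P                      # annihilator matrix M_X

# zero covariance between P_X eps and M_X eps, which under normality gives independence
print(np.allclose(P @ M, 0.0))         # True up to floating-point error
```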

Theorem (distribution of least square estimates).

Under A1~A5 and A-N, we have the following results:

  1. $\hat{\beta} \mid X \sim N(\beta,\ \sigma^2 (X'X)^{-1})$.
  2. $(n-k)\, s^2 / \sigma^2 = \hat{\varepsilon}'\hat{\varepsilon} / \sigma^2 \sim \chi^2_{n-k}$, where $s^2 = \hat{\varepsilon}'\hat{\varepsilon}/(n-k)$.
  3. $\hat{\beta}$ and $s^2$ are independent.

Proof.

  1. Note that $\hat{\beta} = \beta + (X'X)^{-1}X'\varepsilon$; then by $\varepsilon \mid X \sim N(0, \sigma^2 I_n)$, we have $\hat{\beta} \mid X \sim N\big(\beta,\ \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}\big) = N\big(\beta,\ \sigma^2 (X'X)^{-1}\big)$.
  2. Note that $\hat{\varepsilon}'\hat{\varepsilon} = \varepsilon' M_X \varepsilon$ and $\mathrm{rank}(M_X) = \mathrm{tr}(M_X) = n-k$. Since $M_X$ is symmetric and idempotent, by Introductory Linear Algebra > Theorem 6 (decomposition of symmetric and idempotent matrix), we can decompose the matrix as $M_X = Q_1 Q_1'$, where $Q_1$ is the $n \times (n-k)$ matrix with the first $n-k$ eigenvectors of $M_X$ corresponding to eigenvalue $1$, and $Q_1'Q_1 = I_{n-k}$. Thus we have $\varepsilon' M_X \varepsilon / \sigma^2 = (Q_1'\varepsilon/\sigma)'(Q_1'\varepsilon/\sigma)$. Remark that $Q_1'\varepsilon/\sigma \mid X \sim N(0, I_{n-k})$. Therefore, we have $\hat{\varepsilon}'\hat{\varepsilon}/\sigma^2 \sim \chi^2_{n-k}$ by Normal Distribution Theory > Lemma 9 (multivariate normal and chi-squared distribution).
  3. Note that $\hat{\beta} - \beta = (X'X)^{-1}X'\varepsilon$ and $\hat{\varepsilon} = M_X\varepsilon$. We now use Normal Distribution Theory > Lemma 5 (independent between matrices normal), by showing $\mathrm{Cov}\big((X'X)^{-1}X'\varepsilon,\ M_X\varepsilon \mid X\big) = \sigma^2 (X'X)^{-1}X' M_X = 0$. Therefore, $\hat{\beta}$ and $\hat{\varepsilon}$ are independent, resulting in the independence between $\hat{\beta}$ and $s^2$.

This completes the proof.
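
A small Monte Carlo sketch can make the three statements concrete; the setup below (sample size, coefficients, and number of replications are my own choices, not from the notes) checks the sampling variance of the first coefficient against $\sigma^2[(X'X)^{-1}]_{11}$ and the fit of $(n-k)s^2/\sigma^2$ to a $\chi^2_{n-k}$ distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k, sigma = 100, 3, 2.0
X = rng.standard_normal((n, k))
beta = np.array([1.0, -0.5, 0.25])
XtX_inv = np.linalg.inv(X.T @ X)

draws_b0, draws_q = [], []
for _ in range(5000):
    y = X @ beta + sigma * rng.standard_normal(n)
    b = XtX_inv @ X.T @ y                   # OLS estimate
    e = y - X @ b                           # residuals
    s2 = e @ e / (n - k)                    # unbiased variance estimate
    draws_b0.append(b[0])
    draws_q.append((n - k) * s2 / sigma**2)

# empirical variance of the first coefficient vs. its theoretical variance
print(np.var(draws_b0), sigma**2 * XtX_inv[0, 0])
# (n-k) s^2 / sigma^2 should behave like chi^2_{n-k}: mean n-k, KS test not systematically rejecting
print(np.mean(draws_q), n - k)
print(stats.kstest(draws_q, "chi2", args=(n - k,)).pvalue)
```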

Single-Hypothesis Test

The t-test concerns the hypotheses:

  • $H_0$ (null hypothesis): $\beta_j = \beta_j^0$
  • $H_1$ (alternative hypothesis): $\beta_j \neq \beta_j^0$

where $\beta_j^0$ is given by the researcher. The t-test checks whether a coefficient $\beta_j$ is significantly different from $\beta_j^0$.

Theorem (t-test).

Under A1~A5 and A-N, we have
$$
t_j = \frac{\hat{\beta}_j - \beta_j^0}{se(\hat{\beta}_j)} \sim t_{n-k} \quad \text{under } H_0, \qquad \text{where } se(\hat{\beta}_j) = \sqrt{s^2 \big[(X'X)^{-1}\big]_{jj}}.
$$

Proof. From Theorem 3 (distribution of least square estimates), we have $\hat{\beta} \mid X \sim N(\beta, \sigma^2 (X'X)^{-1})$, $(n-k)s^2/\sigma^2 \sim \chi^2_{n-k}$, and the two are independent. Now let $\hat{\beta}_j$ denote the $j$-th element of $\hat{\beta}$ and $a_{jj}$ the $j$-th diagonal element of $(X'X)^{-1}$. Then, under $H_0$,
$$
t_j = \frac{(\hat{\beta}_j - \beta_j^0)/\sqrt{\sigma^2 a_{jj}}}{\sqrt{\dfrac{(n-k)s^2/\sigma^2}{n-k}}} = \frac{\hat{\beta}_j - \beta_j^0}{\sqrt{s^2 a_{jj}}} \sim t_{n-k},
$$
since it is a standard normal divided by the square root of an independent $\chi^2_{n-k}$ variable over its degrees of freedom.
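
For concreteness, a minimal numpy/scipy sketch of the statistic above (the data-generating process and the tested coefficient are assumptions of mine, not from the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 100, 3
X = rng.standard_normal((n, k))
beta = np.array([1.0, 0.0, 0.5])           # true beta_1 = 0, so H0 below is true
y = X @ beta + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - k)

j, beta_j0 = 1, 0.0                        # H0: beta_1 = 0
se_j = np.sqrt(s2 * XtX_inv[j, j])         # standard error of beta_hat_j
t_stat = (b[j] - beta_j0) / se_j
p_value = 2 * stats.t.sf(abs(t_stat), df=n - k)   # two-sided p-value from t_{n-k}
print(t_stat, p_value)
```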

Remark (decomposition of t-stat).

Note that
$$
t_j = \frac{\hat{\beta}_j - \beta_j^0}{\sqrt{s^2 a_{jj}}} = \frac{\hat{\beta}_j - \beta_j}{\sqrt{s^2 a_{jj}}} + \frac{\beta_j - \beta_j^0}{\sqrt{s^2 a_{jj}}},
$$
where the first term follows $t_{n-k}$ and the second term is $0$ under $H_0$ and nonzero under $H_1$. Thus, while a $t$-statistic around $0$ is likely to be generated under the null hypothesis, if it is significantly different from $0$, it is likely to have come from the alternative hypothesis.

Definition (confidence interval).

A $(1-\alpha)$ confidence interval for $\beta_j$ is constructed as
$$
\Big[\hat{\beta}_j - c_{1-\alpha/2}\, se(\hat{\beta}_j),\ \ \hat{\beta}_j + c_{1-\alpha/2}\, se(\hat{\beta}_j)\Big],
$$
where $c_{1-\alpha/2}$ denotes the $(1-\alpha/2)$ quantile of a Student's t distribution with $n-k$ degrees of freedom.
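
A short sketch of this construction, assuming an estimate, standard error, and degrees of freedom purely for illustration:

```python
from scipy import stats

# hypothetical output of an OLS fit with n - k = 97 residual degrees of freedom
b_j, se_j, df = 0.48, 0.11, 97
alpha = 0.05

c = stats.t.ppf(1 - alpha / 2, df)         # (1 - alpha/2) quantile of t_{n-k}
ci = (b_j - c * se_j, b_j + c * se_j)
print(ci)                                  # 95% confidence interval for beta_j
```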

Given the significance level $\alpha$, the following are defined:

|               | Do not reject $H_0$ | Reject $H_0$     |
|---------------|---------------------|------------------|
| $H_0$ is true | Good (A)            | Type I Error (B) |
| $H_1$ is true | Type II Error (C)   | Good (D)         |
  • Type I error: the probability of rejecting $H_0$ when $H_0$ is correct.
  • Type II error: the probability of not rejecting $H_0$ when $H_1$ is correct.
  • Size of the test: the probability of rejecting $H_0$ when $H_0$ is true (cell B).
  • Power of the test: the probability of rejecting $H_0$ when $H_1$ is true (cell D).

Note that the significance level $\alpha$ is often set as either $0.01$, $0.05$, or $0.10$.
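
The size and power definitions can be illustrated by a Monte Carlo experiment; the sketch below (design matrix, sample size, and effect size are my own choices) estimates the rejection frequency of the two-sided t-test under $H_0$ and under one alternative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k, alpha, reps = 50, 2, 0.05, 2000
X = rng.standard_normal((n, k))
XtX_inv = np.linalg.inv(X.T @ X)
crit = stats.t.ppf(1 - alpha / 2, n - k)   # two-sided critical value

def reject_rate(beta1):
    """Fraction of replications in which H0: beta_1 = 0 is rejected."""
    beta = np.array([1.0, beta1])
    rejections = 0
    for _ in range(reps):
        y = X @ beta + rng.standard_normal(n)
        b = XtX_inv @ X.T @ y
        s2 = (y - X @ b) @ (y - X @ b) / (n - k)
        t = b[1] / np.sqrt(s2 * XtX_inv[1, 1])
        rejections += abs(t) > crit
    return rejections / reps

print(reject_rate(0.0))   # size: close to alpha = 0.05 since H0 is true
print(reject_rate(0.5))   # power: rejection rate under this alternative
```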

Definition (p-value).

A p-value is the probability, computed under the null hypothesis, of obtaining a test result at least as extreme as the one actually observed.
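
For a two-sided t-test, this is the probability mass in both tails beyond the observed statistic; a minimal sketch (the observed statistic and degrees of freedom are assumed values):

```python
from scipy import stats

t_obs, df = 2.1, 97                        # assumed observed statistic and degrees of freedom
p_value = 2 * stats.t.sf(abs(t_obs), df)   # P(|T| >= |t_obs|) under H0, T ~ t_{df}
print(p_value)
```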

Joint-Hypothesis Test

The F-test concerns the hypotheses:

  • $H_0$ (null hypothesis): $R\beta = r$
  • $H_1$ (alternative hypothesis): $R\beta \neq r$

where $R$ is a $q \times k$ matrix and $r$ is a $q \times 1$ vector given by the researcher. The F-test checks whether the estimate $R\hat{\beta}$ is significantly different from $r$.

Theorem (f-test).

Under A1~A5 and A-N, we have
$$
F = \frac{(R\hat{\beta} - r)'\big[R\, s^2 (X'X)^{-1} R'\big]^{-1}(R\hat{\beta} - r)}{q} \sim F_{q,\, n-k} \quad \text{under } H_0,
$$
where $q$ is the number of restrictions (rows of $R$).

Note that $(R\hat{\beta} - r)'\big[\sigma^2 R(X'X)^{-1}R'\big]^{-1}(R\hat{\beta} - r) \sim \chi^2_q$ under $H_0$ and $(n-k)s^2/\sigma^2 \sim \chi^2_{n-k}$, and the two are independent as $\hat{\beta}$ (a function of $\hat{y}$) and $\hat{\varepsilon}$ are independent of each other by Theorem 2 (distribution of fitted value and residual).
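
A sketch of the Wald form of the statistic above for one simulated sample (the restriction $\beta_2 = \beta_3 = 0$, using zero-based indexing, and all parameter values are my own illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 120, 4
X = rng.standard_normal((n, k))
beta = np.array([1.0, 0.3, 0.0, 0.0])      # the last two coefficients are truly zero
y = X @ beta + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - k)

R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])       # H0: beta_2 = beta_3 = 0 (q = 2 restrictions)
r = np.zeros(2)
q = R.shape[0]

diff = R @ b - r
F = diff @ np.linalg.inv(R @ (s2 * XtX_inv) @ R.T) @ diff / q
p_value = stats.f.sf(F, q, n - k)          # upper-tail probability of F_{q, n-k}
print(F, p_value)
```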

Remark (alternative expression of F-stat).

The F statistic can alternatively be expressed as
$$
F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n-k)},
$$
where $SSR_R$ denotes the restricted sum of squared residuals and $SSR_U$ denotes the unrestricted one.
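
The SSR form can be computed by fitting the restricted and unrestricted models separately; a minimal sketch, assuming the restriction simply drops the last two regressors:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, q = 120, 4, 2
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, 0.3, 0.0, 0.0]) + rng.standard_normal(n)

def ssr(Xmat, y):
    """Sum of squared residuals from an OLS fit of y on Xmat."""
    b = np.linalg.lstsq(Xmat, y, rcond=None)[0]
    e = y - Xmat @ b
    return e @ e

ssr_u = ssr(X, y)           # unrestricted model: all k regressors
ssr_r = ssr(X[:, :2], y)    # restricted model: last q = 2 coefficients forced to zero
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print(F)
```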

Remark (f-test and t-test).

If the $F$-test is performed for a single restriction ($q = 1$), then it can be expressed as a $t$-test using Normal Distribution Theory > Proposition 17 (f and t distribution), i.e. $F = t^2$, since $F_{1,\, n-k} = t_{n-k}^2$.
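
A quick numerical check of $F = t^2$ for a single exclusion restriction (all simulation choices below are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 80, 3
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
s2 = (y - X @ b) @ (y - X @ b) / (n - k)

j = 2                                      # single restriction H0: beta_2 = 0
t = b[j] / np.sqrt(s2 * XtX_inv[j, j])     # t statistic

R = np.zeros((1, k)); R[0, j] = 1.0        # same restriction in R beta = r form (r = 0)
F = (R @ b) @ np.linalg.inv(R @ (s2 * XtX_inv) @ R.T) @ (R @ b)
print(t**2, F)                             # equal up to floating-point error
```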