Maximum Likelihood Estimation

#econometrics #economics

Oh, Hyunzi. (email: wisdom302@naver.com)
Korea University, Graduate School of Economics.
2024 Spring, instructed by prof. Kim, Dukpa.


Main References

  • Kim, Dukpa. (2024). "Econometric Analysis" (2024 Spring) ECON 518, Department of Economics, Korea University.
  • Davidson and MacKinnon. (2021). "Econometric Theory and Methods", Oxford University Press, New York.

Model and Definitions

Assumptions

  • vector of observations: $\mathbf{y} = (y_1, y_2, \dots, y_n)'$
  • the joint density of $\mathbf{y}$ exists: $f(y_1, \dots, y_n; \theta)$
  • the joint distribution of $\mathbf{y}$ belongs to a certain family of distributions (e.g. normal)
  • the parameter $\theta$ of the distribution is unknown.

Here, the question is how to estimate the unknown parameter $\theta$ from the observed data $\mathbf{y}$, i.e. to determine which member of the known family of distributions generated $\mathbf{y}$ by estimating $\theta$.

Definition (likelihood function).

The likelihood function of $\theta$ is given as
$$L_n(\theta) = f(y_1, \dots, y_n; \theta).$$
Note that if $y_1, \dots, y_n$ are independently and identically distributed (iid), then
$$L_n(\theta) = \prod_{i=1}^n f_i(\theta),$$
where $f_i(\theta) := f(y_i; \theta)$.

Note that the function $L_n(\theta)$ can be viewed as a random variable, since it is a mapping of the random variables $y_1, \dots, y_n$ into the real line through their joint density function. Thus, the exact value of $L_n(\theta)$ is not determined until the random variables are realized.

It is often more convenient to work with $L_n(\theta)$ in log form, which is called the log-likelihood function.

Remark (log-likelihood function).

The function $\ell_n(\theta) := \log L_n(\theta)$ is called the log-likelihood function, where
$$\ell_n(\theta) = \sum_{i=1}^n \log f_i(\theta) = \sum_{i=1}^n \ell_i(\theta),$$
where $\ell_i(\theta) := \log f_i(\theta)$ and $(y_1, \dots, y_n)$ is assumed to be iid.
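
As a concrete illustration of the definitions above (my own sketch, not part of the lecture notes), the snippet below evaluates the iid log-likelihood as a sum of log densities, assuming an exponential density $f(y_i;\theta)=\theta e^{-\theta y_i}$ and an assumed true rate of 2 purely for demonstration.

```python
import numpy as np

def log_likelihood(theta, y):
    """ell_n(theta) = sum_i log f(y_i; theta) for the assumed exponential
    density f(y; theta) = theta * exp(-theta * y)."""
    return np.sum(np.log(theta) - theta * y)

rng = np.random.default_rng(0)
y = rng.exponential(scale=1 / 2.0, size=100)            # sample with assumed true rate theta_0 = 2
print(log_likelihood(1.0, y), log_likelihood(2.0, y))   # typically larger near theta_0
```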

Definition (MLE estimator).

The maximum likelihood estimator of the unknown parameter $\theta$ is the maximizer of the (log-)likelihood function:
$$\hat\theta_{MLE} = \arg\max_{\theta}\ \ell_n(\theta).$$

The value of the density function evaluated at a given $\theta$ measures how likely the observed sample outcome is under that value of the parameter. Hence, the maximum likelihood estimate of $\theta$ can be interpreted as the value of $\theta$ under which the given observations are most likely to occur.

To find the maximum value of the given function, we often use the first and second derivatives of the log-likelihood function.

Definition (score function and Hessian matrix).

From the given log-likelihood function $\ell_n(\theta)$, the score function $s_n(\theta)$ and the Hessian matrix $H_n(\theta)$ are defined as the first and second derivatives of $\ell_n(\theta)$, respectively. This means
$$s_n(\theta) = \frac{\partial \ell_n(\theta)}{\partial \theta} = \sum_{i=1}^n \frac{\partial \ell_i(\theta)}{\partial \theta} =: \sum_{i=1}^n s_i(\theta), \qquad H_n(\theta) = \frac{\partial^2 \ell_n(\theta)}{\partial \theta\, \partial \theta'} = \sum_{i=1}^n \frac{\partial^2 \ell_i(\theta)}{\partial \theta\, \partial \theta'} =: \sum_{i=1}^n h_i(\theta).$$

Note that both $s_n(\theta)$ and $H_n(\theta)$ are sums of $n$ terms and need to be scaled properly for the asymptotic analysis.
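
Because the score and the Hessian are exactly the ingredients of Newton-type optimization, a minimal numerical sketch may help; the code below (my own illustration, using the exponential density $f(y;\theta)=\theta e^{-\theta y}$ from the example later in these notes) iterates $\theta \leftarrow \theta - H_n(\theta)^{-1} s_n(\theta)$ to locate the maximizer.

```python
import numpy as np

def score(theta, y):
    # s_n(theta) = n/theta - sum(y_i) for f(y; theta) = theta * exp(-theta * y)
    return len(y) / theta - np.sum(y)

def hessian(theta, y):
    # H_n(theta) = -n / theta^2 (negative, so the log-likelihood is concave)
    return -len(y) / theta**2

def newton_mle(y, theta=1.0, tol=1e-10, max_iter=50):
    """Newton-Raphson on the log-likelihood: theta <- theta - s_n(theta)/H_n(theta).
    The starting value matters; theta = 1.0 works well for data with mean near 0.5."""
    for _ in range(max_iter):
        step = score(theta, y) / hessian(theta, y)
        theta -= step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(1)
y = rng.exponential(scale=1 / 2.0, size=500)   # assumed true rate 2
print(newton_mle(y), 1 / y.mean())              # Newton iterate vs closed-form MLE 1/ybar
```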

Definition (information matrix).

The information matrix is defined as
$$\mathcal I(\theta) = -\lim_{n\to\infty} \frac{1}{n}\,\mathbb E\!\left[H_n(\theta)\right].$$

Here, we put the negative sign since the Hessian will be negative (definite) when $\ell_n(\theta)$ is maximized at the given $\theta$. Also, if $(y_1, \dots, y_n)$ is iid, then we have
$$-\frac{1}{n} H_n(\theta) = -\frac{1}{n}\sum_{i=1}^n h_i(\theta) \xrightarrow{p} -\,\mathbb E\!\left[h_i(\theta)\right] = \mathcal I(\theta),$$
which follows from the law of large numbers.

Remark (LLN of the information matrix).

The limit counterpart of the information matrix will be identical to
$$\mathcal I(\theta_0) = -\,\mathbb E\!\left[h_i(\theta_0)\right] = \mathbb E\!\left[s_i(\theta_0)\, s_i(\theta_0)'\right],$$
i.e. the expected outer product of the per-observation score (the information matrix equality).

Proof. Since $\int f_i(y_i;\theta)\, dy_i = 1$ for all $\theta$, differentiating both sides with respect to $\theta$ and evaluating at $\theta_0$ gives
$$\int \frac{\partial f_i}{\partial \theta}\, dy_i = \int \frac{\partial \log f_i}{\partial \theta}\, f_i\, dy_i = \mathbb E\!\left[s_i(\theta_0)\right] = 0.$$
Differentiating once more,
$$\int \left(\frac{\partial^2 \log f_i}{\partial \theta\, \partial \theta'}\, f_i + \frac{\partial \log f_i}{\partial \theta}\frac{\partial \log f_i}{\partial \theta'}\, f_i\right) dy_i = \mathbb E\!\left[h_i(\theta_0)\right] + \mathbb E\!\left[s_i(\theta_0)\, s_i(\theta_0)'\right] = 0,$$
thus $-\,\mathbb E\!\left[h_i(\theta_0)\right] = \mathbb E\!\left[s_i(\theta_0)\, s_i(\theta_0)'\right]$, which is the desired result.
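
The information matrix equality can also be checked by simulation; the sketch below (my own illustration, with the exponential density and an assumed true rate $\theta_0 = 2$) compares the Monte Carlo average of $s_i(\theta_0)^2$ with $-\mathbb E[h_i(\theta_0)]$.

```python
import numpy as np

theta0 = 2.0
rng = np.random.default_rng(2)
y = rng.exponential(scale=1 / theta0, size=200_000)

# per-observation score and Hessian at the true value, for f(y; theta) = theta * exp(-theta * y)
s_i = 1 / theta0 - y                       # s_i(theta_0) = 1/theta_0 - y_i
h_i = -np.ones_like(y) / theta0**2         # h_i(theta_0) = -1/theta_0^2

print(np.mean(s_i**2))   # ~ 0.25, Monte Carlo estimate of E[s_i^2]
print(-np.mean(h_i))     # = 0.25, -E[h_i]; the two agree, as the equality predicts
```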

Asymptotic Normality

The maximum likelihood estimates are not in general unbiased, and their finite-sample distributions are not always normal. However, the following results on consistency and asymptotic normality make inference based on the MLE useful and convenient.

Consistency of MLE

Theorem (consistency of MLE).

Let $\hat\theta_{MLE}$ be the maximum likelihood estimator for $\theta_0$. Under some regularity conditions, we have
$$\hat\theta_{MLE} \xrightarrow{p} \theta_0.$$

Proof. Let $f(y_1, \dots, y_n; \theta)$ be the joint density function for $(y_1, \dots, y_n)$. The (log-)likelihood function is
$$\ell_n(\theta) = \log f(y_1, \dots, y_n; \theta) = \sum_{i=1}^n \ell_i(\theta).$$
Note that, by Jensen's inequality, we have
$$\mathbb E\!\left[\log \frac{f_i(\theta)}{f_i(\theta_0)}\right] \le \log \mathbb E\!\left[\frac{f_i(\theta)}{f_i(\theta_0)}\right] = \log \int \frac{f_i(\theta)}{f_i(\theta_0)}\, f_i(\theta_0)\, dy_i = \log 1 = 0,$$
thus $\mathbb E[\ell_i(\theta)] \le \mathbb E[\ell_i(\theta_0)]$ for every $\theta$, and $\theta_0$ maximizes $\mathbb E[\ell_i(\theta)]$. Note also that we have
$$\frac{1}{n}\ell_n(\theta) \xrightarrow{p} \mathbb E\!\left[\ell_i(\theta)\right]$$
by Econometric Analysis/Asymptotics > Theorem 1 (weak law of large numbers), since $\ell_i(\theta)$ is iid across $i$.

Thus, the probability limit of $\frac{1}{n}\ell_n(\theta)$ is maximized at $\theta_0$. Also, for any sample size $n$, by Definition 3 (MLE estimator), we have
$$\frac{1}{n}\ell_n(\hat\theta_{MLE}) \ge \frac{1}{n}\ell_n(\theta_0),$$
since $\hat\theta_{MLE}$ is the maximizer of $\ell_n(\theta)$. This implies that the maximizer of $\frac{1}{n}\ell_n(\theta)$ converges to the maximizer of its probability limit $\mathbb E[\ell_i(\theta)]$. Therefore, we have $\hat\theta_{MLE} \xrightarrow{p} \theta_0$, assuming that $\mathbb E[\ell_i(\theta)]$ is not flat around $\theta_0$.
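
A small simulation makes the consistency statement concrete; the sketch below (my own illustration, using the exponential MLE $\hat\theta = 1/\bar y_n$ from the example later in these notes, with an assumed true rate $\theta_0 = 2$) shows the estimate settling down to $\theta_0$ as $n$ grows.

```python
import numpy as np

theta0 = 2.0
rng = np.random.default_rng(3)
for n in [10, 100, 1_000, 10_000, 100_000]:
    y = rng.exponential(scale=1 / theta0, size=n)
    theta_hat = 1 / y.mean()               # MLE of the rate parameter
    print(n, round(theta_hat, 4))           # drifts toward theta_0 = 2 as n grows
```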

Asymptotic Normality of MLE

Theorem (asymptotic normality of MLE).

Let $\hat\theta_{MLE}$ be the maximum likelihood estimator for $\theta_0$. Under some regularity conditions, we have
$$\sqrt n\left(\hat\theta_{MLE} - \theta_0\right) \xrightarrow{d} N\!\left(0,\ \mathcal I(\theta_0)^{-1}\right).$$

Here, note that the limiting variance of the asymptotic normal distribution is the inverse of the information matrix. This implies that the more information we have (i.e. the larger the information matrix), the more precise the MLE becomes.

Proof. By Taylor's theorem (mean value form), there exists some $\bar\theta$ between $\hat\theta_{MLE}$ and $\theta_0$ such that
$$0 = s_n(\hat\theta_{MLE}) = s_n(\theta_0) + H_n(\bar\theta)\left(\hat\theta_{MLE} - \theta_0\right),$$
where the left-hand side is zero by the first-order condition. Then we have
$$\sqrt n\left(\hat\theta_{MLE} - \theta_0\right) = \left(-\frac{1}{n}H_n(\bar\theta)\right)^{-1} \frac{1}{\sqrt n}\, s_n(\theta_0).$$
Now, it suffices to show that $\frac{1}{\sqrt n}\, s_n(\theta_0) \xrightarrow{d} N\!\left(0, \mathcal I(\theta_0)\right)$ and $-\frac{1}{n}H_n(\bar\theta) \xrightarrow{p} \mathcal I(\theta_0)$, since $\bar\theta \xrightarrow{p} \theta_0$ by the consistency of $\hat\theta_{MLE}$.
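
The first ingredient of this argument, $n^{-1/2} s_n(\theta_0) \xrightarrow{d} N(0, \mathcal I(\theta_0))$, can be verified numerically; the sketch below (my own illustration, again with the exponential density and an assumed rate $\theta_0 = 2$) simulates the scaled score at the true value over many replications.

```python
import numpy as np

theta0, n, reps = 2.0, 500, 10_000
rng = np.random.default_rng(6)

y = rng.exponential(scale=1 / theta0, size=(reps, n))
score_n = n / theta0 - y.sum(axis=1)       # s_n(theta_0) = n/theta_0 - sum_i y_i, one value per replication
z = score_n / np.sqrt(n)                    # scaled score n^{-1/2} s_n(theta_0)

print(z.mean(), z.var())   # ~ 0 and ~ 0.25 = 1/theta_0^2, the information I(theta_0)
```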

Examples of the MLE estimator

MLE of Exponential Distribution

Example (MLE of exponential distribution).

Let $y_i \overset{iid}{\sim} \text{Exponential}(\theta_0)$ with density $f(y_i;\theta) = \theta e^{-\theta y_i}$, implying that the joint density is
$$f(y_1, \dots, y_n; \theta) = \prod_{i=1}^n \theta e^{-\theta y_i} = \theta^n \exp\!\left(-\theta \sum_{i=1}^n y_i\right).$$
Now, derive the MLE estimator of $\theta_0$, which is $\hat\theta_{MLE}$, and show its consistency and asymptotic normality.

Proof. Note that the (log-)likelihood function can be derived as
$$\ell_n(\theta) = n\log\theta - \theta\sum_{i=1}^n y_i.$$
The score function and the Hessian matrix can be derived as
$$s_n(\theta) = \frac{n}{\theta} - \sum_{i=1}^n y_i \quad\text{and}\quad H_n(\theta) = -\frac{n}{\theta^2} < 0$$
for every $\theta > 0$. Furthermore, the information matrix is
$$\mathcal I(\theta) = -\,\mathbb E\!\left[h_i(\theta)\right] = \frac{1}{\theta^2}.$$
First we derive the MLE estimator of $\theta_0$, and then sequentially show its consistency and asymptotic normality.

(MLE estimator) Since
$$s_n(\hat\theta_{MLE}) = \frac{n}{\hat\theta_{MLE}} - \sum_{i=1}^n y_i = 0$$
from the FOC, we have $\hat\theta_{MLE} \sum_{i=1}^n y_i = n$, implying that the MLE estimator is
$$\hat\theta_{MLE} = \frac{n}{\sum_{i=1}^n y_i} = \frac{1}{\bar y_n}.$$
(consistency) Note that, by the definition of the exponential distribution, from $y_i \sim \text{Exponential}(\theta_0)$ we have $\mathbb E[y_i] = 1/\theta_0$. Then the consistency of $\hat\theta_{MLE}$ can be shown directly by
$$\hat\theta_{MLE} = \frac{1}{\bar y_n} \xrightarrow{p} \frac{1}{\mathbb E[y_i]} = \theta_0,$$
where the probability limit holds by the Econometric Analysis/Asymptotics > Theorem 1 (weak law of large numbers) applied to $\bar y_n$, together with the continuity of $x \mapsto 1/x$ at $1/\theta_0$.

(asymptotic normality) Let $z_i := y_i - \frac{1}{\theta_0}$, so that $\mathbb E[z_i] = 0$ and $\mathrm{Var}(z_i) = \frac{1}{\theta_0^2}$. Then, from Econometric Analysis/Asymptotics > Theorem 8 (Lindeberg-Levy CLT), we have
$$\sqrt n\left(\bar y_n - \frac{1}{\theta_0}\right) = \frac{1}{\sqrt n}\sum_{i=1}^n z_i \xrightarrow{d} N\!\left(0, \frac{1}{\theta_0^2}\right).$$
Note that
$$\hat\theta_{MLE} - \theta_0 = \frac{1}{\bar y_n} - \theta_0 = -\frac{\theta_0}{\bar y_n}\left(\bar y_n - \frac{1}{\theta_0}\right),$$
which equals the Taylor expansion of $1/\bar y_n$ around $1/\theta_0$ truncated at the first order.

Thus we have
$$\sqrt n\left(\hat\theta_{MLE} - \theta_0\right) = -\frac{\theta_0}{\bar y_n}\,\sqrt n\left(\bar y_n - \frac{1}{\theta_0}\right),$$
where, by the consistency of $\bar y_n$, we have
$$-\frac{\theta_0}{\bar y_n} \xrightarrow{p} -\theta_0^2,$$
and by the previous conclusion,
$$\sqrt n\left(\bar y_n - \frac{1}{\theta_0}\right) \xrightarrow{d} N\!\left(0, \frac{1}{\theta_0^2}\right).$$
Finally, by the Convergence of Random Variables > Theorem 25 (Slutsky theorem), we have
$$\sqrt n\left(\hat\theta_{MLE} - \theta_0\right) \xrightarrow{d} N\!\left(0,\ \theta_0^4 \cdot \frac{1}{\theta_0^2}\right) = N\!\left(0,\ \theta_0^2\right) = N\!\left(0,\ \mathcal I(\theta_0)^{-1}\right).$$
This shows that $\hat\theta_{MLE}$ has the asymptotic normal distribution given in the theorem above.
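
The limiting distribution $N(0, \theta_0^2)$ derived above can be checked by Monte Carlo; the sketch below (my own illustration, with an assumed $\theta_0 = 2$, $n = 500$, and 10,000 replications) compares the simulated mean and variance of $\sqrt n(\hat\theta_{MLE} - \theta_0)$ with $0$ and $\theta_0^2 = 4$.

```python
import numpy as np

theta0, n, reps = 2.0, 500, 10_000
rng = np.random.default_rng(4)

y = rng.exponential(scale=1 / theta0, size=(reps, n))
theta_hat = 1 / y.mean(axis=1)                  # MLE 1/ybar in each replication
z = np.sqrt(n) * (theta_hat - theta0)           # sqrt(n)(theta_hat - theta_0)

print(z.mean(), z.var())   # ~ 0 and ~ 4 = theta_0^2, the inverse information
```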

MLE of Normal Distribution

Example (MLE of normal distribution).

Let $y_i = x_i'\beta_0 + \epsilon_i$ where $\epsilon_i \overset{iid}{\sim} N(0, \sigma_0^2)$, and $x_i \in \mathbb R^k$ is non-random. Also, let $\theta = (\beta', \sigma^2)'$ be the vector of parameters of interest, and let $\theta_0 = (\beta_0', \sigma_0^2)'$ be the true value. This implies that the joint density is
$$f(y_1, \dots, y_n; \theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - x_i'\beta)^2}{2\sigma^2}\right).$$
Now, derive the MLE estimator of $\theta_0$, which is $\hat\theta_{MLE} = (\hat\beta', \hat\sigma^2)'$, and show its consistency and asymptotic normality.

Remark that $x_i$ is treated as non-random for convenience, for the reason that
$$f(y_i, x_i; \theta) = f(y_i \mid x_i; \theta)\, f(x_i),$$
where $f(y_i \mid x_i; \theta)$ is the conditional density of $y_i$ given $x_i$ and $f(x_i)$ is the marginal density. Since all the parameters of interest lie in $f(y_i \mid x_i; \theta)$, and not in $f(x_i)$, we can ignore the second part.

Proof. The (log-)likelihood function is
$$\ell_n(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - x_i'\beta)^2.$$

(score function) The first derivative with respect to $\beta$ is
$$\frac{\partial \ell_n}{\partial \beta} = \frac{1}{\sigma^2}\sum_{i=1}^n x_i (y_i - x_i'\beta) = \frac{1}{\sigma^2}\sum_{i=1}^n x_i\, \epsilon_i(\beta),$$
where $\epsilon_i(\beta) := y_i - x_i'\beta$. By taking expectations under the true value, we have
$$\mathbb E\!\left[\frac{\partial \ell_n}{\partial \beta}\bigg|_{\theta_0}\right] = \frac{1}{\sigma_0^2}\sum_{i=1}^n x_i\, \mathbb E[\epsilon_i] = 0,$$
where the second equality holds since $\mathbb E[\epsilon_i] = 0$. This implies that the score with respect to $\beta$ has zero expectation at the true value.

Also, the first derivative with respect to $\sigma^2$ is
$$\frac{\partial \ell_n}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (y_i - x_i'\beta)^2.$$
By taking expectations under $\theta_0$,
$$\mathbb E\!\left[\frac{\partial \ell_n}{\partial \sigma^2}\bigg|_{\theta_0}\right] = -\frac{n}{2\sigma_0^2} + \frac{1}{2\sigma_0^4}\cdot n\sigma_0^2 = 0.$$
We can see that the expectation of the score function under the true value is zero.

Thus, the score function is
$$s_n(\theta) = \begin{pmatrix} \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^n x_i (y_i - x_i'\beta) \\[10pt] -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\displaystyle\sum_{i=1}^n (y_i - x_i'\beta)^2 \end{pmatrix}.$$

(Hessian matrix) Now, we look into the second derivatives.
First, for $\beta$,
$$\frac{\partial^2 \ell_n}{\partial \beta\, \partial \beta'} = -\frac{1}{\sigma^2}\sum_{i=1}^n x_i x_i';$$
next, for $\sigma^2$,
$$\frac{\partial^2 \ell_n}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^n (y_i - x_i'\beta)^2;$$
and lastly, for the cross derivative,
$$\frac{\partial^2 \ell_n}{\partial \beta\, \partial \sigma^2} = -\frac{1}{\sigma^4}\sum_{i=1}^n x_i (y_i - x_i'\beta).$$
Thus the Hessian matrix is
$$H_n(\theta) = \begin{pmatrix} -\dfrac{1}{\sigma^2}\displaystyle\sum_i x_i x_i' & -\dfrac{1}{\sigma^4}\displaystyle\sum_i x_i (y_i - x_i'\beta) \\[10pt] -\dfrac{1}{\sigma^4}\displaystyle\sum_i (y_i - x_i'\beta)\, x_i' & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}\displaystyle\sum_i (y_i - x_i'\beta)^2 \end{pmatrix}.$$

(information matrix) Then, by letting $Q := \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n x_i x_i'$, we have
$$\mathcal I(\theta_0) = -\lim_{n\to\infty}\frac{1}{n}\,\mathbb E\!\left[H_n(\theta_0)\right] = \begin{pmatrix} \dfrac{1}{\sigma_0^2} Q & 0 \\[6pt] 0 & \dfrac{1}{2\sigma_0^4} \end{pmatrix}.$$
Note that for the true parameter $\theta_0$, we have $\mathbb E[\epsilon_i] = 0$ and $\mathbb E\!\left[\sum_i (y_i - x_i'\beta_0)^2\right] = n\sigma_0^2$, so the off-diagonal blocks of $\mathbb E[H_n(\theta_0)]$ vanish and the expected second derivative with respect to $\sigma^2$ equals $-\frac{n}{2\sigma_0^4}$.

(MLE estimator) From the first-order conditions,
$$\hat\beta = \left(\sum_{i=1}^n x_i x_i'\right)^{-1}\sum_{i=1}^n x_i y_i \quad\text{and}\quad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \left(y_i - x_i'\hat\beta\right)^2.$$
Note that $\hat\sigma^2$ is biased, as discussed in Finite Sample Results > Proposition 21 (unbiased estimate of $\sigma^2$). This shows that the MLE estimator is not always unbiased.

Note that these results can also be derived in matrix notation: the first-order condition for $\beta$ coincides with the OLS normal equations $X'(y - X\hat\beta) = 0$, so the MLE of $\beta$ is identical to the OLS estimator, while $\hat\sigma^2 = \frac{1}{n}(y - X\hat\beta)'(y - X\hat\beta)$. This completes the proof.
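
As a numerical companion to this example (my own sketch, with simulated data and assumed true values $\beta_0 = (1, -0.5)'$ and $\sigma_0^2 = 2$), the closed-form MLE below solves the same normal equations as OLS for $\beta$, while $\hat\sigma^2$ divides the residual sum of squares by $n$ rather than $n - k$, which is the source of its finite-sample bias.

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta0, sigma2_0 = 200, np.array([1.0, -0.5]), 2.0      # assumed true values
X = np.column_stack([np.ones(n), rng.normal(size=n)])      # regressors, treated as fixed once drawn
y = X @ beta0 + rng.normal(scale=np.sqrt(sigma2_0), size=n)

# MLE: beta_hat solves the normal equations (identical to OLS);
# sigma2_mle divides the residual sum of squares by n (not n - k), hence it is biased downward.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_mle = resid @ resid / n
sigma2_unbiased = resid @ resid / (n - X.shape[1])

print(beta_hat, sigma2_mle, sigma2_unbiased)
```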