Maximum Likelihood Estimation
Oh, Hyunzi. (email: wisdom302@naver.com)
Korea University, Graduate School of Economics.
2024 Spring, instructed by Prof. Kim, Dukpa.
Assumptions
Suppose $\{x_i\}_{i=1}^{n}$ is an i.i.d. random sample drawn from a distribution with density $f(x_i; \theta_0)$, where the true parameter $\theta_0$ lies in the parameter space $\Theta \subseteq \mathbb{R}^k$. Here, the question is how to estimate the unknown parameter $\theta_0$ from the observed sample.
The likelihood function of the sample is defined as
$$ L_n(\theta) = \prod_{i=1}^{n} f(x_i; \theta). $$
Note that the function $L_n(\theta)$ is regarded as a function of the parameter $\theta$, with the observed data held fixed.
Note that it is often more convenient to work in log form, which is called the log-likelihood function:
$$ \ell_n(\theta) = \log L_n(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta). $$
The function $\ell_n(\theta)$ attains its maximum at the same point as $L_n(\theta)$, since the logarithm is strictly increasing.
The maximum likelihood estimator of the unknown parameter $\theta_0$ is defined as
$$ \hat{\theta}_{ML} = \arg\max_{\theta \in \Theta} \ell_n(\theta). $$
This definition rests on the fact that the density evaluated at the observed sample stands for the probability of that sample outcome under a given parameter value. Hence, the maximum likelihood estimate of $\theta_0$ is the parameter value under which the observed sample is most likely to have occurred.
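As a concrete illustration (this worked example is not part of the original notes), let $x_i \in \{0, 1\}$ be i.i.d. Bernoulli($p$) draws. The log-likelihood and the first-order condition give the MLE in closed form:
$$ \ell_n(p) = \sum_{i=1}^{n} \left[ x_i \log p + (1 - x_i) \log(1 - p) \right], \qquad \frac{\partial \ell_n(p)}{\partial p} = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1 - p} = 0 \;\Longrightarrow\; \hat{p}_{ML} = \bar{x}_n. $$
That is, the MLE is simply the sample frequency of ones, the value of $p$ under which the observed sample is most probable.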
To find the maximizer of the log-likelihood function, we typically use its first and second derivatives.
From the given log-likelihood function, define the score function and the Hessian as
$$ s_n(\theta) = \frac{\partial \ell_n(\theta)}{\partial \theta}, \qquad H_n(\theta) = \frac{\partial^2 \ell_n(\theta)}{\partial \theta \, \partial \theta'}. $$
Note that both $s_n(\theta)$ and $H_n(\theta)$ are random, since they depend on the sample $\{x_i\}_{i=1}^{n}$.
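When the first-order condition has no closed-form solution, the maximum is found numerically. Below is a minimal sketch (not from the notes; the Gamma model and all values are chosen for illustration) that maximizes a log-likelihood with a quasi-Newton method, relying on exactly the derivative information discussed above:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=500)  # pseudo-data; true (shape, scale) = (2, 3)

def neg_loglik(log_theta):
    # Parametrize in logs so the optimizer stays inside the parameter space.
    a, s = np.exp(log_theta)
    return -np.sum(stats.gamma.logpdf(x, a, scale=s))

# BFGS approximates the score (gradient) by finite differences, i.e. the same
# first-derivative information used in the text, and builds up curvature info.
res = optimize.minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(np.exp(res.x))  # MLE (shape, scale), close to (2, 3)
```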
The information matrix is defined as
$$ \mathcal{I}_n(\theta) = -E\left[ H_n(\theta) \right] = -E\left[ \frac{\partial^2 \ell_n(\theta)}{\partial \theta \, \partial \theta'} \right]. $$
Here, we put the negative sign since the Hessian will be negative (semi-)definite when evaluated at a maximum, so that the information matrix is positive (semi-)definite.
The limit counterpart of the information matrix is
$$ \mathcal{I}(\theta) = \lim_{n \to \infty} \frac{1}{n} \mathcal{I}_n(\theta) = -E\left[ \frac{\partial^2 \log f(x_i; \theta)}{\partial \theta \, \partial \theta'} \right], $$
where the second equality uses the i.i.d. assumption.
Proof.
Note that the proof of Remark 6 (LLN of the information matrix) is similar to the proof of Remark 7 (A1** implies A1, A4, A5, A6 and A8) in Asymptotic Results in Basic Linear Model.
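Continuing the Bernoulli illustration from above (again, not part of the original notes), the Hessian and the information matrix can be computed explicitly:
$$ \frac{\partial^2 \ell_n(p)}{\partial p^2} = -\frac{\sum_i x_i}{p^2} - \frac{n - \sum_i x_i}{(1 - p)^2}, \qquad \mathcal{I}_n(p) = -E\left[ \frac{\partial^2 \ell_n(p)}{\partial p^2} \right] = \frac{np}{p^2} + \frac{n(1 - p)}{(1 - p)^2} = \frac{n}{p(1 - p)}, $$
so the limit counterpart is $\mathcal{I}(p) = 1 / (p(1 - p))$. The Hessian here is negative for every sample, which is why the negative sign in the definition yields a positive information matrix.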
The maximum likelihood estimator is not in general unbiased, and its finite-sample distribution is not always normal. However, the following results on consistency and asymptotic normality make inference based on the MLE useful and convenient.
Let $\hat{\theta}_{ML}$ be the maximum likelihood estimator of $\theta_0$. Then, under regularity conditions, $\hat{\theta}_{ML} \xrightarrow{p} \theta_0$.
Proof. Let $Q_n(\theta) = n^{-1} \ell_n(\theta)$ and $Q(\theta) = E[\log f(x_i; \theta)]$. By the law of large numbers, $Q_n(\theta) \xrightarrow{p} Q(\theta)$ (uniformly in $\theta$ under the regularity conditions). Moreover, $Q(\theta)$ is uniquely maximized at $\theta_0$, since by Jensen's inequality
$$ Q(\theta) - Q(\theta_0) = E\left[ \log \frac{f(x_i; \theta)}{f(x_i; \theta_0)} \right] \le \log E\left[ \frac{f(x_i; \theta)}{f(x_i; \theta_0)} \right] = \log 1 = 0. $$
Thus, we have
$$ \hat{\theta}_{ML} = \arg\max_{\theta \in \Theta} Q_n(\theta) \xrightarrow{p} \arg\max_{\theta \in \Theta} Q(\theta) = \theta_0. $$
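A quick simulation sketch of this consistency result for the Bernoulli case (illustrative only; the sample sizes are arbitrary), where the MLE is the sample mean:

```python
import numpy as np

rng = np.random.default_rng(42)
p0 = 0.3  # true parameter, chosen for illustration
for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.binomial(1, p0, size=n)
    print(n, x.mean())  # p_hat = x_bar approaches p0 = 0.3 as n grows
```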
Let $\hat{\theta}_{ML}$ be the maximum likelihood estimator of $\theta_0$. Then, under regularity conditions,
$$ \sqrt{n}\left( \hat{\theta}_{ML} - \theta_0 \right) \xrightarrow{d} N\left( 0, \, \mathcal{I}(\theta_0)^{-1} \right). $$
Here, note that the limiting variance of the asymptotic normal distribution is the inverse of the information matrix. This implies that if we have more information (i.e. a larger information matrix), then the MLE becomes more precise.
Proof. By Taylor's theorem, there exists some $\bar{\theta}$ between $\hat{\theta}_{ML}$ and $\theta_0$ such that
$$ 0 = s_n(\hat{\theta}_{ML}) = s_n(\theta_0) + H_n(\bar{\theta}) \left( \hat{\theta}_{ML} - \theta_0 \right). $$
Remark that $n^{-1} H_n(\bar{\theta}) \xrightarrow{p} -\mathcal{I}(\theta_0)$ by the consistency of $\hat{\theta}_{ML}$ and the law of large numbers, while
$$ \frac{1}{\sqrt{n}} s_n(\theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\partial \log f(x_i; \theta_0)}{\partial \theta} \xrightarrow{d} N\left( 0, \, \mathcal{I}(\theta_0) \right) $$
by the central limit theorem, since the individual scores are i.i.d. with mean zero and variance $\mathcal{I}(\theta_0)$. Then, we have
$$ \sqrt{n}\left( \hat{\theta}_{ML} - \theta_0 \right) = -\left( \frac{1}{n} H_n(\bar{\theta}) \right)^{-1} \frac{1}{\sqrt{n}} s_n(\theta_0) \xrightarrow{d} \mathcal{I}(\theta_0)^{-1} \, N\left( 0, \, \mathcal{I}(\theta_0) \right) = N\left( 0, \, \mathcal{I}(\theta_0)^{-1} \right). $$
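The limiting variance can likewise be checked by simulation (illustrative sketch, Bernoulli case, where $\mathcal{I}(p_0)^{-1} = p_0(1 - p_0)$):

```python
import numpy as np

rng = np.random.default_rng(0)
p0, n, reps = 0.3, 2_000, 20_000
# Each Binomial(n, p0) draw divided by n is one MLE from a sample of size n.
p_hat = rng.binomial(n, p0, size=reps) / n
z = np.sqrt(n) * (p_hat - p0)
print(z.var(), p0 * (1 - p0))  # empirical vs. theoretical variance I(p0)^{-1}
```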
Let
Proof. Note that the (log-)likelihood function can be derived as
(MLE estimator) Since
(asymptotic normality) Let
Thus we have
Let the observations be generated by the classical linear model
$$ y = X \beta_0 + \varepsilon, \qquad \varepsilon \mid X \sim N(0, \, \sigma_0^2 I_n), $$
with unknown parameter $\theta = (\beta', \sigma^2)'$. Then the ML estimators are $\hat{\beta}_{ML} = (X'X)^{-1} X'y$ and $\hat{\sigma}^2_{ML} = n^{-1} (y - X\hat{\beta}_{ML})'(y - X\hat{\beta}_{ML})$, and they are consistent and asymptotically normal.
Remark that $\hat{\beta}_{ML}$ coincides with the OLS estimator, while $\hat{\sigma}^2_{ML}$ divides the residual sum of squares by $n$ rather than $n - k$, so it is biased in finite samples (though still consistent).
Proof. The (log-)likelihood function is
$$ \ell_n(\beta, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta). $$
(score function) The first derivative with respect to $\beta$ is
$$ \frac{\partial \ell_n}{\partial \beta} = \frac{1}{\sigma^2} X'(y - X\beta). $$
Also, the first derivative with respect to $\sigma^2$ is
$$ \frac{\partial \ell_n}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} (y - X\beta)'(y - X\beta). $$
Thus, the score function is
$$ s_n(\theta) = \begin{pmatrix} \dfrac{1}{\sigma^2} X'(y - X\beta) \\[2mm] -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4} (y - X\beta)'(y - X\beta) \end{pmatrix}. $$
(Hessian matrix) Now, we look into the second derivatives. Firstly, for the beta,
$$ \frac{\partial^2 \ell_n}{\partial \beta \, \partial \beta'} = -\frac{1}{\sigma^2} X'X, \qquad \frac{\partial^2 \ell_n}{\partial \beta \, \partial \sigma^2} = -\frac{1}{\sigma^4} X'(y - X\beta), $$
and for $\sigma^2$,
$$ \frac{\partial^2 \ell_n}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6} (y - X\beta)'(y - X\beta). $$
(information matrix) Taking expectations at $\theta_0$, and using $E[X'\varepsilon \mid X] = 0$ and $E[\varepsilon'\varepsilon \mid X] = n\sigma_0^2$,
$$ \mathcal{I}_n(\theta_0) = -E\left[ H_n(\theta_0) \right] = \begin{pmatrix} \dfrac{1}{\sigma_0^2} X'X & 0 \\ 0 & \dfrac{n}{2\sigma_0^4} \end{pmatrix}. $$
(MLE estimator) From the first derivatives, setting the score to zero yields
$$ \hat{\beta}_{ML} = (X'X)^{-1} X'y, \qquad \hat{\sigma}^2_{ML} = \frac{1}{n} (y - X\hat{\beta}_{ML})'(y - X\hat{\beta}_{ML}). $$
(asymptotic normality) Since the limit information matrix is
$$ \mathcal{I}(\theta_0) = \lim_{n \to \infty} \frac{1}{n} \mathcal{I}_n(\theta_0) = \begin{pmatrix} \dfrac{1}{\sigma_0^2} Q & 0 \\ 0 & \dfrac{1}{2\sigma_0^4} \end{pmatrix}, \qquad Q = \lim_{n \to \infty} \frac{1}{n} X'X, $$
the general result gives
$$ \sqrt{n}\left( \hat{\beta}_{ML} - \beta_0 \right) \xrightarrow{d} N\left( 0, \, \sigma_0^2 Q^{-1} \right), \qquad \sqrt{n}\left( \hat{\sigma}^2_{ML} - \sigma_0^2 \right) \xrightarrow{d} N\left( 0, \, 2\sigma_0^4 \right). $$
Note that these results can also be derived from the asymptotic results for the basic linear model.
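A minimal numerical check of these closed forms (made-up data; any numbers would do), confirming that $\hat{\beta}_{ML}$ equals OLS and that $\hat{\sigma}^2_{ML}$ divides the residual sum of squares by $n$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta0, sigma0 = np.array([1.0, -2.0, 0.5]), 1.5
y = X @ beta0 + rng.normal(scale=sigma0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
resid = y - X @ beta_hat
sigma2_ml = resid @ resid / n                 # ML divides by n, not n - k
print(beta_hat, sigma2_ml)                    # close to beta0 and sigma0**2 = 2.25
```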