Estimation of AR Models

#econometrics #economics #timeseries

Oh, Hyunzi. (email: wisdom302@naver.com)
Korea University, Graduate School of Economics.


Main References

  • Kim, Dukpa. (2022). "Time Series Econometrics" (2022 Fall) ECON 512, Department of Economics, Korea University.
  • Hamilton, J. D. (1994). "Time Series Analysis". Princeton University Press.

Yule–Walker equations

Proposition (Yule-Walker equations).

Consider an AR($p$) model without a constant term: $$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t.$$ Then the Yule-Walker equations are one method to estimate the coefficients, via $$\hat{\boldsymbol\phi} = \hat\Gamma_p^{-1}\hat{\boldsymbol\gamma}_p, \qquad \hat\Gamma_p = \big[\hat\gamma(i-j)\big]_{i,j=1}^{p}, \quad \hat{\boldsymbol\gamma}_p = \big(\hat\gamma(1),\dots,\hat\gamma(p)\big)',$$ where $\hat\gamma(\cdot)$ is the sample autocovariance function.

Proof. Since the model does not contain a constant term, we can derive the autocovariance function directly. First, multiplying both sides by $y_{t-j}$ and taking expectations gives $$\mathbb{E}[y_t y_{t-j}] = \phi_1\,\mathbb{E}[y_{t-1}y_{t-j}] + \cdots + \phi_p\,\mathbb{E}[y_{t-p}y_{t-j}] + \mathbb{E}[\varepsilon_t y_{t-j}].$$ By Stationary Stochastic Processes > Definition 6 (autocovariance function), we therefore have $$\gamma(j) = \phi_1\gamma(j-1) + \cdots + \phi_p\gamma(j-p), \qquad j = 1,\dots,p,$$ by assuming that $\varepsilon_t$ is uncorrelated with $y_{t-1},\dots,y_{t-p}$. Similarly, for $j = 0$, we have $$\gamma(0) = \phi_1\gamma(1) + \cdots + \phi_p\gamma(p) + \sigma^2.$$ Therefore, stacking the equations for $j = 1,\dots,p$, we have $$\boldsymbol\gamma_p = \Gamma_p\,\boldsymbol\phi,$$ where $\Gamma_p = \big[\gamma(i-j)\big]_{i,j=1}^{p}$, which is a variance-covariance matrix and positive semidefinite. Therefore, we can obtain $\boldsymbol\phi = \Gamma_p^{-1}\boldsymbol\gamma_p$, and using the sample autocovariances, we have $\hat{\boldsymbol\phi} = \hat\Gamma_p^{-1}\hat{\boldsymbol\gamma}_p$, which completes the proof.
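As an illustration, a minimal NumPy/SciPy sketch of the Yule-Walker estimator; the function names and the simulated AR(2) check are ours, not from the lecture notes:

```python
import numpy as np
from scipy.linalg import toeplitz

def sample_autocov(y, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag), normalized by T."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    yc = y - y.mean()  # demean in case the zero-mean assumption only holds approximately
    return np.array([yc[k:] @ yc[:T - k] / T for k in range(max_lag + 1)])

def yule_walker(y, p):
    """Estimate AR(p) coefficients by solving the Yule-Walker equations."""
    g = sample_autocov(y, p)
    Gamma = toeplitz(g[:p])           # [gamma_hat(|i-j|)], p x p
    phi = np.linalg.solve(Gamma, g[1:p + 1])
    sigma2 = g[0] - phi @ g[1:p + 1]  # innovation variance from the j = 0 equation
    return phi, sigma2

# Simulate an AR(2) and check that the estimates recover the true coefficients
rng = np.random.default_rng(0)
T, phi_true = 5000, np.array([0.5, -0.3])
y = np.zeros(T)
for t in range(2, T):
    y[t] = phi_true @ y[t - 2:t][::-1] + rng.standard_normal()
print(yule_walker(y, 2))
```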

Using Least Squares Estimator

Consider an AR($p$) model: $$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t,$$ where $\varepsilon_t$ is a white noise and is uncorrelated with $y_{t-j}$ for $j \ge 1$.

Note that in matrix form, we have $$\boldsymbol y = X\boldsymbol\beta + \boldsymbol\varepsilon, \qquad \boldsymbol x_t = (1, y_{t-1},\dots,y_{t-p})', \quad \boldsymbol\beta = (c, \phi_1,\dots,\phi_p)';$$ then we can obtain $$\hat{\boldsymbol\beta}_{OLS} = (X'X)^{-1}X'\boldsymbol y.$$ Given the Autoregressive Moving Average Models > Definition 5 (causality of ARMA) condition, i.e. that every root of $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ lies outside the unit circle, the OLS estimator is asymptotically normally distributed.
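A sketch of this OLS estimator, building the lag matrix directly (the interface is illustrative):

```python
import numpy as np

def ar_ols(y, p):
    """OLS estimate of an AR(p) with intercept: regress y_t on (1, y_{t-1}, ..., y_{t-p})."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Row t holds (1, y_{t-1}, ..., y_{t-p}) for t = p, ..., T-1
    X = np.column_stack([np.ones(T - p)] + [y[p - j:T - j] for j in range(1, p + 1)])
    yy = y[p:]
    beta = np.linalg.lstsq(X, yy, rcond=None)[0]  # (c_hat, phi_hat_1, ..., phi_hat_p)
    resid = yy - X @ beta
    sigma2 = resid @ resid / (T - p)              # error-variance estimate
    return beta, sigma2
```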

Asymptotic Normality of OLS in AR model

The detailed explanation for the asymptotic theory will be discussed in Asymptotic Theory on Time Series Models later on.

Using Maximum Likelihood Estimator

If the joint distribution of $(y_1,\dots,y_T)$ can be obtained, we can also apply the Maximum Likelihood Estimation.

Consider an AR($p$) model: $$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$ for $t = 1,\dots,T$, where $\varepsilon_t \overset{i.i.d.}{\sim} N(0,\sigma^2)$. Now let $\theta = (c, \phi_1,\dots,\phi_p,\sigma^2)'$; then the MLE is $$\hat\theta_{MLE} = \arg\max_\theta\, f_{Y_T,\dots,Y_1}(y_T,\dots,y_1;\theta),$$ where $f_{Y_T,\dots,Y_1}$ is the joint density of $(y_1,\dots,y_T)$.

Using the definition of conditional density, we have $$f_{Y_T,\dots,Y_1}(y_T,\dots,y_1;\theta) = f_{Y_T,\dots,Y_{p+1}\mid Y_p,\dots,Y_1}(y_T,\dots,y_{p+1}\mid y_p,\dots,y_1;\theta)\; f_{Y_p,\dots,Y_1}(y_p,\dots,y_1;\theta),$$ where $$f_{Y_T,\dots,Y_{p+1}\mid Y_p,\dots,Y_1} = \prod_{t=p+1}^{T} f_{Y_t\mid Y_{t-1},\dots,Y_{t-p}}(y_t\mid y_{t-1},\dots,y_{t-p};\theta).$$ Here, the maximizer that uses this conditional joint density function is the conditional maximum likelihood estimator, defined as $$\hat\theta_{cMLE} = \arg\max_\theta \prod_{t=p+1}^{T} f_{Y_t\mid Y_{t-1},\dots,Y_{t-p}}(y_t\mid y_{t-1},\dots,y_{t-p};\theta).$$ The conditional MLE is intuitive to derive, since given the past values, we have $$y_t \mid y_{t-1},\dots,y_{t-p} \sim N\big(c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p},\ \sigma^2\big),$$ assuming that $\varepsilon_t \overset{i.i.d.}{\sim} N(0,\sigma^2)$. Thus, we have the conditional log likelihood $$\ell(\theta) = -\frac{T-p}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=p+1}^{T}\big(y_t - c - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}\big)^2.$$ Similar to Maximum Likelihood Estimation > MLE of Normal Distribution, we have $$(\hat c, \hat\phi_1,\dots,\hat\phi_p)' = (X'X)^{-1}X'\boldsymbol y \quad\text{and}\quad \hat\sigma^2 = \frac{1}{T-p}\sum_{t=p+1}^{T}\hat\varepsilon_t^2.$$ Thus the FOCs equal zero at $\hat\theta_{cMLE}$, and the result equals that of Using Least Squares Estimator.
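The sketch below codes this conditional log-likelihood and checks numerically that its maximizer reproduces the OLS solution; the log-variance parameterization is our choice to keep $\sigma^2 > 0$:

```python
import numpy as np
from scipy.optimize import minimize

def ar_cond_negloglik(theta, y, p):
    """Negative conditional log-likelihood of a Gaussian AR(p).
    theta = (c, phi_1, ..., phi_p, log sigma^2); the log keeps sigma^2 positive."""
    c, phi, s2 = theta[0], theta[1:p + 1], np.exp(theta[-1])
    T = len(y)
    mean = c + sum(phi[j - 1] * y[p - j:T - j] for j in range(1, p + 1))
    eps = y[p:] - mean
    return 0.5 * ((T - p) * np.log(2 * np.pi * s2) + eps @ eps / s2)

rng = np.random.default_rng(1)
T, p = 2000, 1
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.2 + 0.7 * y[t - 1] + rng.standard_normal()

res = minimize(ar_cond_negloglik, x0=np.zeros(p + 2), args=(y, p), method="BFGS")
print(res.x[:p + 1], np.exp(res.x[-1]))  # should match the OLS (c, phi_1) and sigma^2
```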

Now consider the full MLE, previously defined as $$\hat\theta_{MLE} = \arg\max_\theta\; f_{Y_p,\dots,Y_1}(y_p,\dots,y_1;\theta)\prod_{t=p+1}^{T} f_{Y_t\mid Y_{t-1},\dots,Y_{t-p}}(y_t\mid y_{t-1},\dots,y_{t-p};\theta),$$ where the first factor is the density of the $p$ initial values. Assuming the first $p$ initial values are multivariate normal, $$(y_1,\dots,y_p)' \sim N\big(\boldsymbol\mu_p,\ \sigma^2 V_p\big), \qquad \boldsymbol\mu_p = \mu\,\boldsymbol 1_p, \quad \mu = \frac{c}{1-\phi_1-\cdots-\phi_p}.$$ Now, expressing the variance-covariance matrix in terms of the autocovariance function, and denoting it as $\sigma^2 V_p$, we obtain $$\sigma^2\big[V_p\big]_{ij} = \gamma(i-j) \qquad \text{for } i,j = 1,\dots,p.$$

Note that as $T \to \infty$, $\hat\theta_{MLE}$ and $\hat\theta_{cMLE}$ are asymptotically equivalent, since the contribution of the $p$ initial values to the log likelihood stays bounded while the conditional part grows with $T$. However, the exact method is of high computational cost compared to the conditional approach.
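For the $p = 1$ case the exact likelihood has a simple closed form, since $y_1 \sim N\big(c/(1-\phi),\ \sigma^2/(1-\phi^2)\big)$ under stationarity; a minimal sketch, assuming $|\phi| < 1$:

```python
import numpy as np

def ar1_exact_negloglik(theta, y):
    """Negative exact log-likelihood of a Gaussian AR(1).
    theta = (c, phi, log sigma^2); requires |phi| < 1 for the stationary initial density."""
    c, phi, s2 = theta[0], theta[1], np.exp(theta[2])
    mu, v0 = c / (1 - phi), s2 / (1 - phi ** 2)   # stationary mean and variance of y_1
    ll = -0.5 * (np.log(2 * np.pi * v0) + (y[0] - mu) ** 2 / v0)
    eps = y[1:] - c - phi * y[:-1]                 # conditional part for t = 2, ..., T
    ll -= 0.5 * ((len(y) - 1) * np.log(2 * np.pi * s2) + eps @ eps / s2)
    return -ll
```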

Estimation of MA Models

Consider an MA($q$) model: $$y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q},$$ where $\varepsilon_t \overset{i.i.d.}{\sim} N(0,\sigma^2)$ and the roots of $\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$ lie outside the unit circle.

Then, our goal is to obtain the joint density of $(y_1,\dots,y_T)$ from the given distribution of $(\varepsilon_1,\dots,\varepsilon_T)$, which yields the full MLE. However, since it is difficult to derive the exact MLE, we first derive the conditional MLE and then discuss the full MLE later.

Before moving on, we introduce a useful theorem:

Theorem (transformation of pdf).

Let $X$ have pdf $f_X(x)$ for all $x \in \mathbb{R}$, and let $Y = g(X)$, where $g$ is a monotone function. If $f_X$ is continuous on its support and $g^{-1}$ has a continuous derivative, then the pdf of $Y$ is given by $$f_Y(y) = f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}\,g^{-1}(y)\right|.$$
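A quick numerical check of the theorem for the linear map $Y = a + bX$ with $X \sim N(0,1)$, where the formula should reproduce the $N(a, b^2)$ density:

```python
import numpy as np
from scipy.stats import norm

# Change of variables: f_Y(y) = f_X((y - a) / b) * |1 / b| for Y = a + b * X
a, b, y = 1.0, 2.0, 0.5
lhs = norm.pdf((y - a) / b) * abs(1 / b)       # transformation-of-pdf formula
rhs = norm.pdf(y, loc=a, scale=abs(b))         # known N(a, b^2) density
print(np.isclose(lhs, rhs))                    # True
```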

Conditional MLE for MA model

Set the values of the initial errors to zero, i.e. $$\varepsilon_0 = \varepsilon_{-1} = \cdots = \varepsilon_{-q+1} = 0.$$ Then we have $$\varepsilon_t = y_t - \mu - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q}, \qquad t = 1,\dots,T,$$ which equals, in matrix form, $\boldsymbol y - \mu\boldsymbol 1_T = \Theta\boldsymbol\varepsilon$, or simply $\boldsymbol\varepsilon = \Theta^{-1}(\boldsymbol y - \mu\boldsymbol 1_T)$. Note that the matrix $\Theta$, with ones on the diagonal and $\theta_1,\dots,\theta_q$ on the first $q$ subdiagonals, is lower triangular and invertible, with $\det\Theta = 1$. Since the joint pdf of $\boldsymbol\varepsilon$ is given by $$f_{\boldsymbol\varepsilon}(\boldsymbol\varepsilon;\theta) = (2\pi\sigma^2)^{-T/2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t^2\right),$$ and by Theorem 2 (transformation of pdf), we have $$f_{\boldsymbol y}(\boldsymbol y;\theta) = f_{\boldsymbol\varepsilon}\big(\Theta^{-1}(\boldsymbol y - \mu\boldsymbol 1_T);\theta\big)\,\big|\det\Theta^{-1}\big| = f_{\boldsymbol\varepsilon}\big(\Theta^{-1}(\boldsymbol y - \mu\boldsymbol 1_T);\theta\big).$$ Then the log likelihood function is given as $$\ell(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t^2.$$ For the estimation, we first concentrate out $\sigma^2$ to obtain $\hat\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_t^2$, and compute the $\varepsilon_t$ by iterating on the recursion above for $t = 1,\dots,T$. The MLE can be obtained using numerical optimization techniques.
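A sketch of this conditional MA likelihood, iterating the residual recursion with zero initial errors (the parameter layout and function name are ours):

```python
import numpy as np
from scipy.optimize import minimize

def ma_cond_negloglik(params, y, q):
    """Negative conditional log-likelihood of a Gaussian MA(q), with
    eps_0 = ... = eps_{1-q} = 0. params = (mu, theta_1, ..., theta_q, log sigma^2)."""
    mu, th, s2 = params[0], params[1:q + 1], np.exp(params[-1])
    T = len(y)
    eps = np.zeros(T + q)                         # q leading zeros: pre-sample errors
    for t in range(T):
        eps[t + q] = y[t] - mu - th @ eps[t + q - 1::-1][:q]  # eps_{t-1}, ..., eps_{t-q}
    e = eps[q:]
    return 0.5 * (T * np.log(2 * np.pi * s2) + e @ e / s2)

# Simulated MA(1) check: estimates should be close to (mu, theta, sigma^2) = (0.5, 0.6, 1)
rng = np.random.default_rng(2)
T, q, th_true = 3000, 1, 0.6
e = rng.standard_normal(T + 1)
y = 0.5 + e[1:] + th_true * e[:-1]
res = minimize(ma_cond_negloglik, x0=np.zeros(q + 2), args=(y, q), method="BFGS")
print(res.x[:q + 1], np.exp(res.x[-1]))
```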

Exact MLE for MA model

Now we consider the case where the initial errors $\boldsymbol\varepsilon_0 = (\varepsilon_0, \varepsilon_{-1},\dots,\varepsilon_{-q+1})'$ are non-zero. From $$y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}, \qquad t = 1,\dots,T,$$ we have $$\boldsymbol y = \mu\boldsymbol 1_T + A\boldsymbol\varepsilon_0 + \Theta\boldsymbol\varepsilon,$$ where $A$ is the $T \times q$ matrix of coefficients on the pre-sample errors and $\Theta$ is the lower triangular matrix from the previous subsection.

By letting $\tilde{\boldsymbol\varepsilon} = (\boldsymbol\varepsilon_0',\ \boldsymbol\varepsilon')'$, we can alternatively express the model as $$\boldsymbol y = \mu\boldsymbol 1_T + B\tilde{\boldsymbol\varepsilon},$$ where we define $B = [A\ \ \Theta]$, and we have $\tilde{\boldsymbol\varepsilon} \sim N(\boldsymbol 0,\ \sigma^2 I_{T+q})$. Thus we have $$\begin{pmatrix}\boldsymbol\varepsilon_0\\ \boldsymbol y\end{pmatrix} = \begin{pmatrix}\boldsymbol 0\\ \mu\boldsymbol 1_T\end{pmatrix} + \begin{pmatrix}I_q & 0\\ A & \Theta\end{pmatrix}\begin{pmatrix}\boldsymbol\varepsilon_0\\ \boldsymbol\varepsilon\end{pmatrix},$$ and since the transformation matrix is lower triangular with unit determinant, by Theorem 2 (transformation of pdf), we have the joint density $f(\boldsymbol\varepsilon_0, \boldsymbol y;\theta)$ in closed form. Now, given the joint density $f(\boldsymbol\varepsilon_0, \boldsymbol y;\theta)$, we need to derive the conditional density $f(\boldsymbol\varepsilon_0 \mid \boldsymbol y;\theta)$. Given the equation $$\Theta^{-1}(\boldsymbol y - \mu\boldsymbol 1_T) = \Theta^{-1}A\,\boldsymbol\varepsilon_0 + \boldsymbol\varepsilon,$$ we have a linear regression of $\boldsymbol y^* = \Theta^{-1}(\boldsymbol y - \mu\boldsymbol 1_T)$ on $A^* = \Theta^{-1}A$; then we can estimate $\boldsymbol\varepsilon_0$ by least squares, $$\hat{\boldsymbol\varepsilon}_0 = (A^{*\prime}A^* + I_q)^{-1}A^{*\prime}\boldsymbol y^*,$$ where the $I_q$ term enters because $\boldsymbol\varepsilon_0$ itself has variance $\sigma^2 I_q$. Note that, as $\boldsymbol\varepsilon_0$ and $\boldsymbol\varepsilon$ are jointly normal, so are $\boldsymbol\varepsilon_0$ and $\boldsymbol y$. Thus, the conditional distribution of $\boldsymbol\varepsilon_0$ is given as $$\boldsymbol\varepsilon_0 \mid \boldsymbol y \sim N\big(\hat{\boldsymbol\varepsilon}_0,\ \sigma^2(A^{*\prime}A^* + I_q)^{-1}\big),$$ since the conditional variance only depends on the model parameters.

Remark that $$f(\boldsymbol y;\theta) = \frac{f(\boldsymbol\varepsilon_0, \boldsymbol y;\theta)}{f(\boldsymbol\varepsilon_0 \mid \boldsymbol y;\theta)}$$ for any value of $\boldsymbol\varepsilon_0$, and both densities on the right-hand side are available in closed form from the results above. Then, the exact density function of $\boldsymbol y$ can be derived by evaluating this ratio (for instance at $\boldsymbol\varepsilon_0 = \hat{\boldsymbol\varepsilon}_0$), and the exact MLE can be obtained by maximizing $f(\boldsymbol y;\theta)$.
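As a cross-check, the exact Gaussian likelihood can also be computed directly from the unconditional distribution $\boldsymbol y \sim N(\mu\boldsymbol 1_T,\ \sigma^2\Omega)$, with $\Omega$ banded by the MA autocovariances; a sketch for $q = 1$ (the direct approach costs $O(T^3)$, consistent with the computational-cost remark above):

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.stats import multivariate_normal

def ma1_exact_loglik(params, y):
    """Exact Gaussian log-likelihood of an MA(1): y_t = mu + eps_t + theta * eps_{t-1}.
    Uses y ~ N(mu * 1, sigma^2 * Omega) with gamma(0) = sigma^2 (1 + theta^2),
    gamma(1) = sigma^2 theta, and zero beyond lag 1."""
    mu, theta, s2 = params
    T = len(y)
    col = np.zeros(T)
    col[0], col[1] = 1 + theta ** 2, theta       # scaled autocovariances gamma(j)/sigma^2
    Omega = toeplitz(col)                        # banded T x T covariance (up to sigma^2)
    return multivariate_normal(mean=np.full(T, mu), cov=s2 * Omega).logpdf(y)

print(ma1_exact_loglik([0.0, 0.6, 1.0], np.zeros(5)))  # toy evaluation
```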

Estimation of ARMA Model

For an ARMA($p,q$) model, $$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q},$$ where $\varepsilon_t \overset{i.i.d.}{\sim} N(0,\sigma^2)$, we use a mixture of the methods for the AR($p$) and MA($q$) models.

Conditional on the initial observations $y_0, y_{-1}, \dots, y_{-p+1}$ and on $\varepsilon_s = 0$ for $s \le 0$, the sequence $\{\varepsilon_t\}$ for $t = 1,\dots,T$ can be calculated by iterating on $$\varepsilon_t = y_t - c - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p} - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q}.$$ Note that the conditional log likelihood function for a given set of parameter values is $$\ell(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t^2,$$ and we obtain the MLE using numerical optimization techniques.
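A sketch of this recursion and the conditional log-likelihood; the parameter layout is our convention, and we condition on the first $p$ in-sample observations with pre-sample errors set to zero, as in the text:

```python
import numpy as np

def arma_cond_negloglik(params, y, p, q):
    """Negative conditional log-likelihood of a Gaussian ARMA(p, q), conditioning on
    the first p observations and setting pre-sample errors to zero.
    params = (c, phi_1..phi_p, theta_1..theta_q, log sigma^2)."""
    c, s2 = params[0], np.exp(params[-1])
    phi, th = params[1:p + 1], params[p + 1:p + q + 1]
    T = len(y)
    eps = np.zeros(T + q)                        # q leading zeros: pre-sample errors
    for t in range(p, T):
        ar = phi @ y[t - p:t][::-1]              # phi_1 y_{t-1} + ... + phi_p y_{t-p}
        ma = th @ eps[t + q - 1::-1][:q]         # theta_1 e_{t-1} + ... + theta_q e_{t-q}
        eps[t + q] = y[t] - c - ar - ma
    e = eps[p + q:]
    return 0.5 * ((T - p) * np.log(2 * np.pi * s2) + e @ e / s2)
```

Passing this function to a numerical optimizer (e.g. `scipy.optimize.minimize`, as in the AR and MA sketches above) delivers the conditional MLE.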

Note that Maximum Likelihood Estimation > Theorem 8 (asymptotic normality of MLE) is still in effect, and the standard errors of the parameter estimates can be obtained from the Hessian matrix.

Selection of the Order

While goodness of fit is the natural basis for an order-selection criterion, in an AR($k$) model the fit mechanically improves as the order (number of parameters) grows, and the usable sample length also varies with the order. Hence, we look for a goodness-of-fit measure penalized by the number of parameters.

Consider an AR($k$) model for $k = 1,\dots,k_{\max}$: $$y_t = \phi_1 y_{t-1} + \cdots + \phi_k y_{t-k} + \varepsilon_t,$$ and suppose that the true model is given by an AR($k_0$), $$y_t = \phi_1 y_{t-1} + \cdots + \phi_{k_0} y_{t-k_0} + \varepsilon_t,$$ for some non-zero constants $\phi_1,\dots,\phi_{k_0}$. Here, we assume that $k_0 \le k_{\max}$.

Definition (information criterion for AR model).

The information criterion for each $k = 1,\dots,k_{\max}$ is computed by $$IC(k) = \log\hat\sigma_k^2 + \frac{k\,C_T}{T},$$ where $C_T$ is a constant that might depend on the sample size and $\hat\sigma_k^2$ is the estimator for the error variance $\sigma^2$.

Remark (AIC and BIC).

The two most popular criteria are given as follows:

  • AIC (Akaike Information Criterion): $C_T = 2$, so $AIC(k) = \log\hat\sigma_k^2 + \dfrac{2k}{T}$.
  • BIC (Bayesian Information Criterion): $C_T = \log T$, so $BIC(k) = \log\hat\sigma_k^2 + \dfrac{k\log T}{T}$.

The value of $k$ that minimizes $IC(k)$ is selected: $$\hat k = \arg\min_{1\le k\le k_{\max}} IC(k).$$ Now, let the notation be as follows: $\boldsymbol y$ is the vector of observations on a common effective sample (dropping the first $k_{\max}$ observations), $X_k$ is the corresponding regressor matrix of the first $k$ lags, and $M_k = I - X_k(X_k'X_k)^{-1}X_k'$. Then, the variance estimator is given by $$\hat\sigma_k^2 = \frac{1}{T}\,\boldsymbol y'M_k\boldsymbol y,$$ where the variance estimator under the true order is $$\hat\sigma_{k_0}^2 = \frac{1}{T}\,\boldsymbol y'M_{k_0}\boldsymbol y.$$
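A sketch of the selection rule, computing $IC(k)$ on a common effective sample so the criteria are comparable across orders (the function name and interface are illustrative):

```python
import numpy as np

def select_ar_order(y, k_max, criterion="bic"):
    """Choose the AR order minimizing log(sigma2_hat_k) + k * C_T / T,
    with C_T = 2 (AIC) or log T (BIC). All orders drop the same first
    k_max observations so the criteria are comparable."""
    y = np.asarray(y, dtype=float)
    T_eff = len(y) - k_max
    yy = y[k_max:]
    C = 2.0 if criterion == "aic" else np.log(T_eff)
    best_k, best_ic = None, np.inf
    for k in range(1, k_max + 1):
        X = np.column_stack([y[k_max - j:len(y) - j] for j in range(1, k + 1)])
        resid = yy - X @ np.linalg.lstsq(X, yy, rcond=None)[0]
        ic = np.log(resid @ resid / T_eff) + k * C / T_eff
        if ic < best_ic:
            best_k, best_ic = k, ic
    return best_k
```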

First, suppose $k > k_0$, and define $X_k = [X_{k_0}\ \ X_*]$, where $X_*$ collects the $k - k_0$ redundant lags whose true coefficients are zero. Then, using Properties of Least Squared Estimator > Theorem 8 (Frisch-Waugh-Lovell Theorem), we have $$\boldsymbol y'M_{k_0}\boldsymbol y - \boldsymbol y'M_k\boldsymbol y = \boldsymbol\varepsilon'(M_{k_0} - M_k)\boldsymbol\varepsilon,$$ where the last equality holds by $M_{k_0}\boldsymbol y = M_{k_0}\boldsymbol\varepsilon$ and $M_k\boldsymbol y = M_k\boldsymbol\varepsilon$ for $k \ge k_0$ (the projections annihilate the true regression part). Then, we have $$T\big(\hat\sigma_{k_0}^2 - \hat\sigma_k^2\big) = \boldsymbol\varepsilon'(M_{k_0} - M_k)\boldsymbol\varepsilon = O_p(1),$$ since it has a (chi-squared type) quadratic form in $\boldsymbol\varepsilon$ with a fixed number of degrees of freedom.

In the case of AIC, $C_T = 2$, so for $k > k_0$ $$IC(k) - IC(k_0) = \log\hat\sigma_k^2 - \log\hat\sigma_{k_0}^2 + \frac{2(k-k_0)}{T} = \frac{-O_p(1) + 2(k-k_0)}{T},$$ and then $\Pr(\hat k \le k_0) \to 1$ is not guaranteed: the penalty and the overfitting gain are of the same stochastic order. However, in the case of BIC, we have $C_T = \log T$ and $$T\big(IC(k) - IC(k_0)\big) = -O_p(1) + (k-k_0)\log T \xrightarrow{p} \infty,$$ implying that $\hat k \le k_0$ is asymptotically guaranteed as $T \to \infty$.

Now suppose $k < k_0$, so that $X_k$ omits lags whose true coefficients are non-zero. Then we have $$\hat\sigma_k^2 - \hat\sigma_{k_0}^2 \xrightarrow{p} d > 0,$$ where $d > 0$ since it is a quadratic form in the omitted non-zero coefficients with a positive definite weight matrix.

Therefore, we have $\Pr(\hat k < k_0) \to 0$, since with either AIC or BIC the second (penalty) term converges to zero as $T \to \infty$, while the difference in $\log\hat\sigma^2$ stays bounded away from zero.
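A small Monte Carlo illustrating the two results, assuming `select_ar_order` from the sketch above is in scope: BIC concentrates on $k_0$, while AIC retains some probability of overfitting.

```python
import numpy as np

# Toy experiment, not a formal study: true model is AR(2) with (0.5, -0.3)
rng = np.random.default_rng(3)
picks = {"aic": [], "bic": []}
for _ in range(200):
    T = 400
    y = np.zeros(T)
    for t in range(2, T):
        y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()
    for crit in picks:
        picks[crit].append(select_ar_order(y, k_max=6, criterion=crit))
for crit, ks in picks.items():
    print(crit, np.bincount(ks, minlength=7)[1:])  # how often each order 1..6 is chosen
```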