EM Algorithm

#econometrics #economics #algorithm #optimization

Oh, Hyunzi. (email: wisdom302@naver.com)
Korea University, Graduate School of Economics.


Main References

  • Kim, Seung Hyun. (2024). "Asset Pricing and Term Structure Models". WORK IN PROGRESS.
  • Hogg, McKean, and Craig. (2013). "Introduction to Mathematical Statistics" (8th Edition). Pearson.

The Expectation-Maximization (EM) algorithm is an iterative method designed to find a local maximum of the log-likelihood function when part of the data is missing or unobserved.

We first denote the observed data by $\mathbf{x} = (x_1, \dots, x_{n_1})'$ and the missing data by $\mathbf{z} = (z_1, \dots, z_{n_2})'$. The basic setup of the EM algorithm is as follows:

  • $\mathbf{x}$ is the observed data, and $\mathbf{z}$ is the missing data. Together, $\mathbf{x}$ and $\mathbf{z}$ make up the complete data $\mathbf{w} = (\mathbf{x}', \mathbf{z}')'$.
  • We assume the complete data $\mathbf{w}$ have a density $h(\mathbf{x}, \mathbf{z} \mid \theta)$ that is parametrized by $\theta$. Since we are missing $\mathbf{z}$, we cannot evaluate $h$ directly.
  • The observed data $\mathbf{x}$ have the density $g(\mathbf{x} \mid \theta) = \int h(\mathbf{x}, \mathbf{z} \mid \theta)\, d\mathbf{z}$, and the observed-data log-likelihood is $\ell(\theta \mid \mathbf{x}) = \log g(\mathbf{x} \mid \theta)$.
  • Our goal is to maximize the complete-data log-likelihood $\log L^c(\theta \mid \mathbf{x}, \mathbf{z}) = \log h(\mathbf{x}, \mathbf{z} \mid \theta)$, by exploiting the observed-data log-likelihood function $\ell(\theta \mid \mathbf{x})$.

First, denote the conditional pdf of the unobserved data given the observed data as
$$k(\mathbf{z} \mid \theta, \mathbf{x}) = \frac{h(\mathbf{x}, \mathbf{z} \mid \theta)}{g(\mathbf{x} \mid \theta)},$$
and by the definition of the conditional pdf, we have
$$\log g(\mathbf{x} \mid \theta) = \log h(\mathbf{x}, \mathbf{z} \mid \theta) - \log k(\mathbf{z} \mid \theta, \mathbf{x}).$$
Then, for some fixed $\theta_0$, taking expectations with respect to $k(\mathbf{z} \mid \theta_0, \mathbf{x})$ yields the following equality:
$$\ell(\theta \mid \mathbf{x}) = E_{\theta_0}\big[\log h(\mathbf{x}, \mathbf{Z} \mid \theta) \mid \theta_0, \mathbf{x}\big] - E_{\theta_0}\big[\log k(\mathbf{Z} \mid \theta, \mathbf{x}) \mid \theta_0, \mathbf{x}\big].$$
Now, let the first term of the right-hand side be
$$Q(\theta \mid \theta_0, \mathbf{x}) = E_{\theta_0}\big[\log h(\mathbf{x}, \mathbf{Z} \mid \theta) \mid \theta_0, \mathbf{x}\big];$$
calculating this expectation is called the E-step of the EM algorithm. The next step is to maximize the calculated expectation over $\theta$, which is the M-step.

Summing up, we can define the EM algorithm as follows:

Definition (EM algorithm).

Given an initial parameter value $\widehat{\theta}^{(0)}$, the $(m+1)$-th iteration of the EM algorithm consists of two steps.

  1. E-Step: the Expectation Step calculates the expectation
     $$Q\big(\theta \mid \widehat{\theta}^{(m)}, \mathbf{x}\big) = E_{\widehat{\theta}^{(m)}}\big[\log h(\mathbf{x}, \mathbf{Z} \mid \theta) \mid \widehat{\theta}^{(m)}, \mathbf{x}\big].$$
  2. M-Step: the Maximization Step updates the parameter estimate by maximizing $Q\big(\theta \mid \widehat{\theta}^{(m)}, \mathbf{x}\big)$ with respect to $\theta$:
     $$\widehat{\theta}^{(m+1)} = \arg\max_{\theta} Q\big(\theta \mid \widehat{\theta}^{(m)}, \mathbf{x}\big).$$
  3. Convergence Criterion: the algorithm stops once it has converged according to the convergence criterion
     $$\frac{\big|\ell(\widehat{\theta}^{(m+1)} \mid \mathbf{x}) - \ell(\widehat{\theta}^{(m)} \mid \mathbf{x})\big|}{\big(\big|\ell(\widehat{\theta}^{(m+1)} \mid \mathbf{x})\big| + \big|\ell(\widehat{\theta}^{(m)} \mid \mathbf{x})\big|\big)/2} < \epsilon,$$
     where this relative-change criterion is the one used in Doz, Giannone, and Reichlin (2012).

In most cases, the initial parameter value $\widehat{\theta}^{(0)}$ is obtained from the observed-data log-likelihood function.

The proof of convergence of the EM estimates is discussed separately in Why EM Algorithm Works.
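As a quick illustration, the whole scheme fits in a short loop. Below is a minimal sketch in Python, assuming hypothetical user-supplied functions `e_step`, `m_step`, and `loglik` for a concrete model; only the stopping rule is fixed, namely the relative-change criterion above.

```python
def em(theta0, e_step, m_step, loglik, tol=1e-4, max_iter=500):
    """Generic EM loop: alternate E- and M-steps until the relative
    change in the observed-data log-likelihood falls below `tol`."""
    theta, ll_old = theta0, loglik(theta0)
    for _ in range(max_iter):
        stats = e_step(theta)    # E-step: expected complete-data statistics
        theta = m_step(stats)    # M-step: maximize Q(theta | theta_old)
        ll_new = loglik(theta)   # observed-data log-likelihood at the update
        # relative-change convergence criterion (as in Doz et al. 2012)
        if abs(ll_new - ll_old) / ((abs(ll_new) + abs(ll_old)) / 2) < tol:
            break
        ll_old = ll_new
    return theta
```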

EM Algorithm for Censored Data

Example (EM algorithm for censored observations).

Let the observed data $x_1, \dots, x_{n_1}$ be iid realizations from a distribution with pdf $f(x - \theta)$ and cdf $F(x - \theta)$, for $-\infty < x < \infty$ and $-\infty < \theta < \infty$. Now suppose there exist $n_2$ censored observations $z_1, \dots, z_{n_2}$, for which we only know that $z_i > a$ for some known value $a$, and that the $Z_i$'s and $X_i$'s are independent. Write $n = n_1 + n_2$. Then the EM algorithm is derived as follows:

  1. E-step: calculate
     $$Q(\theta \mid \theta_0, \mathbf{x}) = \sum_{i=1}^{n_1} \log f(x_i - \theta) + n_2 \int_a^\infty \log f(z - \theta) \, \frac{f(z - \theta_0)}{1 - F(a - \theta_0)} \, dz.$$
  2. M-step: maximize $Q(\theta \mid \theta_0, \mathbf{x})$ and update the parameter as
     $$\widehat{\theta}^{(m+1)} = \arg\max_{\theta} Q\big(\theta \mid \widehat{\theta}^{(m)}, \mathbf{x}\big).$$
     In particular, if the model follows the censored normal distribution with complete data $N(\theta, 1)$, then given the $m$-th iteration estimate $\widehat{\theta}^{(m)}$, the $(m+1)$-th iteration estimate has the closed form
     $$\widehat{\theta}^{(m+1)} = \frac{n_1 \bar{x} + n_2 \widehat{\theta}^{(m)}}{n} + \frac{n_2}{n} H\big(a - \widehat{\theta}^{(m)}\big),$$
     where $\Phi$ is the cdf of the standard normal distribution, $\phi$ is its pdf, and $H(u) = \phi(u)/[1 - \Phi(u)]$. A numerical sketch follows the proof below.

Proof. Given the assumptions, the observed and complete likelihood functions are
$$L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n_1} f(x_i - \theta) \, \big[1 - F(a - \theta)\big]^{n_2}, \qquad L^c(\theta \mid \mathbf{x}, \mathbf{z}) = \prod_{i=1}^{n_1} f(x_i - \theta) \prod_{i=1}^{n_2} f(z_i - \theta).$$
Then the conditional distribution of the unobserved data given $\mathbf{x}$ is
$$k(\mathbf{z} \mid \theta, \mathbf{x}) = \frac{L^c(\theta \mid \mathbf{x}, \mathbf{z})}{L(\theta \mid \mathbf{x})} = \prod_{i=1}^{n_2} \frac{f(z_i - \theta)}{1 - F(a - \theta)}, \qquad z_i > a;$$
thus we can see that $\mathbf{Z}$ and $\mathbf{X}$ are independent and $Z_1, \dots, Z_{n_2}$ are iid with pdf $f(z - \theta)/[1 - F(a - \theta)]$, for $z > a$.

Now, for some fixed $\theta_0$, the expectation step follows as
$$Q(\theta \mid \theta_0, \mathbf{x}) = \sum_{i=1}^{n_1} \log f(x_i - \theta) + n_2 \, E_{\theta_0}\big[\log f(Z - \theta) \mid \theta_0, \mathbf{x}\big] = \sum_{i=1}^{n_1} \log f(x_i - \theta) + n_2 \int_a^\infty \log f(z - \theta) \, \frac{f(z - \theta_0)}{1 - F(a - \theta_0)} \, dz.$$
Next, to maximize the given expectation, we first need to obtain the partial derivative of $Q$. In the general case, we have
$$\frac{\partial Q(\theta \mid \theta_0, \mathbf{x})}{\partial \theta} = -\sum_{i=1}^{n_1} \frac{f'(x_i - \theta)}{f(x_i - \theta)} - n_2 \int_a^\infty \frac{f'(z - \theta)}{f(z - \theta)} \, \frac{f(z - \theta_0)}{1 - F(a - \theta_0)} \, dz,$$
and if $\theta_0 = \widehat{\theta}^{(m)}$, the $(m+1)$-th iteration estimate of the EM algorithm is the $\theta$ that makes $\partial Q / \partial \theta = 0$.

Lastly, we obtain the analytical solution for the case when $f = \phi$, so that the complete data are $N(\theta, 1)$. Then we have $-\phi'(u)/\phi(u) = u$, and we can show that
$$\frac{\partial}{\partial \theta} \sum_{i=1}^{n_1} \log \phi(x_i - \theta) = \sum_{i=1}^{n_1} (x_i - \theta) = n_1(\bar{x} - \theta)$$
and
$$E_{\theta_0}[Z \mid Z > a] = \theta_0 + \frac{\phi(a - \theta_0)}{1 - \Phi(a - \theta_0)}.$$
Denoting by $\Phi$ the cdf of the standard normal distribution, the first derivative of the expectation is
$$\frac{\partial Q(\theta \mid \theta_0, \mathbf{x})}{\partial \theta} = n_1(\bar{x} - \theta) + n_2 \left(\theta_0 + \frac{\phi(a - \theta_0)}{1 - \Phi(a - \theta_0)} - \theta\right);$$
then the estimate that satisfies $\partial Q / \partial \theta = 0$ given $\theta_0 = \widehat{\theta}^{(m)}$ can be obtained as
$$\widehat{\theta}^{(m+1)} = \frac{n_1 \bar{x} + n_2 \widehat{\theta}^{(m)}}{n} + \frac{n_2}{n} \, \frac{\phi\big(a - \widehat{\theta}^{(m)}\big)}{1 - \Phi\big(a - \widehat{\theta}^{(m)}\big)},$$
which completes the proof.
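The closed-form normal update lends itself to a direct implementation. Below is a minimal sketch, assuming `x` holds the $n_1$ uncensored observations, `n2` is the number of observations censored at `a`, and the complete data are $N(\theta, 1)$; the function name and toy data are mine, not from the references.

```python
import numpy as np
from scipy.stats import norm

def em_censored_normal(x, n2, a, theta0=0.0, tol=1e-8, max_iter=1000):
    """EM for N(theta, 1) data right-censored at a: iterate
    theta <- (n1*xbar + n2*theta)/n + (n2/n)*H(a - theta),
    where H = phi / (1 - Phi) is the standard normal hazard."""
    n1, n = len(x), len(x) + n2
    xbar, theta = np.mean(x), theta0
    for _ in range(max_iter):
        hazard = norm.pdf(a - theta) / (1.0 - norm.cdf(a - theta))
        theta_new = (n1 * xbar + n2 * theta) / n + n2 * hazard / n
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# toy usage: N(1, 1) data censored at a = 1.5
rng = np.random.default_rng(0)
w = rng.normal(1.0, 1.0, size=500)          # complete data, true theta = 1.0
x, n2, a = w[w <= 1.5], int((w > 1.5).sum()), 1.5
print(em_censored_normal(x, n2, a))         # should be close to 1.0
```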

EM Algorithm for Mixed Normal

Example (EM algorithm for mixed normal).

Consider $X_i = (1 - W_i) Y_i + W_i Z_i$, where $Y_i \sim N(\mu_1, \sigma_1^2)$ with pdf $f_1(x)$, $Z_i \sim N(\mu_2, \sigma_2^2)$ with pdf $f_2(x)$, and $W_i \sim \text{Bernoulli}(\epsilon)$, where $\epsilon$ is the probability of success. The observed data are given as $\mathbf{x} = (x_1, \dots, x_n)'$, where the indicators $W_i$ are unobserved, so that $X_i$ has the mixture pdf $f(x) = (1 - \epsilon) f_1(x) + \epsilon f_2(x)$ and $\theta = (\mu_1, \mu_2, \sigma_1, \sigma_2, \epsilon)'$.
In this case, the EM algorithm is derived as follows:

  1. E-step: calculate
     $$Q\big(\theta \mid \widehat{\theta}^{(m)}, \mathbf{x}\big) = \sum_{i=1}^n \Big[(1 - \gamma_i) \log\big((1 - \epsilon) f_1(x_i)\big) + \gamma_i \log\big(\epsilon f_2(x_i)\big)\Big],$$
     where
     $$\gamma_i = E_{\widehat{\theta}^{(m)}}\big[W_i \mid \widehat{\theta}^{(m)}, x_i\big] = \frac{\widehat{\epsilon}^{(m)} f_2(x_i)}{\big(1 - \widehat{\epsilon}^{(m)}\big) f_1(x_i) + \widehat{\epsilon}^{(m)} f_2(x_i)}.$$
  2. M-step: maximize $Q\big(\theta \mid \widehat{\theta}^{(m)}, \mathbf{x}\big)$, where the maximizers are derived as
     $$\widehat{\mu}_1^{(m+1)} = \frac{\sum_{i=1}^n (1 - \gamma_i) x_i}{\sum_{i=1}^n (1 - \gamma_i)}, \qquad \widehat{\mu}_2^{(m+1)} = \frac{\sum_{i=1}^n \gamma_i x_i}{\sum_{i=1}^n \gamma_i},$$
     $$\big(\widehat{\sigma}_1^{(m+1)}\big)^2 = \frac{\sum_{i=1}^n (1 - \gamma_i) \big(x_i - \widehat{\mu}_1^{(m+1)}\big)^2}{\sum_{i=1}^n (1 - \gamma_i)}, \qquad \big(\widehat{\sigma}_2^{(m+1)}\big)^2 = \frac{\sum_{i=1}^n \gamma_i \big(x_i - \widehat{\mu}_2^{(m+1)}\big)^2}{\sum_{i=1}^n \gamma_i},$$
     and
     $$\widehat{\epsilon}^{(m+1)} = \frac{1}{n} \sum_{i=1}^n \gamma_i.$$

The full derivation is omitted here; one can refer to Hogg et al. (2013), pp. 422-423, for the proof.
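The E- and M-step formulas above translate line by line into code. A minimal sketch, with variable names of my own choosing (`gamma` is the posterior membership probability $\gamma_i$):

```python
import numpy as np
from scipy.stats import norm

def em_mixture(x, mu1, mu2, s1, s2, eps, n_iter=200):
    """EM for the two-component normal mixture
    (1 - eps) * N(mu1, s1^2) + eps * N(mu2, s2^2)."""
    for _ in range(n_iter):
        # E-step: posterior probability that x_i comes from component 2
        f1, f2 = norm.pdf(x, mu1, s1), norm.pdf(x, mu2, s2)
        gamma = eps * f2 / ((1 - eps) * f1 + eps * f2)
        # M-step: weighted means, variances, and mixing probability
        w1, w2 = (1 - gamma).sum(), gamma.sum()
        mu1 = ((1 - gamma) * x).sum() / w1
        mu2 = (gamma * x).sum() / w2
        s1 = np.sqrt(((1 - gamma) * (x - mu1) ** 2).sum() / w1)
        s2 = np.sqrt((gamma * (x - mu2) ** 2).sum() / w2)
        eps = gamma.mean()
    return mu1, mu2, s1, s2, eps

# toy usage: 70/30 mixture of N(0, 1) and N(4, 1.5^2)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 700), rng.normal(4, 1.5, 300)])
print(em_mixture(x, mu1=-1.0, mu2=5.0, s1=1.0, s2=1.0, eps=0.5))
```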

EM Algorithm for SSM

Given a linear state-space model
$$y_t = \Lambda \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim \text{iid } N(0, R),$$
$$\alpha_t = A \alpha_{t-1} + \eta_t, \qquad \eta_t \sim \text{iid } N(0, Q),$$
where the constant part of the transition equation has been assumed to be zero. We also assume that the Kalman smoothed quantities
$$\hat{\alpha}_{t|T} = E[\alpha_t \mid Y_T], \qquad P_{t|T} = \text{Var}(\alpha_t \mid Y_T), \qquad P_{t,t-1|T} = \text{Cov}(\alpha_t, \alpha_{t-1} \mid Y_T)$$
have been obtained, where $Y_T = (y_1, \dots, y_T)$ denotes the full sample.

E-Step

From the E-step, we need
$$Q\big(\theta \mid \widehat{\theta}^{(j)}\big) = E_{\widehat{\theta}^{(j)}}\big[\log L^c(\theta \mid Y_T, \alpha_{0:T}) \mid Y_T\big], \qquad \theta = (\Lambda, A, R, Q).$$
We first decompose $\log L^c$. Since
$$p(y_{1:T}, \alpha_{0:T} \mid \theta) = p(\alpha_0) \prod_{t=1}^T p(\alpha_t \mid \alpha_{t-1}) \prod_{t=1}^T p(y_t \mid \alpha_t)$$
and the two conditional distributions are independent Gaussians, we have, up to the term in the initial state $\alpha_0$ (which does not involve $\theta$),
$$\log L^c = \text{const} - \frac{T}{2} \log|R| - \frac{1}{2} \sum_{t=1}^T (y_t - \Lambda \alpha_t)' R^{-1} (y_t - \Lambda \alpha_t) - \frac{T}{2} \log|Q| - \frac{1}{2} \sum_{t=1}^T (\alpha_t - A \alpha_{t-1})' Q^{-1} (\alpha_t - A \alpha_{t-1}).$$
Thus, the expectation can be evaluated term by term. Firstly, notice that
$$E\big[(y_t - \Lambda \alpha_t)' R^{-1} (y_t - \Lambda \alpha_t) \mid Y_T\big] = \operatorname{tr}\Big\{R^{-1}\Big[(y_t - \Lambda \hat{\alpha}_{t|T})(y_t - \Lambda \hat{\alpha}_{t|T})' + \Lambda P_{t|T} \Lambda'\Big]\Big\},$$
where the equality holds because a scalar equals its own trace and because
$$E[\alpha_t \alpha_t' \mid Y_T] = \hat{\alpha}_{t|T} \hat{\alpha}_{t|T}' + P_{t|T}.$$

Secondly, we also have
$$E\big[(\alpha_t - A \alpha_{t-1})' Q^{-1} (\alpha_t - A \alpha_{t-1}) \mid Y_T\big] = \operatorname{tr}\Big\{Q^{-1}\Big[E(\alpha_t \alpha_t') - A \, E(\alpha_{t-1} \alpha_t') - E(\alpha_t \alpha_{t-1}') A' + A \, E(\alpha_{t-1} \alpha_{t-1}') A'\Big]\Big\},$$
where $E[\alpha_t \alpha_{t-1}' \mid Y_T] = \hat{\alpha}_{t|T} \hat{\alpha}_{t-1|T}' + P_{t,t-1|T}$ and the conditioning on $Y_T$ is suppressed inside the trace. Therefore, the expectation is computed as
$$Q\big(\theta \mid \widehat{\theta}^{(j)}\big) = \text{const} - \frac{T}{2} \log|R| - \frac{1}{2} \sum_{t=1}^T \operatorname{tr}\Big\{R^{-1}\Big[(y_t - \Lambda \hat{\alpha}_{t|T})(y_t - \Lambda \hat{\alpha}_{t|T})' + \Lambda P_{t|T} \Lambda'\Big]\Big\} - \frac{T}{2} \log|Q| - \frac{1}{2} \sum_{t=1}^T \operatorname{tr}\Big\{Q^{-1} \, E\big[(\alpha_t - A \alpha_{t-1})(\alpha_t - A \alpha_{t-1})' \mid Y_T\big]\Big\}.$$
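In code, the E-step amounts to assembling these smoothed second moments. A minimal sketch, assuming the smoother output is already available as arrays; the array layout and names (`alpha_sm`, `P_sm`, `P_lag`) are assumptions of mine, not a particular library's API:

```python
import numpy as np

def em_suff_stats(y, alpha_sm, P_sm, P_lag):
    """Smoothed second moments for the SSM E-step.
    y:        (T, n) observations.
    alpha_sm: (T+1, k) smoothed means E[alpha_t | Y_T] for t = 0..T.
    P_sm:     (T+1, k, k) smoothed variances Var(alpha_t | Y_T) for t = 0..T.
    P_lag:    (T, k, k) lag-one covariances Cov(alpha_t, alpha_{t-1} | Y_T), t = 1..T."""
    a1, a0 = alpha_sm[1:], alpha_sm[:-1]      # alpha_{t|T} and alpha_{t-1|T}
    S11 = a1.T @ a1 + P_sm[1:].sum(axis=0)    # sum_t E[alpha_t alpha_t']
    S10 = a1.T @ a0 + P_lag.sum(axis=0)       # sum_t E[alpha_t alpha_{t-1}']
    S00 = a0.T @ a0 + P_sm[:-1].sum(axis=0)   # sum_t E[alpha_{t-1} alpha_{t-1}']
    Sya = y.T @ a1                            # sum_t y_t alpha_{t|T}'
    Syy = y.T @ y                             # sum_t y_t y_t'
    return S11, S10, S00, Sya, Syy
```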

M-Step

We want to maximize $Q\big(\theta \mid \widehat{\theta}^{(j)}\big)$ with respect to $\theta$ to find the next iterate $\widehat{\theta}^{(j+1)}$. Finding $\widehat{\theta}^{(j+1)}$ means maximizing the expectation above with respect to each of $\Lambda$, $A$, $R$, and $Q$ in turn.

Derive $\Lambda^{(j+1)}$

First, note that
$$\frac{\partial Q}{\partial \Lambda} = R^{-1} \sum_{t=1}^T \Big[y_t \hat{\alpha}_{t|T}' - \Lambda \big(\hat{\alpha}_{t|T} \hat{\alpha}_{t|T}' + P_{t|T}\big)\Big],$$
where the first equality holds by $\partial \operatorname{tr}(BXC)/\partial X = B'C'$ and $\partial \operatorname{tr}(BXCX')/\partial X = B'XC' + BXC$, and the second holds since $(y_t - \Lambda \alpha_t)$ is a vector, so its quadratic form is a scalar and equals its own trace.

The first order condition equates the above derivative with zero. Then we have
$$\sum_{t=1}^T y_t \hat{\alpha}_{t|T}' = \Lambda^{(j+1)} \sum_{t=1}^T \big(\hat{\alpha}_{t|T} \hat{\alpha}_{t|T}' + P_{t|T}\big),$$
where the simplification holds since both $R^{-1}$ and $P_{t|T}$ are symmetric. By implication, we have
$$\Lambda^{(j+1)} = \left(\sum_{t=1}^T y_t \hat{\alpha}_{t|T}'\right) \left(\sum_{t=1}^T \big(\hat{\alpha}_{t|T} \hat{\alpha}_{t|T}' + P_{t|T}\big)\right)^{-1}.$$

Derive $A^{(j+1)}$

Likewise, from the transition part of the expectation, the first derivative is
$$\frac{\partial Q}{\partial A} = Q^{-1} \sum_{t=1}^T \Big[E(\alpha_t \alpha_{t-1}' \mid Y_T) - A \, E(\alpha_{t-1} \alpha_{t-1}' \mid Y_T)\Big],$$
and the first order condition equates the above derivative with zero.

Then we have
$$\sum_{t=1}^T E(\alpha_t \alpha_{t-1}' \mid Y_T) = A^{(j+1)} \sum_{t=1}^T E(\alpha_{t-1} \alpha_{t-1}' \mid Y_T),$$
and therefore
$$A^{(j+1)} = \left(\sum_{t=1}^T \big(\hat{\alpha}_{t|T} \hat{\alpha}_{t-1|T}' + P_{t,t-1|T}\big)\right) \left(\sum_{t=1}^T \big(\hat{\alpha}_{t-1|T} \hat{\alpha}_{t-1|T}' + P_{t-1|T}\big)\right)^{-1}.$$

Derive $R^{(j+1)}$

The first order condition in $R$ gives $R^{(j+1)} = \frac{1}{T} \sum_{t=1}^T E\big[(y_t - \Lambda \alpha_t)(y_t - \Lambda \alpha_t)' \mid Y_T\big]$. Since $\partial Q / \partial \Lambda$ equals zero under $\Lambda = \Lambda^{(j+1)}$, the cross terms cancel and we have
$$R^{(j+1)} = \frac{1}{T} \sum_{t=1}^T \Big(y_t y_t' - \Lambda^{(j+1)} \hat{\alpha}_{t|T} y_t'\Big).$$

Derive $Q^{(j+1)}$

Similarly, since $\partial Q / \partial A$ equals zero under $A = A^{(j+1)}$, the cross terms cancel and we have
$$Q^{(j+1)} = \frac{1}{T} \sum_{t=1}^T \Big[E(\alpha_t \alpha_t' \mid Y_T) - A^{(j+1)} E(\alpha_{t-1} \alpha_t' \mid Y_T)\Big].$$

Summing up M-step

Therefore, for the $(j+1)$-th iteration, we have the closed-form updates above. To obtain more tractable expressions, we let
$$S_{11} = \sum_{t=1}^T \big(\hat{\alpha}_{t|T} \hat{\alpha}_{t|T}' + P_{t|T}\big), \qquad S_{10} = \sum_{t=1}^T \big(\hat{\alpha}_{t|T} \hat{\alpha}_{t-1|T}' + P_{t,t-1|T}\big), \qquad S_{00} = \sum_{t=1}^T \big(\hat{\alpha}_{t-1|T} \hat{\alpha}_{t-1|T}' + P_{t-1|T}\big),$$
together with $S_{y\alpha} = \sum_{t=1}^T y_t \hat{\alpha}_{t|T}'$ and $S_{yy} = \sum_{t=1}^T y_t y_t'$. Then we have
$$\Lambda^{(j+1)} = S_{y\alpha} S_{11}^{-1}, \qquad A^{(j+1)} = S_{10} S_{00}^{-1},$$
and lastly,
$$R^{(j+1)} = \frac{1}{T} \big(S_{yy} - \Lambda^{(j+1)} S_{y\alpha}'\big), \qquad Q^{(j+1)} = \frac{1}{T} \big(S_{11} - A^{(j+1)} S_{10}'\big).$$
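Given the sufficient statistics from the E-step sketch above, the M-step is a few lines of linear algebra. A minimal sketch under the same assumed array layout:

```python
import numpy as np

def em_ssm_mstep(S11, S10, S00, Sya, Syy, T):
    """Closed-form M-step for the linear Gaussian state-space model."""
    Lam = Sya @ np.linalg.inv(S11)     # Lambda^(j+1) = S_ya S_11^{-1}
    A = S10 @ np.linalg.inv(S00)       # A^(j+1)      = S_10 S_00^{-1}
    R = (Syy - Lam @ Sya.T) / T        # R^(j+1)
    Q = (S11 - A @ S10.T) / T          # Q^(j+1)
    return Lam, A, R, Q
```

Iterating the Kalman smoother (E-step) and these updates (M-step) until the relative-change criterion above is met yields the EM estimates of $(\Lambda, A, R, Q)$.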