Maximum Likelihood Estimation

The Likelihood Function

The probability density function (pdf) for a random variable $y$, conditioned on a set of parameters $\theta$, is denoted $f(y \mid \theta)$.
The joint density, or likelihood function, is

$$f(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) = L(\theta \mid y).$$

It is usually simpler to work with the log of the likelihood function:
$$\ln L(\theta \mid y) = \sum_{i=1}^{n} \ln f(y_i \mid \theta).$$

Maximum likelihood estimation chooses $\theta$ to maximize the log-likelihood function.
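For concreteness, here is a minimal sketch (not from the original notes) that maximizes the log-likelihood numerically, assuming an exponential model $f(y \mid \theta) = \theta^{-1} e^{-y/\theta}$ and simulated data:

```python
# Minimal MLE sketch for an assumed exponential model with simulated data.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=500)  # simulated sample, true theta = 2

def neg_log_likelihood(theta):
    # -ln L(theta | y) = -sum_i ln f(y_i | theta), f(y|theta) = exp(-y/theta)/theta
    return -np.sum(-np.log(theta) - y / theta)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(res.x, y.mean())  # the numerical maximizer matches the sample mean
```

For this model the maximizer coincides with the sample mean, as the likelihood equation below confirms.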
The necessary condition is

$$\frac{\partial \ln L(\theta \mid y)}{\partial \theta} = 0,$$

which is called the likelihood equation.
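For the assumed exponential model above (an illustration, not part of the original notes), the likelihood equation solves in closed form:

$$\frac{\partial \ln L(\theta \mid y)}{\partial \theta} = \sum_{i=1}^{n} \left( -\frac{1}{\theta} + \frac{y_i}{\theta^2} \right) = 0 \quad \Longrightarrow \quad \hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}.$$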
Properties of MLE

Under regularity conditions, the maximum likelihood estimator (MLE) has the following asymptotic properties:
Consistency: $\operatorname{plim} \hat{\theta} = \theta_0$.

Asymptotic normality: $\hat{\theta} \stackrel{a}{\sim} N[\theta_0, \{I(\theta_0)\}^{-1}]$, where

$$I(\theta_0) = -E_0\left[\frac{\partial^2 \ln L}{\partial \theta_0 \, \partial \theta_0'}\right].$$

Asymptotic efficiency: $\hat{\theta}$ is asymptotically efficient and achieves the Cramér-Rao lower bound for consistent estimators.

For each observation, we have the log-density $\ln f(y_i \mid \theta)$. Denote

$$g_i = \frac{\partial \ln f(y_i \mid \theta)}{\partial \theta} \quad \text{and} \quad H_i = \frac{\partial^2 \ln f(y_i \mid \theta)}{\partial \theta \, \partial \theta'}, \qquad i = 1, \dots, n.$$

Then we have
$$g = \frac{\partial \ln L(\theta \mid y)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ln f(y_i \mid \theta)}{\partial \theta} = \sum_{i=1}^{n} g_i,$$

and the Hessian matrix is

$$H = \frac{\partial^2 \ln L(\theta \mid y)}{\partial \theta \, \partial \theta'} = \sum_{i=1}^{n} \frac{\partial^2 \ln f(y_i \mid \theta)}{\partial \theta \, \partial \theta'} = \sum_{i=1}^{n} H_i.$$
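Continuing the assumed exponential example, a short sketch of the per-observation terms and their sums; evaluated at the MLE, the score $g$ is zero and the Hessian $H$ is negative:

```python
# Per-observation scores g_i and Hessians H_i for the assumed exponential model.
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=500)
theta = y.mean()  # evaluate at the MLE

g_i = -1.0 / theta + y / theta**2          # d ln f(y_i|theta) / d theta
H_i = 1.0 / theta**2 - 2.0 * y / theta**3  # d^2 ln f(y_i|theta) / d theta^2

g, H = g_i.sum(), H_i.sum()
print(g, H)  # g is (numerically) 0 at the MLE; H is negative at a maximum
```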
Information matrix equality:

$$\operatorname{Var}\left[\frac{\partial \ln L(\theta_0 \mid y)}{\partial \theta_0}\right] = E_0\left[\left(\frac{\partial \ln L(\theta_0 \mid y)}{\partial \theta_0}\right)\left(\frac{\partial \ln L(\theta_0 \mid y)}{\partial \theta_0'}\right)\right] = -E_0\left[\frac{\partial^2 \ln L(\theta_0 \mid y)}{\partial \theta_0 \, \partial \theta_0'}\right].$$
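A Monte Carlo sketch of this equality under the same assumed exponential model, where both sides equal $n/\theta_0^2$:

```python
# Check Var[score] = -E[Hessian] at theta_0 by simulation (assumed model).
import numpy as np

rng = np.random.default_rng(1)
theta0, n, reps = 2.0, 200, 5000
scores, hessians = [], []
for _ in range(reps):
    y = rng.exponential(scale=theta0, size=n)
    scores.append(np.sum(-1.0 / theta0 + y / theta0**2))
    hessians.append(np.sum(1.0 / theta0**2 - 2.0 * y / theta0**3))

print(np.var(scores), -np.mean(hessians))  # both are close to n / theta0^2 = 50
```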
Consistency

Let $\theta_0$ be the true value of the parameter, $\hat{\theta}$ the MLE, and $\theta$ any other estimator of $\theta_0$. Then the MLE $\hat{\theta}$ is consistent, i.e.,

$$\operatorname{plim} \hat{\theta} = \theta_0.$$

Asymptotic Normality

The MLE $\hat{\theta}$ has an asymptotic normal distribution,
$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N\left[0, \left\{-E_0\left[\frac{1}{n} H(\theta_0)\right]\right\}^{-1}\right],$$

so we have
$$\hat{\theta} \stackrel{a}{\sim} N[\theta_0, \{I(\theta_0)\}^{-1}].$$
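A simulation sketch under the assumed exponential model, where $\hat{\theta} = \bar{y}$ and $\{I(\theta_0)\}^{-1} = \theta_0^2 / n$:

```python
# Sampling distribution of the MLE: mean near theta0, variance near theta0^2/n.
import numpy as np

rng = np.random.default_rng(2)
theta0, n = 2.0, 200
mles = np.array([rng.exponential(scale=theta0, size=n).mean() for _ in range(5000)])

print(mles.mean(), mles.var())  # approx. theta0 = 2 and theta0^2 / n = 0.02
```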
Asymptotic Efficiency

Cramér-Rao Lower Bound

Assuming that the density of $y_i$ satisfies the regularity conditions, the asymptotic variance of a consistent and asymptotically normally distributed estimator of the parameter vector $\theta_0$ will always be at least as large as
$$[I(\theta_0)]^{-1} = \left(-E_0\left[\frac{\partial^2 \ln L(\theta_0)}{\partial \theta_0 \, \partial \theta_0'}\right]\right)^{-1}.$$
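For the assumed exponential model, for example, $-E_0[\partial^2 \ln L(\theta_0)/\partial \theta_0^2] = n/\theta_0^2$, so the bound is $\theta_0^2/n$; this is exactly $\operatorname{Var}[\bar{y}]$, so the MLE attains the bound in that model.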
Hypothesis Test

Likelihood Ratio Test:

$$-2 \ln \frac{\hat{L}_R}{\hat{L}_U} \sim \chi^2(J).$$
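A sketch of the LR test under the assumed exponential model, testing the hypothetical restriction $\theta = 2$ ($J = 1$); both the restricted and unrestricted models must be fit:

```python
# Likelihood ratio test: compare restricted and unrestricted log-likelihoods.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=200)

def loglik(theta):
    return np.sum(-np.log(theta) - y / theta)

theta_r, theta_u = 2.0, y.mean()            # restricted value, unrestricted MLE
LR = -2.0 * (loglik(theta_r) - loglik(theta_u))
print(LR, chi2.sf(LR, df=1))                # statistic and p-value
```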
Wald Test:

$$W = [c(\hat{\theta}) - q]' \left\{\operatorname{Asy.Var}[c(\hat{\theta}) - q]\right\}^{-1} [c(\hat{\theta}) - q] \sim \chi^2(J).$$
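The same hypothesis via a Wald test, which needs only the unrestricted fit; here $c(\theta) = \theta - 2$, and the asymptotic variance of $\hat{\theta}$ is estimated by $\hat{\theta}^2/n$ (all model choices are assumptions of the sketch):

```python
# Wald test: distance of the unrestricted MLE from the restriction.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=200)

theta_hat, q = y.mean(), 2.0
avar = theta_hat**2 / len(y)   # estimated asymptotic variance of theta_hat
W = (theta_hat - q) ** 2 / avar
print(W, chi2.sf(W, df=1))
```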
Lagrange Multiplier Test:

$$LM = \left(\frac{\partial \ln L(\hat{\theta}_R)}{\partial \hat{\theta}_R}\right)' [I(\hat{\theta}_R)]^{-1} \left(\frac{\partial \ln L(\hat{\theta}_R)}{\partial \hat{\theta}_R}\right) \sim \chi^2(J),$$

where $J$ is the number of restrictions.
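Finally, a sketch of the LM (score) test, which needs only the restricted fit: the score and information are evaluated at $\hat{\theta}_R = 2$ under the same assumed exponential model:

```python
# Lagrange multiplier test: score and information at the restricted estimate.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=200)

theta_r = 2.0
score = np.sum(-1.0 / theta_r + y / theta_r**2)  # d ln L / d theta at theta_R
info = len(y) / theta_r**2                       # I(theta_R) for this model
LM = score**2 / info
print(LM, chi2.sf(LM, df=1))
```

All three statistics are asymptotically $\chi^2(1)$ here and typically lead to similar conclusions in large samples.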