# Maximum Likelihood Estimation

# The Likelihood Function

The probability density function (pdf) for a random variable $y$, conditioned on a set of parameters $\theta$, is denoted $f(y\mid\theta)$.

For a sample of $n$ independent and identically distributed observations, the joint density, or likelihood function, is

$$f(y_1,\ldots,y_n\mid\theta)=\prod_{i=1}^{n}f(y_i\mid\theta)=L(\theta\mid y).$$

It is usually simpler to work with the log of the likelihood function:

$$\ln L(\theta\mid y)=\sum_{i=1}^{n}\ln f(y_i\mid\theta).$$

Maximum likelihood estimation: choose the estimate $\hat\theta$ that maximizes the log-likelihood function.

The necessary condition is

$$\frac{\partial\ln L(\theta\mid y)}{\partial\theta}=0,$$

which is called the likelihood equation.
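As a concrete illustration, here is a minimal numerical sketch, assuming a normal model with unknown mean and variance (the simulated data, parameter values, and use of numpy/scipy are illustrative choices, not part of the original notes):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=500)   # simulated sample: mu = 2, sigma = 1.5

def neg_log_likelihood(theta, y):
    """Negative log-likelihood of i.i.d. N(mu, sigma^2) observations."""
    mu, log_sigma = theta                      # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                   - 0.5 * ((y - mu) / sigma) ** 2)

# Minimizing -ln L is the same as maximizing ln L; at the optimum the
# first-order condition d ln L / d theta = 0 (the likelihood equation) holds.
res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(y,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)                       # close to y.mean() and y.std()
```

For this model the optimizer's answer can be checked against the closed-form MLEs: the sample mean and the (uncentered) sample standard deviation.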

# Properties of MLE

Under regularity, the maximum likelihood estimator (MLE) has the following asymptotic properties:

  1. Consistency: $\operatorname{plim}\,\hat\theta=\theta_0$.
  2. Asymptotic normality: $\hat\theta\overset{a}{\sim}N\bigl[\theta_0,\{I(\theta_0)\}^{-1}\bigr]$, where $I(\theta_0)=-E_0\!\left[\dfrac{\partial^{2}\ln L}{\partial\theta_0\,\partial\theta_0'}\right]$.
  3. Asymptotic efficiency: $\hat\theta$ is asymptotically efficient and achieves the Cramér-Rao lower bound for consistent estimators.

For each observation, we have the log-density $\ln f(y_i\mid\theta)$. Denote $g_i=\dfrac{\partial\ln f(y_i\mid\theta)}{\partial\theta}$ and $H_i=\dfrac{\partial^{2}\ln f(y_i\mid\theta)}{\partial\theta\,\partial\theta'}$, $i=1,\ldots,n$. Then the score vector is

$$g=\frac{\partial\ln L(\theta\mid y)}{\partial\theta}=\sum_{i=1}^{n}\frac{\partial\ln f(y_i\mid\theta)}{\partial\theta}=\sum_{i=1}^{n}g_i,$$

and the Hessian matrix is

$$H=\frac{\partial^{2}\ln L(\theta\mid y)}{\partial\theta\,\partial\theta'}=\sum_{i=1}^{n}\frac{\partial^{2}\ln f(y_i\mid\theta)}{\partial\theta\,\partial\theta'}=\sum_{i=1}^{n}H_i.$$
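To make the per-observation decomposition concrete, the sketch below uses a toy exponential model with $\ln f(y_i\mid\lambda)=\ln\lambda-\lambda y_i$ (an illustrative example, not from the original notes), so $g_i=1/\lambda-y_i$ and $H_i=-1/\lambda^{2}$:

```python
import numpy as np

rng = np.random.default_rng(1)
lam0 = 0.5
y = rng.exponential(scale=1 / lam0, size=1000)    # Exponential sample with rate lam0

def score_and_hessian(lam, y):
    """Total score g = sum g_i and Hessian H = sum H_i for the exponential model."""
    g_i = 1.0 / lam - y                           # d ln f(y_i|lam) / d lam
    H_i = -np.full_like(y, 1.0 / lam**2)          # d^2 ln f(y_i|lam) / d lam^2
    return g_i.sum(), H_i.sum()

g, H = score_and_hessian(lam0, y)
print(g, H)
# At the MLE lam_hat = 1 / y.mean(), the total score is zero (the likelihood equation):
print(score_and_hessian(1 / y.mean(), y)[0])      # ~0 up to floating point
```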

# Information Matrix Equality

The information matrix equality states that the variance of the score equals the negative expected Hessian, both evaluated at $\theta_0$ (the outer-product form below uses the fact that the expected score at $\theta_0$ is zero):

$$\operatorname{Var}\!\left[\frac{\partial\ln L(\theta_0\mid y)}{\partial\theta_0}\right]=E_0\!\left[\left(\frac{\partial\ln L(\theta_0\mid y)}{\partial\theta_0}\right)\left(\frac{\partial\ln L(\theta_0\mid y)}{\partial\theta_0}\right)'\right]=-E_0\!\left[\frac{\partial^{2}\ln L(\theta_0\mid y)}{\partial\theta_0\,\partial\theta_0'}\right].$$
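A quick Monte Carlo check of the equality, again with the illustrative exponential model: the variance of the score across replications should match the negative expected Hessian, which here is exactly $n/\lambda_0^{2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
lam0, n, reps = 0.5, 200, 5000

scores = np.empty(reps)
for r in range(reps):
    y = rng.exponential(scale=1 / lam0, size=n)
    scores[r] = np.sum(1.0 / lam0 - y)        # score evaluated at the true lam0

var_score = scores.var()                      # Var[d ln L / d lam]
neg_exp_hessian = n / lam0**2                 # -E[d^2 ln L / d lam^2], exact here
print(var_score, neg_exp_hessian)             # both approximately n / lam0^2 = 800
```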

# Consistency

Let $\theta_0$ be the true value of the parameter and let $\hat\theta$ be the MLE. Then the MLE $\hat\theta$ is consistent, i.e.,

$$\operatorname{plim}\,\hat\theta=\theta_0.$$
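A minimal simulation of this property (same illustrative exponential model): as $n$ grows, the MLE $\hat\lambda=1/\bar y$ settles on $\lambda_0$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam0 = 0.5
for n in (10, 100, 1_000, 10_000, 100_000):
    y = rng.exponential(scale=1 / lam0, size=n)
    lam_hat = 1 / y.mean()          # MLE of the exponential rate
    print(n, lam_hat)               # drifts toward lam0 = 0.5 as n grows
```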

# Asymptotic Normality

The MLE θ^ has an asymptotic normal distribution,

$$\sqrt{n}\,(\hat\theta-\theta_0)\overset{d}{\to}N\!\left[0,\left\{-E_0\!\left[\frac{1}{n}H(\theta_0)\right]\right\}^{-1}\right],$$

so we have

$$\hat\theta\overset{a}{\sim}N\bigl[\theta_0,\{I(\theta_0)\}^{-1}\bigr].$$
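The sketch below (illustrative, same exponential model) draws many samples of fixed size, recenters and rescales the MLEs, and compares their spread to the $\{I(\theta_0)\}^{-1}$ prediction; for this model $I(\lambda_0)=n/\lambda_0^{2}$, so the asymptotic standard error of $\hat\lambda$ is $\lambda_0/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(4)
lam0, n, reps = 0.5, 500, 10_000

lam_hats = np.empty(reps)
for r in range(reps):
    y = rng.exponential(scale=1 / lam0, size=n)
    lam_hats[r] = 1 / y.mean()                # MLE in each replication

z = np.sqrt(n) * (lam_hats - lam0)            # recentred and rescaled MLEs
print(z.mean(), z.std())                      # ~0 and ~lam0 = 0.5
print(lam_hats.std(), lam0 / np.sqrt(n))      # empirical vs theoretical s.e.
```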

# Asymptotic Efficiency

Cramér-Rao Lower Bound

Assuming that the density of $y_i$ satisfies the regularity conditions, the asymptotic variance of a consistent and asymptotically normally distributed estimator of the parameter vector $\theta_0$ will always be at least as large as

$$[I(\theta_0)]^{-1}=\left(-E_0\!\left[\frac{\partial^{2}\ln L(\theta_0)}{\partial\theta_0\,\partial\theta_0'}\right]\right)^{-1}.$$
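As a worked illustration (an example added here, not in the original notes): for an i.i.d. Bernoulli($p_0$) sample, $\ln f(y_i\mid p)=y_i\ln p+(1-y_i)\ln(1-p)$, so

$$I(p_0)=-E_0\!\left[\frac{\partial^{2}\ln L(p_0)}{\partial p_0^{2}}\right]=\frac{n}{p_0(1-p_0)},\qquad [I(p_0)]^{-1}=\frac{p_0(1-p_0)}{n},$$

which is exactly $\operatorname{Var}[\bar y]$; the MLE $\hat p=\bar y$ therefore attains the bound.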

# Hypothesis Tests

Likelihood Ratio Test:

$$\text{LR}=-2\ln\frac{\hat L_R}{\hat L_U}\overset{a}{\sim}\chi^{2}(J),$$

where $\hat L_R$ and $\hat L_U$ are the restricted and unrestricted maximized likelihoods.

Wald Test:

$$W=[c(\hat\theta)-q]'\left(\text{Asy.Var}[c(\hat\theta)-q]\right)^{-1}[c(\hat\theta)-q]\overset{a}{\sim}\chi^{2}(J).$$

Lagrange Multiplier Test:

$$\text{LM}=\left(\frac{\partial\ln L(\hat\theta_R)}{\partial\hat\theta_R}\right)'\bigl[I(\hat\theta_R)\bigr]^{-1}\left(\frac{\partial\ln L(\hat\theta_R)}{\partial\hat\theta_R}\right)\overset{a}{\sim}\chi^{2}(J),$$

where $J$ is the number of restrictions in the hypothesis $c(\theta)=q$.
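A compact numerical sketch of the likelihood ratio test (illustrative; it tests $H_0\colon\lambda=\lambda^*$ in the exponential model used above, so $J=1$ restriction; the data and $\lambda^*$ are made up for the example):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
y = rng.exponential(scale=1 / 0.5, size=300)   # true rate is 0.5

def loglik(lam, y):
    """Exponential log-likelihood: sum over i of ln(lam) - lam * y_i."""
    return len(y) * np.log(lam) - lam * y.sum()

lam_hat = 1 / y.mean()        # unrestricted MLE
lam_star = 0.6                # H0 pins the rate at 0.6 (the restricted "estimate")

LR = -2 * (loglik(lam_star, y) - loglik(lam_hat, y))  # -2 ln(L_R / L_U) >= 0
p_value = chi2.sf(LR, df=1)   # J = 1 restriction
print(LR, p_value)
```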