Maximum Likelihood Estimation

The Likelihood Function

The probability density function (pdf) for a random variable $y$, conditioned on a set of parameters $\theta$, is denoted $f(y \mid \theta)$.
The joint density, or likelihood function, is

$$f(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) = L(\theta \mid y).$$

It is usually simpler to work with the log of the likelihood function:
$$\ln L(\theta \mid y) = \sum_{i=1}^{n} \ln f(y_i \mid \theta).$$

Maximum likelihood estimation chooses $\theta$ to maximize the log-likelihood function.
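For concreteness, here is a minimal sketch (not from the original notes) that maximizes the log-likelihood numerically, assuming an exponential model $f(y \mid \theta) = \theta^{-1} e^{-y/\theta}$ and simulated data:

```python
# Minimal MLE sketch for an assumed exponential model with simulated data.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=500)  # simulated sample, true theta = 2

def neg_log_likelihood(theta):
    # -ln L(theta | y) = -sum_i ln f(y_i | theta), f(y|theta) = exp(-y/theta)/theta
    return -np.sum(-np.log(theta) - y / theta)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(res.x, y.mean())  # the numerical maximizer matches the sample mean
```

For this model the maximizer coincides with the sample mean, as the likelihood equation below confirms.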
The necessary condition is

$$\frac{\partial \ln L(\theta \mid y)}{\partial \theta} = 0,$$

which is called the likelihood equation.
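For the assumed exponential model above (an illustration, not part of the original notes), the likelihood equation solves in closed form:

$$\frac{\partial \ln L(\theta \mid y)}{\partial \theta} = \sum_{i=1}^{n} \left( -\frac{1}{\theta} + \frac{y_i}{\theta^2} \right) = 0 \quad \Longrightarrow \quad \hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}.$$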
Properties of MLE

Under regularity conditions, the maximum likelihood estimator (MLE) has the following asymptotic properties:
Consistency: $\operatorname{plim} \hat{\theta} = \theta_0$.

Asymptotic normality: $\hat{\theta} \stackrel{a}{\sim} N[\theta_0, \{I(\theta_0)\}^{-1}]$, where

$$I(\theta_0) = -E_0\left[\frac{\partial^2 \ln L}{\partial \theta_0 \, \partial \theta_0'}\right].$$

Asymptotic efficiency: $\hat{\theta}$ is asymptotically efficient and achieves the Cramér-Rao lower bound for consistent estimators.

For each observation, we have the log-density $\ln f(y_i \mid \theta)$. Denote

$$g_i = \frac{\partial \ln f(y_i \mid \theta)}{\partial \theta} \quad \text{and} \quad H_i = \frac{\partial^2 \ln f(y_i \mid \theta)}{\partial \theta \, \partial \theta'}, \qquad i = 1, \dots, n.$$

Then we have
$$g = \frac{\partial \ln L(\theta \mid y)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ln f(y_i \mid \theta)}{\partial \theta} = \sum_{i=1}^{n} g_i,$$

and the Hessian matrix is

$$H = \frac{\partial^2 \ln L(\theta \mid y)}{\partial \theta \, \partial \theta'} = \sum_{i=1}^{n} \frac{\partial^2 \ln f(y_i \mid \theta)}{\partial \theta \, \partial \theta'} = \sum_{i=1}^{n} H_i.$$
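Continuing the assumed exponential example, a short sketch of the per-observation terms and their sums; evaluated at the MLE, the score $g$ is zero and the Hessian $H$ is negative:

```python
# Per-observation scores g_i and Hessians H_i for the assumed exponential model.
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=500)
theta = y.mean()  # evaluate at the MLE

g_i = -1.0 / theta + y / theta**2          # d ln f(y_i|theta) / d theta
H_i = 1.0 / theta**2 - 2.0 * y / theta**3  # d^2 ln f(y_i|theta) / d theta^2

g, H = g_i.sum(), H_i.sum()
print(g, H)  # g is (numerically) 0 at the MLE; H is negative at a maximum
```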
Information matrix equality:

$$\operatorname{Var}\left[\frac{\partial \ln L(\theta_0 \mid y)}{\partial \theta_0}\right] = E_0\left[\left(\frac{\partial \ln L(\theta_0 \mid y)}{\partial \theta_0}\right)\left(\frac{\partial \ln L(\theta_0 \mid y)}{\partial \theta_0'}\right)\right] = -E_0\left[\frac{\partial^2 \ln L(\theta_0 \mid y)}{\partial \theta_0 \, \partial \theta_0'}\right].$$
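A Monte Carlo sketch of this equality under the same assumed exponential model, where both sides equal $n/\theta_0^2$:

```python
# Check Var[score] = -E[Hessian] at theta_0 by simulation (assumed model).
import numpy as np

rng = np.random.default_rng(1)
theta0, n, reps = 2.0, 200, 5000
scores, hessians = [], []
for _ in range(reps):
    y = rng.exponential(scale=theta0, size=n)
    scores.append(np.sum(-1.0 / theta0 + y / theta0**2))
    hessians.append(np.sum(1.0 / theta0**2 - 2.0 * y / theta0**3))

print(np.var(scores), -np.mean(hessians))  # both are close to n / theta0^2 = 50
```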
Consistency

Let $\theta_0$ be the true value of the parameter, $\hat{\theta}$ the MLE, and $\theta$ any other estimator of $\theta_0$. Then the MLE $\hat{\theta}$ is consistent, i.e.,

$$\operatorname{plim} \hat{\theta} = \theta_0.$$

Asymptotic Normality

The MLE $\hat{\theta}$ has an asymptotic normal distribution,
$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N\left[0, \left\{-E_0\left[\frac{1}{n} H(\theta_0)\right]\right\}^{-1}\right],$$

so we have
$$\hat{\theta} \stackrel{a}{\sim} N[\theta_0, \{I(\theta_0)\}^{-1}].$$
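A simulation sketch under the assumed exponential model, where $\hat{\theta} = \bar{y}$ and $\{I(\theta_0)\}^{-1} = \theta_0^2 / n$:

```python
# Sampling distribution of the MLE: mean near theta0, variance near theta0^2/n.
import numpy as np

rng = np.random.default_rng(2)
theta0, n = 2.0, 200
mles = np.array([rng.exponential(scale=theta0, size=n).mean() for _ in range(5000)])

print(mles.mean(), mles.var())  # approx. theta0 = 2 and theta0^2 / n = 0.02
```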
Asymptotic Efficiency

Cramér-Rao Lower Bound

Assuming that the density of $y_i$ satisfies the regularity conditions, the asymptotic variance of a consistent and asymptotically normally distributed estimator of the parameter vector $\theta_0$ will always be at least as large as
$$[I(\theta_0)]^{-1} = \left(-E_0\left[\frac{\partial^2 \ln L(\theta_0)}{\partial \theta_0 \, \partial \theta_0'}\right]\right)^{-1}.$$
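For the assumed exponential model, for example, $-E_0[\partial^2 \ln L(\theta_0)/\partial \theta_0^2] = n/\theta_0^2$, so the bound is $\theta_0^2/n$; this is exactly $\operatorname{Var}[\bar{y}]$, so the MLE attains the bound in that model.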
Hypothesis Test

Likelihood Ratio Test:

$$-2 \ln \frac{\hat{L}_R}{\hat{L}_U} \sim \chi^2(J).$$
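A sketch of the LR test under the assumed exponential model, testing the hypothetical restriction $\theta = 2$ ($J = 1$); both the restricted and unrestricted models must be fit:

```python
# Likelihood ratio test: compare restricted and unrestricted log-likelihoods.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=200)

def loglik(theta):
    return np.sum(-np.log(theta) - y / theta)

theta_r, theta_u = 2.0, y.mean()            # restricted value, unrestricted MLE
LR = -2.0 * (loglik(theta_r) - loglik(theta_u))
print(LR, chi2.sf(LR, df=1))                # statistic and p-value
```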
Wald Test:

$$W = [c(\hat{\theta}) - q]' \left\{\operatorname{Asy.Var}[c(\hat{\theta}) - q]\right\}^{-1} [c(\hat{\theta}) - q] \sim \chi^2(J).$$
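The same hypothesis via a Wald test, which needs only the unrestricted fit; here $c(\theta) = \theta - 2$, and the asymptotic variance of $\hat{\theta}$ is estimated by $\hat{\theta}^2/n$ (all model choices are assumptions of the sketch):

```python
# Wald test: distance of the unrestricted MLE from the restriction.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=200)

theta_hat, q = y.mean(), 2.0
avar = theta_hat**2 / len(y)   # estimated asymptotic variance of theta_hat
W = (theta_hat - q) ** 2 / avar
print(W, chi2.sf(W, df=1))
```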
Lagrange Multiplier Test:

$$LM = \left(\frac{\partial \ln L(\hat{\theta}_R)}{\partial \hat{\theta}_R}\right)' [I(\hat{\theta}_R)]^{-1} \left(\frac{\partial \ln L(\hat{\theta}_R)}{\partial \hat{\theta}_R}\right) \sim \chi^2(J),$$

where $J$ is the number of restrictions.
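Finally, a sketch of the LM (score) test, which needs only the restricted fit: the score and information are evaluated at $\hat{\theta}_R = 2$ under the same assumed exponential model:

```python
# Lagrange multiplier test: score and information at the restricted estimate.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=200)

theta_r = 2.0
score = np.sum(-1.0 / theta_r + y / theta_r**2)  # d ln L / d theta at theta_R
info = len(y) / theta_r**2                       # I(theta_R) for this model
LM = score**2 / info
print(LM, chi2.sf(LM, df=1))
```

All three statistics are asymptotically $\chi^2(1)$ here and typically lead to similar conclusions in large samples.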