# The Least Squares Estimator

## Finite Sample Properties

### Unbiased Estimation

$$E[b] = E_X\big\{E[b \mid X]\big\} = E_X[\beta] = \beta.$$

The interpretation of this result is that for any particular set of observations X, the least squares estimator has expectation β. Therefore, when we average this over the possible values of X, we find the unconditional mean is β as well.
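
To see this numerically, here is a minimal NumPy simulation sketch; the sample size, true coefficients, and error distribution are illustrative assumptions, not values from the text. Averaging $b$ over many replications recovers $\beta$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 5_000
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed regressors

estimates = np.empty((reps, 2))
for r in range(reps):
    eps = rng.normal(size=n)                 # disturbances with mean zero
    y = X @ beta + eps
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(estimates.mean(axis=0))                # close to [1.0, 2.0]
```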

### Omission of Relevant Variables

The estimator is biased when a relevant variable is omitted from the regression:

$$E[b_1 \mid X] = \beta_1 + P_{1.2}\,\beta_2,$$

where

$$P_{1.2} = (X_1'X_1)^{-1}X_1'X_2.$$
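
A short sketch of this bias formula (all data-generating values below are illustrative assumptions): regressing $y$ on $X_1$ alone, when $X_2$ is relevant and correlated with $X_1$, shifts the mean of $b_1$ by $P_{1.2}\,\beta_2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=n)                            # the omitted regressor X2
X1 = np.column_stack([np.ones(n), 0.8 * z + rng.normal(size=n)])
X2 = z.reshape(-1, 1)
beta1 = np.array([1.0, 2.0])
beta2 = np.array([3.0])

# P_{1.2} = (X1'X1)^{-1} X1'X2: the regression of X2 on X1
P12 = np.linalg.solve(X1.T @ X1, X1.T @ X2)
print("implied bias P12 @ beta2:", P12 @ beta2)

reps = 5_000
b1 = np.empty((reps, 2))
for r in range(reps):
    y = X1 @ beta1 + X2 @ beta2 + rng.normal(size=n)
    b1[r] = np.linalg.lstsq(X1, y, rcond=None)[0]  # short regression omits X2
print("mean(b1) - beta1:", b1.mean(axis=0) - beta1)  # close to P12 @ beta2
```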

### Inclusion of Irrelevant Variables

Including irrelevant variables does not bias the estimates of the relevant coefficients. It does, however, inflate the covariance matrix of the estimator, which reduces the efficiency of the estimation. If $z$ is an irrelevant variable with estimated coefficient $c$ and true coefficient $\gamma = 0$, then

$$E\left[\begin{pmatrix} b \\ c \end{pmatrix} \,\middle|\, X, z \right] = \begin{pmatrix} \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} \beta \\ 0 \end{pmatrix}.$$
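
The following sketch (with assumed, illustrative numbers) fits both the correct model and one padded with an irrelevant regressor $z$: the estimates stay unbiased in both, but the slope's sampling variance is larger in the padded model:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 5_000
beta = np.array([1.0, 2.0])
x = rng.normal(size=n)
z = 0.9 * x + rng.normal(scale=0.5, size=n)   # irrelevant, but correlated with x
X = np.column_stack([np.ones(n), x])
Xz = np.column_stack([X, z])

b_short = np.empty((reps, 2))
b_long = np.empty((reps, 3))
for r in range(reps):
    y = X @ beta + rng.normal(size=n)          # true gamma = 0
    b_short[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    b_long[r] = np.linalg.lstsq(Xz, y, rcond=None)[0]

print("means:", b_short.mean(axis=0), b_long.mean(axis=0)[:2])   # both close to beta
print("var of slope:", b_short[:, 1].var(), b_long[:, 1].var())  # larger in long model
```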

### Variance of the Least Squares Estimator

Since $E[b \mid X] = \beta$ and $E[\varepsilon\varepsilon' \mid X] = \sigma^2 I$, we have

$$
\begin{aligned}
\operatorname{Var}[b \mid X] &= E\big[(b - \beta)(b - \beta)' \mid X\big] \\
&= E\big[(X'X)^{-1}X'\varepsilon\varepsilon' X(X'X)^{-1} \mid X\big] \\
&= (X'X)^{-1}X'\,E[\varepsilon\varepsilon' \mid X]\,X(X'X)^{-1} \\
&= (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}.
\end{aligned}
$$

Theorem (Gauss–Markov Theorem). In the linear regression model with regressor matrix $X$, the least squares estimator $b$ is the minimum variance linear unbiased estimator of $\beta$. For any vector of constants $w$, the minimum variance linear unbiased estimator of $w'\beta$ in the regression model is $w'b$, where $b$ is the least squares estimator.
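
As a numerical check of the variance formula (setup assumed for illustration), holding $X$ fixed across replications, the Monte Carlo covariance of $b$ should match $\sigma^2(X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, sigma = 60, 20_000, 1.5
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed across replications

b = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.cov(b, rowvar=False))                # Monte Carlo covariance of b
print(sigma**2 * np.linalg.inv(X.T @ X))      # theoretical sigma^2 (X'X)^{-1}
```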

### Estimating the Variance of the Least Squares Estimator

We do not use

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2,$$

which is biased downward because the residuals carry only $n - K$ degrees of freedom, but instead use

$$s^2 = \frac{e'e}{n - K}, \qquad \text{Est. Var}[b \mid X] = \frac{e'e}{n - K}(X'X)^{-1} = s^2 (X'X)^{-1}.$$

If we assume the disturbances are normally distributed, then the estimator $b$ is normally distributed as well,

$$b \mid X \sim N\big[\beta,\; \sigma^2 (X'X)^{-1}\big].$$
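
A minimal sketch of these estimators on simulated data (the design and coefficients are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 100, 2
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b                               # residuals
s2 = (e @ e) / (n - K)                      # unbiased: divides by n - K, not n
est_var_b = s2 * np.linalg.inv(X.T @ X)     # Est. Var[b | X]
print("s^2:", s2)
print("standard errors:", np.sqrt(np.diag(est_var_b)))
```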

## Large Sample Properties

### Consistency of the Estimator

Assume

$$\operatorname{plim}_{n \to \infty} \frac{X'X}{n} = Q,$$

where $Q$ is a positive definite matrix. Then

$$\operatorname{plim}\, b = \beta.$$
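
A quick simulation sketch of consistency (the setup is an illustrative assumption): the deviation of $b$ from $\beta$ shrinks as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(5)
beta = np.array([1.0, 2.0])
for n in (50, 500, 5_000, 50_000):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    print(n, np.abs(b - beta).max())   # maximum deviation shrinks with n
```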

### Consistency and Unbiasedness

Consider a sample $x_1, \ldots, x_n$ from a $N(\mu, \sigma^2)$ population, and suppose we want to estimate $\mu$.

  • Unbiased but not consistent. $x_1$ is an unbiased estimator of $\mu$ since $E[x_1] = \mu$. But $x_1$ is not consistent, since its distribution does not become more concentrated around $\mu$ as the sample size increases; it is always $N(\mu, \sigma^2)$.
  • Consistent but not unbiased. Let $\tilde{x} = \frac{1}{n-1}\sum_{i=1}^{n} x_i$. Since $E[\tilde{x}] = \frac{n}{n-1}\mu \neq \mu$, $\tilde{x}$ is a biased estimator. As $n \to \infty$, $E[\tilde{x}] \to \mu$ and $\operatorname{Var}[\tilde{x}] = \frac{n}{(n-1)^2}\sigma^2 \to 0$, so $\tilde{x}$ is a consistent estimator (see the simulation sketch after this list).
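
Here is a simulation sketch of both bullet points; $\mu$, $\sigma$, and the sample sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, reps = 5.0, 2.0, 10_000
for n in (10, 100, 1_000):
    samples = rng.normal(mu, sigma, size=(reps, n))
    first_obs = samples[:, 0]                    # x_1: unbiased, not consistent
    x_tilde = samples.sum(axis=1) / (n - 1)      # biased, but consistent
    print(n, first_obs.mean(), first_obs.std(),  # mean ~ mu, spread never shrinks
          x_tilde.mean(), x_tilde.std())         # mean -> mu, spread -> 0
```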

### Asymptotic Normality of the Estimator

Theorem (Asymptotic Distribution of b with Independent Observations). If $\{\varepsilon_i\}$ are independently distributed with mean zero and finite variance $\sigma^2$, and $x_{ik}$ is such that the Grenander conditions are met, then

$$b \stackrel{a}{\sim} N\left[\beta,\; \frac{\sigma^2}{n} Q^{-1}\right].$$

The Grenander conditions are:

  1. For each column of $X$, $x_k$, if $d_{nk}^2 = x_k'x_k$, then $\lim_{n \to \infty} d_{nk}^2 = +\infty$.
  2. $\lim_{n \to \infty} x_{ik}^2 / d_{nk}^2 = 0$ for all $i = 1, 2, \ldots, n$.
  3. Let $R_n$ be the sample correlation matrix of the columns of $X$, excluding the constant term if there is one. Then $\lim_{n \to \infty} R_n = C$, a positive definite matrix.
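
A sketch of the asymptotic normality result under assumed non-normal (uniform) disturbances: the standardized slope estimate behaves approximately like a standard normal variate:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 500, 10_000
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])

slope = np.empty(reps)
for r in range(reps):
    eps = rng.uniform(-1.0, 1.0, size=n)     # mean zero, finite variance, not normal
    y = X @ beta + eps
    slope[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

z = (slope - beta[1]) / slope.std()
print((np.abs(z) > 1.96).mean())             # tail frequency close to 0.05
```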

## Interval Estimation

The ratio

$$t_k = \frac{b_k - \beta_k}{\sqrt{s^2 S^{kk}}},$$

where $S^{kk}$ denotes the $k$th diagonal element of $(X'X)^{-1}$, has a $t$ distribution with $n - K$ degrees of freedom, and a confidence interval for $\beta_k$ can be formed using

$$\operatorname{Prob}\Big[b_k - t_{(1-\alpha/2),[n-K]}\sqrt{s^2 S^{kk}} \;\le\; \beta_k \;\le\; b_k + t_{(1-\alpha/2),[n-K]}\sqrt{s^2 S^{kk}}\Big] = 1 - \alpha.$$
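
A sketch of this interval on simulated data (the design, coefficients, and $\alpha$ are illustrative assumptions), using scipy.stats.t for the critical value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, K, alpha = 100, 2, 0.05
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = (e @ e) / (n - K)
S = np.linalg.inv(X.T @ X)                     # S^{kk} on the diagonal
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K)
k = 1                                          # the slope coefficient
half = t_crit * np.sqrt(s2 * S[k, k])
print(f"{1 - alpha:.0%} CI for beta_{k}: [{b[k] - half:.3f}, {b[k] + half:.3f}]")
```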

## Prediction

The prediction variance is

$$\operatorname{Var}[e^0 \mid X, x^0] = \sigma^2 + x^{0\,\prime}\big[\sigma^2 (X'X)^{-1}\big] x^0,$$

and the prediction interval is

$$\hat{y}^0 \pm t_{(1-\alpha/2),[n-K]}\, \operatorname{se}(e^0).$$
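
A sketch of the prediction interval on simulated data (the new point $x^0$ and all data-generating values are illustrative assumptions); in practice $s^2$ replaces $\sigma^2$ in the variance formula:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, K, alpha = 100, 2, 0.05
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = (e @ e) / (n - K)
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 0.5])                        # new observation (with constant)
y0_hat = x0 @ b
se_e0 = np.sqrt(s2 * (1.0 + x0 @ XtX_inv @ x0))  # se(e^0) with s^2 for sigma^2
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K)
print(f"prediction: {y0_hat:.3f} +/- {t_crit * se_e0:.3f}")
```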