# Linear Regression Model

Population regression equation

$$ y = f(x_1, x_2, \ldots, x_K) + \varepsilon = x_1\beta_1 + x_2\beta_2 + \cdots + x_K\beta_K + \varepsilon $$
  • Deterministic part and random part
  • Linear relationship

Sample regression equation

$$ y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K + \varepsilon_i $$
  • $y_i$: observed value of the variable of interest;
  • $x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K$: the deterministic part;
  • $\varepsilon_i$: the random part;
  • matrix form: $y = X\beta + \varepsilon$

Target: estimate the unknown parameters and make predictions.
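
As a minimal sketch of this setup, the matrix-form model $y = X\beta + \varepsilon$ can be simulated with NumPy; the sample size, coefficient values, and noise scale below are arbitrary illustrations, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3                      # sample size and number of regressors (arbitrary)
beta = np.array([1.0, 2.0, -0.5])  # hypothetical "true" parameters

X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # first column is a constant
eps = rng.normal(scale=1.0, size=n)                             # random disturbance
y = X @ beta + eps                                              # y = X beta + eps
print(X.shape, y.shape)            # (100, 3) (100,)
```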

# Assumptions

Assumption 1 (Full Rank). X is an n×K matrix with rank K, i.e., there are no exact linear relationships among the variables.
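
A quick numerical check of the full-rank condition, assuming NumPy and a simulated design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])

# Full rank: no exact linear relationships among the columns of X
print(np.linalg.matrix_rank(X) == X.shape[1])        # True

# An exact linear combination of existing columns violates the assumption
X_bad = np.column_stack([X, X[:, 1] + X[:, 2]])
print(np.linalg.matrix_rank(X_bad), X_bad.shape[1])  # 3 4  (rank < number of columns)
```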

Assumption 2 (Mean independence). The expected value of the disturbance is zero conditional on the observations, i.e.,

$$ E[\varepsilon \mid X] = \begin{bmatrix} E[\varepsilon_1 \mid X] \\ E[\varepsilon_2 \mid X] \\ \vdots \\ E[\varepsilon_n \mid X] \end{bmatrix} = 0. $$

By the law of total covariance, $\operatorname{Cov}(X, \varepsilon) = \operatorname{Cov}_X\big(X, E[\varepsilon \mid X]\big)$, so mean independence implies that $\operatorname{Cov}(\varepsilon, X) = 0$.

Assumption 3 (Homoscedasticity). The variances and covariances of the disturbances satisfy

$$ \operatorname{Var}[\varepsilon_i \mid X] = \sigma^2 \quad \text{for all } i = 1, \ldots, n, \qquad \operatorname{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0 \quad \text{for all } i \ne j, $$

which is summarized as $E[\varepsilon\varepsilon' \mid X] = \sigma^2 I$.

# Least squares regression

Population regression:

$$ y_i = x_i'\beta + \varepsilon_i $$

Sample regression:

$$ y_i = x_i'b + e_i $$

The least squares coefficient vector minimizes the sum of squared residuals:

$$ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - x_i'b)^2 $$

and the solution is

$$ b = (X'X)^{-1}X'y $$
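
A sketch of computing $b = (X'X)^{-1}X'y$ on simulated data, both by the textbook formula and by numerically preferable routines (the data and parameter values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Textbook formula: b = (X'X)^{-1} X'y
b_formula = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferable alternatives that give the same answer
b_solve = np.linalg.solve(X.T @ X, X.T @ y)
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_formula, b_solve), np.allclose(b_formula, b_lstsq))  # True True
```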

Proposition

  1. The least squares residuals sum to zero.
  2. $\bar{y} = \bar{x}'b$.
  3. $\bar{\hat{y}} = \bar{y}$.

It is important to note that none of these results need hold if the regression does not contain a constant term.
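
A quick numerical check of the proposition when the regression does contain a constant term (a sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # includes a constant term
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
e = y - y_hat

print(np.isclose(e.sum(), 0.0))                  # 1. residuals sum to zero
print(np.isclose(y.mean(), X.mean(axis=0) @ b))  # 2. ybar = xbar'b
print(np.isclose(y_hat.mean(), y.mean()))        # 3. mean of fitted values = ybar
```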

Least squares partitions the vector y into two orthogonal parts,

$$ y = Py + My = \text{projection} + \text{residual} = X(X'X)^{-1}X'y + \big[I - X(X'X)^{-1}X'\big]y $$
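
A sketch of this decomposition using explicit projection matrices (fine for small $n$, though not how one would compute in practice; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T       # projection matrix P
M = np.eye(n) - P                          # residual maker M

print(np.allclose(y, P @ y + M @ y))       # y = projection + residual
print(np.isclose((P @ y) @ (M @ y), 0.0))  # the two parts are orthogonal
```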

# Partitioned regression

Suppose that the regression involves two sets of variables, $X_1$ and $X_2$. Thus,

$$ y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. $$

The normal equations are

$$ \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix} $$

Theorem (Orthogonal Partitioned Regression). In the multiple linear least squares regression of y on two sets of variables $X_1$ and $X_2$, if the two sets of variables are orthogonal, then the separate coefficient vectors can be obtained by separate regressions of y on $X_1$ alone and y on $X_2$ alone.
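
As an illustration of the orthogonal case (a sketch with simulated data, where $X_2$ is constructed so that $X_1'X_2 = 0$):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2))
X2 = Z - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ Z)  # residualize Z on X1 so that X1'X2 = 0

X = np.column_stack([X1, X2])
y = X @ np.array([1.0, 2.0, -0.5, 0.8]) + rng.normal(size=n)

b_joint = np.linalg.solve(X.T @ X, X.T @ y)
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)   # regression of y on X1 alone
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y)   # regression of y on X2 alone

print(np.allclose(b_joint, np.concatenate([b1, b2])))  # True
```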

Theorem (Frisch-Waugh-Lovell Theorem). In the linear least squares regression of the vector y on two sets of variables, $X_1$ and $X_2$, the subvector $b_2$ is the set of coefficients obtained when the residuals from a regression of y on $X_1$ alone are regressed on the set of residuals obtained when each column of $X_2$ is regressed on $X_1$.
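
A sketch of the Frisch-Waugh-Lovell result on simulated data: residualize y and each column of $X_2$ on $X_1$, then regress residuals on residuals and compare with the full regression.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2)) + 0.5 * X1[:, [1]]   # correlated with X1 on purpose
X = np.column_stack([X1, X2])
y = X @ np.array([1.0, 2.0, -0.5, 0.8]) + rng.normal(size=n)

# Full regression: the last two coefficients form b2
b = np.linalg.solve(X.T @ X, X.T @ y)

# FWL: residualize y and each column of X2 on X1, then regress residuals on residuals
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
y_star, X2_star = M1 @ y, M1 @ X2
b2_fwl = np.linalg.solve(X2_star.T @ X2_star, X2_star.T @ y_star)

print(np.allclose(b[-2:], b2_fwl))  # True
```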

Theorem (Change in the Sum of Squares When a Variable is Added to a Regression). If $e'e$ is the sum of squared residuals when y is regressed on X and $u'u$ is the sum of squared residuals when y is regressed on X and z, then

$$ u'u = e'e - c^2(z_*'z_*) \le e'e, $$

where c is the coefficient on z in the long regression of y on [X, z] and $z_* = Mz$ is the vector of residuals when z is regressed on X.
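
A numerical check of this identity with simulated data: run the short and long regressions and compare $u'u$ with $e'e - c^2(z_*'z_*)$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -0.5]) + 0.7 * z + rng.normal(size=n)

# Short regression: y on X
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)

# Long regression: y on [X, z]
Xz = np.column_stack([X, z])
coefs = np.linalg.solve(Xz.T @ Xz, Xz.T @ y)
u = y - Xz @ coefs
c = coefs[-1]                                      # coefficient on z in the long regression

# z* = Mz: residuals from regressing z on X
z_star = z - X @ np.linalg.solve(X.T @ X, X.T @ z)

print(np.isclose(u @ u, e @ e - c**2 * (z_star @ z_star)))  # True
```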

# Goodness of fit

We want to know how much of the variation of y is explained by the variation of x:

$$ y_i - \bar{y} = (\hat{y}_i - \bar{y}) + e_i = (x_i - \bar{x})'b + e_i $$

We can obtain a measure of how well the regression line fits the data by using the coefficient of determination:

$$ \frac{\text{SSR}}{\text{SST}} = \frac{b'X'M^0Xb}{y'M^0y} = 1 - \frac{e'e}{y'M^0y}, $$

where $M^0 = I - \frac{1}{n}ii'$ is the matrix that transforms observations into deviations from their means.
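
Both forms of the coefficient of determination can be computed directly (a sketch on simulated data that includes a constant term):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

M0 = np.eye(n) - np.ones((n, n)) / n             # centering matrix M0
sst = y @ M0 @ y                                 # total sum of squares
ssr = b @ X.T @ M0 @ X @ b                       # regression sum of squares

print(np.isclose(ssr / sst, 1 - (e @ e) / sst))  # the two forms of R^2 agree
```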

Theorem (Change in $R^2$ When a Variable is Added to a Regression). Let $R^2_{Xz}$ be the coefficient of determination in the regression of y on X and an additional variable z, let $R^2_X$ be the same for the regression of y on X alone, and let $r^*_{yz}$ be the partial correlation between y and z, controlling for X. Then

$$ R^2_{Xz} = R^2_X + (1 - R^2_X)\, r^{*2}_{yz}, $$
  • where the partial correlation $r^*_{yz}$ is the simple correlation between $y_* = My$ and $z_* = Mz$, the residuals from regressing y and z on X, and its square is $r^{*2}_{yz} = \dfrac{(z_*'y_*)^2}{(z_*'z_*)(y_*'y_*)}$.
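
A numerical check of this theorem with simulated data, computing the residual vectors $y_*$ and $z_*$ with the residual maker for X:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -0.5]) + 0.7 * z + rng.normal(size=n)

def r_squared(W, y):
    b = np.linalg.solve(W.T @ W, W.T @ y)
    e = y - W @ b
    return 1 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

R2_X = r_squared(X, y)
R2_Xz = r_squared(np.column_stack([X, z]), y)

# Partial correlation of y and z, controlling for X: correlate the two residual vectors
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
y_star, z_star = M @ y, M @ z
r_star_sq = (z_star @ y_star) ** 2 / ((z_star @ z_star) * (y_star @ y_star))

print(np.isclose(R2_Xz, R2_X + (1 - R2_X) * r_star_sq))  # True
```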

The adjusted $R^2$ (adjusted for degrees of freedom), which incorporates a penalty for adding regressors, is computed as follows:

$$ \bar{R}^2 = 1 - \frac{e'e/(n-K)}{y'M^0y/(n-1)}. $$

The connection between $R^2$ and $\bar{R}^2$ is

$$ \bar{R}^2 = 1 - \frac{n-1}{n-K}\,(1 - R^2). $$
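
A sketch of computing the adjusted $R^2$ both from its definition and from the connection with $R^2$ (simulated data, invented parameter values):

```python
import numpy as np

rng = np.random.default_rng(10)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
sst = (y - y.mean()) @ (y - y.mean())            # y'M0y

R2 = 1 - (e @ e) / sst
R2_adj_def = 1 - ((e @ e) / (n - K)) / (sst / (n - 1))   # from the definition
R2_adj_conn = 1 - (n - 1) / (n - K) * (1 - R2)           # from the connection with R^2

print(np.isclose(R2_adj_def, R2_adj_conn))  # True
```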