# Linear Regression Model

Population regression equation

$$y = x_1\beta_1 + x_2\beta_2 + \cdots + x_K\beta_K + \varepsilon$$

  • Deterministic part and random part
  • Linear relationship

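Sample regression equation, for observations $i = 1, \dots, n$:

$$y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K + \varepsilon_i$$

  • $y_i$: the observed value of the variable of interest;
  • $x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K$: the deterministic part;
  • $\varepsilon_i$: the random part;
  • matrix form: $y = X\beta + \varepsilon$.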

Target. Estimate unknown parameters, make predictions.

# Assumptions

Assumption 1 (Full Rank). X is an n×K matrix with rank K, i.e., there are no exact linear relationships among the variables.
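The full rank condition can be checked numerically. The sketch below uses NumPy on two small made-up design matrices (the data and the helper name `has_full_column_rank` are illustrative assumptions, not part of the notes): one contains an exact linear combination of its columns, the other does not.

```python
import numpy as np

# Hypothetical data: a constant and one regressor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
const = np.ones_like(x)

# Third column is an exact linear combination of the first two,
# so this matrix violates the full rank assumption.
X_bad = np.column_stack([const, x, 2.0 * const + 3.0 * x])

# Here no column is an exact linear combination of the others.
X_ok = np.column_stack([const, x, x ** 2])


def has_full_column_rank(X):
    """Return True if the n x K matrix X has rank K."""
    return bool(np.linalg.matrix_rank(X) == X.shape[1])


print(has_full_column_rank(X_bad))
print(has_full_column_rank(X_ok))
```

When the rank is below $K$, $X'X$ is singular and the least squares coefficients are not uniquely determined.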

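Assumption 2 (Mean independence). The expected value of the disturbance is zero conditional on the observations, i.e.,

$$E[\varepsilon_i \mid X] = 0, \quad i = 1, \dots, n.$$

Because the covariance depends on $\varepsilon$ only through its conditional mean, $\mathrm{Cov}(X, \varepsilon) = \mathrm{Cov}_X(X, E[\varepsilon \mid X])$. ∴ Mean independence implies that $\mathrm{Cov}(\varepsilon, X) = 0$.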

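Assumption 3 (Homoscedasticity). Conditional on $X$, the disturbances have a common variance and are uncorrelated with one another:

$$\mathrm{Var}[\varepsilon_i \mid X] = \sigma^2 \ \text{for all } i = 1, \dots, n, \qquad \mathrm{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0 \ \text{for all } i \neq j,$$

which is summarized as $E[\varepsilon\varepsilon' \mid X] = \sigma^2 I$.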

# Least squares regression

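Population regression:

$$E[y_i \mid x_i] = x_i'\beta$$

Sample regression:

$$\hat{y}_i = x_i'b$$

Here $x_i$ is the $K \times 1$ vector of regressors for observation $i$. The least squares coefficient vector minimizes the sum of squared residuals:

$$\min_{b_0} S(b_0) = \sum_{i=1}^{n} (y_i - x_i'b_0)^2 = (y - Xb_0)'(y - Xb_0),$$

and the solution is

$$b = (X'X)^{-1}X'y,$$

which satisfies the normal equations $X'Xb = X'y$.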



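If the regression contains a constant term, then:

  1. The least squares residuals sum to zero.
  2. The regression hyperplane passes through the point of means of the data, $\bar{y} = \bar{x}'b$.
  3. The mean of the fitted values equals the mean of the actual values, $\bar{y} = \bar{\hat{y}}$.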

It is important to note that none of these results need hold if the regression does not contain a constant term.
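A small simulation can illustrate the least squares solution $b = (X'X)^{-1}X'y$ and the role of the constant term. This is only a sketch: the data are simulated, and the sample size, seed, and coefficient values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed
n = 200

# Simulated design matrix: a constant plus two regressors.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])  # illustrative "true" coefficients
y = X @ beta + rng.normal(size=n)

# Solve the normal equations X'X b = X'y (avoids forming the inverse).
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
e = y - y_hat

print("sum of residuals:", e.sum())
print("ybar - xbar'b:", y.mean() - X.mean(axis=0) @ b)
print("ybar - mean of fitted values:", y.mean() - y_hat.mean())

# Same regression without the constant column: the residuals are
# still orthogonal to the regressors but need not sum to zero.
X_nc = X[:, 1:]
b_nc = np.linalg.solve(X_nc.T @ X_nc, X_nc.T @ y)
e_nc = y - X_nc @ b_nc
print("sum of residuals without a constant:", e_nc.sum())
```

The first three printed values are zero up to floating point rounding. The last is not, because the simulated intercept is nonzero.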

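Least squares partitions the vector $y$ into two orthogonal parts,

$$y = Py + My = \hat{y} + e,$$

where $P = X(X'X)^{-1}X'$ is the projection matrix, $M = I - P$ is the residual maker, and $\hat{y}'e = 0$.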
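As a numerical sketch of this orthogonal partition, the code below builds the projection matrix $P = X(X'X)^{-1}X'$ and the residual maker $M = I - P$ for simulated data (the data and dimensions are assumptions chosen for illustration) and computes the fitted values $\hat{y} = Py$ and residuals $e = My$.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.normal(size=n)

# P projects onto the column space of X; M produces the residuals.
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

y_hat = P @ y  # fitted values
e = M @ y      # least squares residuals

print("y_hat'e:", y_hat @ e)                      # orthogonality
print("y'y:", y @ y)
print("y_hat'y_hat + e'e:", y_hat @ y_hat + e @ e)
```

Forming the $n \times n$ matrices explicitly is only sensible for small $n$. In practice the residuals are computed as $y - Xb$.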


# Partitioned regression

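Suppose that the regression involves two sets of variables, $X_1$ and $X_2$. Thus,

$$y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon.$$

The normal equations are

$$\begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}.$$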


Theorem (Orthogonal Partitioned Regression). In the multiple linear least squares regression of $y$ on two sets of variables $X_1$ and $X_2$, if the two sets of variables are orthogonal, then the separate coefficient vectors can be obtained by separate regressions of $y$ on $X_1$ alone and $y$ on $X_2$ alone.

Theorem (Frisch-Waugh-Lovell Theorem). In the linear least squares regression of vector $y$ on two sets of variables, $X_1$ and $X_2$, the subvector $b_2$ is the set of coefficients obtained when the residuals from a regression of $y$ on $X_1$ alone are regressed on the set of residuals obtained when each column of $X_2$ is regressed on $X_1$.
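The Frisch-Waugh-Lovell result can be checked numerically. The following sketch uses simulated data (the seed, sample size, and coefficients are arbitrary assumptions, and `ols` is a helper defined here for illustration). It compares the coefficients on $X_2$ from the long regression with those from the residual-on-residual regression.

```python
import numpy as np


def ols(X, y):
    """Least squares coefficients of y on X (y may have several columns)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(2)  # arbitrary fixed seed
n = 300
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
# X2 is made correlated with X1 so that the two sets are not orthogonal.
X2 = 0.5 * X1[:, [1]] + rng.normal(size=(n, 2))
y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([-1.0, 0.5]) + rng.normal(size=n)

# Long regression of y on [X1, X2]; the last two coefficients are b2.
b_long = ols(np.column_stack([X1, X2]), y)
b2_long = b_long[2:]

# Residuals of y and of each column of X2 after regressing on X1.
y_star = y - X1 @ ols(X1, y)
X2_star = X2 - X1 @ ols(X1, X2)

# Regression of the y residuals on the X2 residuals.
b2_fwl = ols(X2_star, y_star)

print("b2 from long regression:", b2_long)
print("b2 from FWL regression: ", b2_fwl)
```

The two coefficient vectors agree up to rounding, even though $X_1$ and $X_2$ are correlated.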

Theorem (Change in the Sum of Squares When a Variable is Added to a Regression). If $e'e$ is the sum of squared residuals when $y$ is regressed on $X$ and $u'u$ is the sum of squared residuals when $y$ is regressed on $X$ and $z$, then

$$u'u = e'e - c^2 (z_*'z_*) \le e'e,$$

where $c$ is the coefficient on $z$ in the long regression of $y$ on $[X, z]$ and $z_* = Mz$, with $M = I - X(X'X)^{-1}X'$, is the vector of residuals when $z$ is regressed on $X$.
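A quick numerical check of the identity $u'u = e'e - c^2 (z_*'z_*)$ is sketched below. The data are simulated and `residuals` is an illustrative helper, so every specific value here is an assumption.

```python
import numpy as np


def residuals(A, v):
    """Residuals from the least squares regression of v on A."""
    coef = np.linalg.lstsq(A, v, rcond=None)[0]
    return v - A @ coef


rng = np.random.default_rng(3)  # arbitrary fixed seed
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = 0.3 * X[:, 1] + rng.normal(size=n)
y = X @ np.array([0.5, 1.5]) + 0.8 * z + rng.normal(size=n)

# Short regression: y on X.
e = residuals(X, y)

# Long regression: y on [X, z]; c is the coefficient on z.
Xz = np.column_stack([X, z])
coef_long = np.linalg.lstsq(Xz, y, rcond=None)[0]
u = y - Xz @ coef_long
c = coef_long[-1]

# Residuals of z after regressing on X.
z_star = residuals(X, z)

lhs = u @ u
rhs = e @ e - c ** 2 * (z_star @ z_star)
print("u'u:", lhs)
print("e'e - c^2 z*'z*:", rhs)
```

Since $c^2 (z_*'z_*) \ge 0$, adding a variable can never increase the sum of squared residuals.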

# Goodness of fit

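We want to know how much of the variation of $y$ is explained by the variation of $x$. When the regression contains a constant term, the total variation decomposes as

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} e_i^2, \qquad \text{SST} = \text{SSR} + \text{SSE}.$$

We can obtain a measure of how well the regression line fits the data by using the coefficient of determination,

$$R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{e'e}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$$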
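As a rough sketch, the sum of squares decomposition and the coefficient of determination $R^2 = 1 - \text{SSE}/\text{SST}$ can be computed directly. The data below are simulated with arbitrary assumed values, and the regression includes a constant so that the decomposition holds.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary fixed seed
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.7, -1.2]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b
e = y - y_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = e @ e                            # error sum of squares

r2 = 1.0 - sse / sst
print("SST:", sst)
print("SSR + SSE:", ssr + sse)
print("R^2:", r2)
```

Without a constant term the identity SST = SSR + SSE can fail, and the two expressions for $R^2$ then no longer agree.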


Theorem (Change in $R^2$ When a Variable is Added to a Regression). Let $R_{Xz}^2$ be the coefficient of determination in the regression of $y$ on $X$ and an additional variable $z$, let $R_X^2$ be the same for the regression of $y$ on $X$ alone, and let $r_{yz}^*$ be the partial correlation between $y$ and $z$, controlling for $X$. Then

$$R_{Xz}^2 = R_X^2 + (1 - R_X^2)\, r_{yz}^{*2},$$

where the partial correlation $r_{yz}^*$ is the simple correlation between $y_*$ and $z_*$, the residuals from regressing $y$ and $z$ on $X$, and the square of the partial correlation coefficient is $r_{yz}^{*2} = \dfrac{(z_*'y_*)^2}{(z_*'z_*)(y_*'y_*)}$.
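The relation $R_{Xz}^2 = R_X^2 + (1 - R_X^2)\, r_{yz}^{*2}$ can be verified on simulated data. In this sketch the data, seed, and the helper `fit` are illustrative assumptions, and $X$ contains a constant so that the residuals have mean zero.

```python
import numpy as np


def fit(A, v):
    """Return the residuals and R^2 from regressing v on A."""
    coef = np.linalg.lstsq(A, v, rcond=None)[0]
    resid = v - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((v - v.mean()) ** 2)
    return resid, r2


rng = np.random.default_rng(5)  # arbitrary fixed seed
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = 0.4 * X[:, 1] + rng.normal(size=n)
y = X @ np.array([1.0, -0.6]) + 0.5 * z + rng.normal(size=n)

y_star, r2_X = fit(X, y)   # residuals of y on X, and R^2 of y on X
z_star, _ = fit(X, z)      # residuals of z on X
_, r2_Xz = fit(np.column_stack([X, z]), y)

# Squared partial correlation between y and z, controlling for X.
partial_r2 = (z_star @ y_star) ** 2 / ((z_star @ z_star) * (y_star @ y_star))

print("R^2 with z:", r2_Xz)
print("R^2_X + (1 - R^2_X) * r*^2:", r2_X + (1.0 - r2_X) * partial_r2)
```

Because the squared partial correlation is nonnegative, $R^2$ cannot fall when $z$ is added.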

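The adjusted $R^2$ (for degrees of freedom), which incorporates a penalty for the fact that $R^2$ never decreases when a variable is added, is computed as follows:

$$\bar{R}^2 = 1 - \frac{e'e/(n-K)}{\sum_{i=1}^{n}(y_i - \bar{y})^2/(n-1)}.$$

The connection between $R^2$ and $\bar{R}^2$ is

$$\bar{R}^2 = 1 - \frac{n-1}{n-K}(1 - R^2).$$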
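A minimal sketch of the adjusted $R^2$, assuming simulated data and a hypothetical helper `r2_and_adjusted`: it computes $\bar{R}^2 = 1 - \dfrac{e'e/(n-K)}{\text{SST}/(n-1)}$ directly and compares it with $1 - \dfrac{n-1}{n-K}(1 - R^2)$, before and after adding an irrelevant regressor.

```python
import numpy as np


def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for the regression of y on X."""
    n, K = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - (e @ e) / sst
    adj = 1.0 - ((e @ e) / (n - K)) / (sst / (n - 1))
    return r2, adj


rng = np.random.default_rng(6)  # arbitrary fixed seed
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)
noise = rng.normal(size=n)  # regressor unrelated to y by construction

r2_small, adj_small = r2_and_adjusted(X, y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([X, noise]), y)

K_small = X.shape[1]
connection = 1.0 - (n - 1) / (n - K_small) * (1.0 - r2_small)

print("R^2 without / with noise:", r2_small, r2_big)
print("adjusted R^2 without / with noise:", adj_small, adj_big)
print("adjusted R^2 via the connection formula:", connection)
```

$R^2$ cannot fall when the extra column is added, whereas the adjusted $R^2$ can move in either direction depending on how much the new variable reduces $e'e$.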
