# Linear Regression Model

Population regression equation

$$ y = f(x_1, x_2, \ldots, x_K) + \varepsilon = x_1\beta_1 + x_2\beta_2 + \cdots + x_K\beta_K + \varepsilon $$
  • Deterministic part and random part
  • Linear relationship

Sample regression equation

$$ y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K + \varepsilon_i $$
  • $y_i$: observed value of the variable of interest;
  • $x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K$: the deterministic part;
  • $\varepsilon_i$: the random part;
  • matrix form: $y = X\beta + \varepsilon$

Target: estimate the unknown parameters and make predictions.
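
As a minimal sketch of this setup, the matrix-form model $y = X\beta + \varepsilon$ can be simulated with NumPy; the sample size, coefficient values, and noise scale below are arbitrary illustrations, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3                      # sample size and number of regressors (arbitrary)
beta = np.array([1.0, 2.0, -0.5])  # hypothetical "true" parameters

X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # first column is a constant
eps = rng.normal(scale=1.0, size=n)                             # random disturbance
y = X @ beta + eps                                              # y = X beta + eps
print(X.shape, y.shape)            # (100, 3) (100,)
```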

# Assumptions

Assumption 1 (Full Rank). X is an n×K matrix with rank K, i.e., there are no exact linear relationships among the variables.
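
A quick numerical check of the full-rank condition, assuming NumPy and a simulated design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])

# Full rank: no exact linear relationships among the columns of X
print(np.linalg.matrix_rank(X) == X.shape[1])        # True

# An exact linear combination of existing columns violates the assumption
X_bad = np.column_stack([X, X[:, 1] + X[:, 2]])
print(np.linalg.matrix_rank(X_bad), X_bad.shape[1])  # 3 4  (rank < number of columns)
```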

Assumption 2 (Mean independence). The expected value of the disturbance is zero conditional on the observations, i.e.,

$$ E[\varepsilon \mid X] = \begin{bmatrix} E[\varepsilon_1 \mid X] \\ E[\varepsilon_2 \mid X] \\ \vdots \\ E[\varepsilon_n \mid X] \end{bmatrix} = 0. $$

By the law of total covariance, $\operatorname{Cov}(X, \varepsilon) = \operatorname{Cov}_X\big(X, E[\varepsilon \mid X]\big)$, so mean independence implies that $\operatorname{Cov}(\varepsilon, X) = 0$.

Assumption 3 (Homoscedasticity). The variances and covariances of the disturbances satisfy

$$ \operatorname{Var}[\varepsilon_i \mid X] = \sigma^2 \quad \text{for all } i = 1, \ldots, n, \qquad \operatorname{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0 \quad \text{for all } i \ne j, $$

which is summarized as $E[\varepsilon\varepsilon' \mid X] = \sigma^2 I$.

# Least squares regression

Population regression:

$$ y_i = x_i'\beta + \varepsilon_i $$

Sample regression:

$$ y_i = x_i'b + e_i $$

The least squares coefficient vector minimizes the sum of squared residuals:

$$ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - x_i'b)^2 $$

and the solution is

$$ b = (X'X)^{-1}X'y $$
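
A sketch of computing $b = (X'X)^{-1}X'y$ on simulated data, both by the textbook formula and by numerically preferable routines (the data and parameter values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Textbook formula: b = (X'X)^{-1} X'y
b_formula = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferable alternatives that give the same answer
b_solve = np.linalg.solve(X.T @ X, X.T @ y)
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_formula, b_solve), np.allclose(b_formula, b_lstsq))  # True True
```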

Proposition

  1. The least squares residuals sum to zero.
  2. $\bar{y} = \bar{x}'b$.
  3. $\bar{\hat{y}} = \bar{y}$.

It is important to note that none of these results need hold if the regression does not contain a constant term.
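
A quick numerical check of the proposition when the regression does contain a constant term (a sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # includes a constant term
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
e = y - y_hat

print(np.isclose(e.sum(), 0.0))                  # 1. residuals sum to zero
print(np.isclose(y.mean(), X.mean(axis=0) @ b))  # 2. ybar = xbar'b
print(np.isclose(y_hat.mean(), y.mean()))        # 3. mean of fitted values = ybar
```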

Least squares partitions the vector y into two orthogonal parts,

$$ y = Py + My = \text{projection} + \text{residual} = X(X'X)^{-1}X'y + \big[I - X(X'X)^{-1}X'\big]y $$
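
A sketch of this decomposition using explicit projection matrices (fine for small $n$, though not how one would compute in practice; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T       # projection matrix P
M = np.eye(n) - P                          # residual maker M

print(np.allclose(y, P @ y + M @ y))       # y = projection + residual
print(np.isclose((P @ y) @ (M @ y), 0.0))  # the two parts are orthogonal
```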

# Partitioned regression

Suppose that the regression involves two sets of variables, $X_1$ and $X_2$. Thus,

$$ y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. $$

The normal equations are

$$ \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix} $$

Theorem (Orthogonal Partitioned Regression). In the multiple linear least squares regression of y on two sets of variables $X_1$ and $X_2$, if the two sets of variables are orthogonal, then the separate coefficient vectors can be obtained by separate regressions of y on $X_1$ alone and y on $X_2$ alone.
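
As an illustration of the orthogonal case (a sketch with simulated data, where $X_2$ is constructed so that $X_1'X_2 = 0$):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2))
X2 = Z - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ Z)  # residualize Z on X1 so that X1'X2 = 0

X = np.column_stack([X1, X2])
y = X @ np.array([1.0, 2.0, -0.5, 0.8]) + rng.normal(size=n)

b_joint = np.linalg.solve(X.T @ X, X.T @ y)
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)   # regression of y on X1 alone
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y)   # regression of y on X2 alone

print(np.allclose(b_joint, np.concatenate([b1, b2])))  # True
```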

Theorem (Frisch-Waugh-Lovell Theorem). In the linear least squares regression of the vector y on two sets of variables, $X_1$ and $X_2$, the subvector $b_2$ is the set of coefficients obtained when the residuals from a regression of y on $X_1$ alone are regressed on the set of residuals obtained when each column of $X_2$ is regressed on $X_1$.
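
A sketch of the Frisch-Waugh-Lovell result on simulated data: residualize y and each column of $X_2$ on $X_1$, then regress residuals on residuals and compare with the full regression.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2)) + 0.5 * X1[:, [1]]   # correlated with X1 on purpose
X = np.column_stack([X1, X2])
y = X @ np.array([1.0, 2.0, -0.5, 0.8]) + rng.normal(size=n)

# Full regression: the last two coefficients form b2
b = np.linalg.solve(X.T @ X, X.T @ y)

# FWL: residualize y and each column of X2 on X1, then regress residuals on residuals
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
y_star, X2_star = M1 @ y, M1 @ X2
b2_fwl = np.linalg.solve(X2_star.T @ X2_star, X2_star.T @ y_star)

print(np.allclose(b[-2:], b2_fwl))  # True
```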

Theorem (Change in the Sum of Squares When a Variable is Added to a Regression). If $e'e$ is the sum of squared residuals when y is regressed on X and $u'u$ is the sum of squared residuals when y is regressed on X and z, then

$$ u'u = e'e - c^2(z_*'z_*) \le e'e, $$

where c is the coefficient on z in the long regression of y on [X, z] and $z_* = Mz$ is the vector of residuals when z is regressed on X.
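
A numerical check of this identity with simulated data: run the short and long regressions and compare $u'u$ with $e'e - c^2(z_*'z_*)$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -0.5]) + 0.7 * z + rng.normal(size=n)

# Short regression: y on X
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)

# Long regression: y on [X, z]
Xz = np.column_stack([X, z])
coefs = np.linalg.solve(Xz.T @ Xz, Xz.T @ y)
u = y - Xz @ coefs
c = coefs[-1]                                      # coefficient on z in the long regression

# z* = Mz: residuals from regressing z on X
z_star = z - X @ np.linalg.solve(X.T @ X, X.T @ z)

print(np.isclose(u @ u, e @ e - c**2 * (z_star @ z_star)))  # True
```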

# Goodness of fit

We want to know how much of the variation of y is explained by the variation of x:

$$ y_i - \bar{y} = (\hat{y}_i - \bar{y}) + e_i = (x_i - \bar{x})'b + e_i $$

We can obtain a measure of how well the regression line fits the data by using the coefficient of determination:

$$ \frac{\text{SSR}}{\text{SST}} = \frac{b'X'M^0Xb}{y'M^0y} = 1 - \frac{e'e}{y'M^0y}, $$

where $M^0 = I - \frac{1}{n}ii'$ is the matrix that transforms observations into deviations from their means.
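
Both forms of the coefficient of determination can be computed directly (a sketch on simulated data that includes a constant term):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

M0 = np.eye(n) - np.ones((n, n)) / n             # centering matrix M0
sst = y @ M0 @ y                                 # total sum of squares
ssr = b @ X.T @ M0 @ X @ b                       # regression sum of squares

print(np.isclose(ssr / sst, 1 - (e @ e) / sst))  # the two forms of R^2 agree
```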

Theorem (Change in $R^2$ When a Variable is Added to a Regression). Let $R^2_{Xz}$ be the coefficient of determination in the regression of y on X and an additional variable z, let $R^2_X$ be the same for the regression of y on X alone, and let $r^*_{yz}$ be the partial correlation between y and z, controlling for X. Then

$$ R^2_{Xz} = R^2_X + (1 - R^2_X)\, r^{*2}_{yz}, $$
  • where the partial correlation $r^*_{yz}$ is the simple correlation between $y_* = My$ and $z_* = Mz$, the residuals from regressing y and z on X, and its square is $r^{*2}_{yz} = \dfrac{(z_*'y_*)^2}{(z_*'z_*)(y_*'y_*)}$.
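
A numerical check of this theorem with simulated data, computing the residual vectors $y_*$ and $z_*$ with the residual maker for X:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -0.5]) + 0.7 * z + rng.normal(size=n)

def r_squared(W, y):
    b = np.linalg.solve(W.T @ W, W.T @ y)
    e = y - W @ b
    return 1 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

R2_X = r_squared(X, y)
R2_Xz = r_squared(np.column_stack([X, z]), y)

# Partial correlation of y and z, controlling for X: correlate the two residual vectors
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
y_star, z_star = M @ y, M @ z
r_star_sq = (z_star @ y_star) ** 2 / ((z_star @ z_star) * (y_star @ y_star))

print(np.isclose(R2_Xz, R2_X + (1 - R2_X) * r_star_sq))  # True
```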

The adjusted $R^2$ (adjusted for degrees of freedom), which incorporates a penalty for adding regressors, is computed as follows:

$$ \bar{R}^2 = 1 - \frac{e'e/(n-K)}{y'M^0y/(n-1)}. $$

The connection between $R^2$ and $\bar{R}^2$ is

$$ \bar{R}^2 = 1 - \frac{n-1}{n-K}\,(1 - R^2). $$
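
A sketch of computing the adjusted $R^2$ both from its definition and from the connection with $R^2$ (simulated data, invented parameter values):

```python
import numpy as np

rng = np.random.default_rng(10)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
sst = (y - y.mean()) @ (y - y.mean())            # y'M0y

R2 = 1 - (e @ e) / sst
R2_adj_def = 1 - ((e @ e) / (n - K)) / (sst / (n - 1))   # from the definition
R2_adj_conn = 1 - (n - 1) / (n - K) * (1 - R2)           # from the connection with R^2

print(np.isclose(R2_adj_def, R2_adj_conn))  # True
```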