Variables in a regression can be endogenous for several reasons, including omitted variable bias, measurement error, and simultaneity (reverse causation). One example from the previous post was that of unobserved ability in the determination of wages. Since unobserved ability is omitted from a regression of wages on education, it is possible that the return to education is overestimated.

The Hausman Test for endogeneity can help us determine whether or not there is some form of omitted variable bias in this wage regression.

Since there is a suspicion that education (educ) suffers from omitted variable bias in the form of unobserved ability, we choose father’s and mother’s education as instrumental variables. Parents’ education is unlikely to affect the wages of their children directly, but it is a good predictor of the children’s own education and of the genetic transmission of intellectual ability. This is why parents’ education may be a good instrument. We can test the assumption that father’s and mother’s education are strong instruments by running a reduced form regression, with educ as the dependent variable and all exogenous variables (the instruments and the exogenous explanatory variables) on the right-hand side.
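The instrument-strength check described above can be sketched in a few lines. This is only an illustration on synthetic data, not the sample used in this post: the variable names `fatheduc`, `motheduc` and `exper`, the coefficients, and the sample size are all assumptions. The F statistic compares the reduced form with and without the two instruments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic data (an assumption for illustration; not the post's sample).
ability = rng.normal(size=n)             # unobserved ability
fatheduc = rng.normal(12, 3, size=n)     # instrument 1 (hypothetical name)
motheduc = rng.normal(12, 3, size=n)     # instrument 2 (hypothetical name)
exper = rng.uniform(0, 30, size=n)       # hypothetical exogenous control
educ = 4 + 0.3 * fatheduc + 0.3 * motheduc + ability + rng.normal(size=n)


def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


const = np.ones(n)
# Reduced form: educ on all exogenous variables, instruments included.
X_unrestricted = np.column_stack([const, exper, fatheduc, motheduc])
# Restricted model: the same regression with the instruments dropped.
X_restricted = np.column_stack([const, exper])

q = 2                                    # number of instruments tested
df_resid = n - X_unrestricted.shape[1]
ssr_u = ssr(educ, X_unrestricted)
ssr_r = ssr(educ, X_restricted)

# Joint F statistic for H0: both instrument coefficients are zero.
f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / df_resid)
print(f"first-stage F statistic: {f_stat:.1f}")
```

A common rule of thumb treats a first-stage F statistic above roughly 10 as evidence that the instruments are not weak.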

The F-test from the reduced form regression shows that father’s and mother’s education are in fact jointly statistically significant in determining their offspring’s educational attainment. The next step is to take the residuals of the reduced form equation and plug those residuals back into the structural equation as an additional regressor. The structural equation is the original relationship that we care about. Testing the statistical significance of the coefficient on the residuals in the structural equation is the Hausman Test.

The null hypothesis is that the coefficient on ‘resid’ is zero and that education is therefore exogenous. This hypothesis can be rejected at the 10% level, but not at the 5% level. This is a borderline case, but for the sake of completeness we will use the 10% significance level to reject the null hypothesis that the coefficient on ‘resid’ is zero and thus that education is exogenous. In other words, there is evidence that education is endogenous.
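The two steps of the test (reduced form residuals, then the structural equation augmented with those residuals) can be sketched as follows. This is a minimal illustration on synthetic data in which education is endogenous by construction; the variable names, the coefficients, and the helper `ols` are assumptions, not the post's data or code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Synthetic data (an assumption): ability raises both educ and lwage,
# so educ is endogenous when ability is left out of the wage equation.
ability = rng.normal(size=n)
fatheduc = rng.normal(12, 3, size=n)
motheduc = rng.normal(12, 3, size=n)
educ = 4 + 0.3 * fatheduc + 0.3 * motheduc + ability + rng.normal(size=n)
lwage = 0.5 + 0.06 * educ + 0.3 * ability + rng.normal(scale=0.5, size=n)


def ols(y, X):
    """Return OLS coefficients, conventional standard errors, and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, resid


const = np.ones(n)

# Step 1: reduced form regression of educ on the instruments.
_, _, resid = ols(educ, np.column_stack([const, fatheduc, motheduc]))

# Step 2: structural equation with the reduced form residuals added.
beta, se, _ = ols(lwage, np.column_stack([const, educ, resid]))

# t statistic on the residual term; H0: coefficient is zero (educ exogenous).
t_resid = beta[2] / se[2]
print(f"coefficient on resid: {beta[2]:.3f}, t statistic: {t_resid:.2f}")
```

In this sketch a large absolute t statistic on `resid` leads to rejecting exogeneity. The coefficient on `educ` in the augmented regression coincides numerically with the two-stage least squares estimate.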

We have selected what we believe to be good instruments: 1) parents’ education is related to offspring education and 2) parents’ education is unlikely to affect their offspring’s wages other than through education. Since we rejected the null hypothesis that the coefficient on ‘resid’ is zero at the 10% level in the previous regression, the next step is to estimate the model using parents’ education as instruments for education, for people in the sample who are earning wages.

Concluding Remarks: The Hausman Test is used to determine whether or not one of the explanatory variables in a regression suffers from endogeneity (omitted variable bias, measurement error, or reverse causality). Here the Hausman Test found evidence of such endogeneity, which we attribute to omitted variable bias.

If you reject the null hypothesis at the 10% level, as we did, the correct regression to run is the instrumental variable regression. Running the IV regression, one finds that each additional year of education increases wages by 6%.
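To see how an IV estimate can differ from OLS, here is a minimal two-stage least squares sketch on synthetic data. Everything in it is an assumption for illustration: the variable names, the sample, and the true return of 0.06 built into the simulated wage equation. It reports point estimates only, because standard errors from a manual second stage are not valid.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Synthetic data (an assumption): the true return to education is 0.06,
# and omitted ability pushes the OLS estimate upward.
ability = rng.normal(size=n)
fatheduc = rng.normal(12, 3, size=n)
motheduc = rng.normal(12, 3, size=n)
educ = 4 + 0.3 * fatheduc + 0.3 * motheduc + ability + rng.normal(size=n)
lwage = 0.5 + 0.06 * educ + 0.3 * ability + rng.normal(scale=0.5, size=n)


def ols_coefs(y, X):
    """OLS coefficients of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


const = np.ones(n)
Z = np.column_stack([const, fatheduc, motheduc])   # instruments

# Plain OLS of log wage on education.
b_ols = ols_coefs(lwage, np.column_stack([const, educ]))

# First stage: fitted values of educ from the instruments.
educ_hat = Z @ ols_coefs(educ, Z)

# Second stage: replace educ with its fitted values.
# Point estimates only; second-stage OLS standard errors are not valid.
b_iv = ols_coefs(lwage, np.column_stack([const, educ_hat]))

print(f"OLS return to education: {b_ols[1]:.3f}")
print(f"IV  return to education: {b_iv[1]:.3f}")
```

In this simulated setup the OLS estimate lies above the IV estimate, the same direction as the gap between the 11% and 6% estimates discussed in this post.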

If one believes that the 10% level is too generous and decides to use the 5% significance level instead, we would not reject the null hypothesis that the coefficient on ‘resid’ is zero, and thus we would not reject the hypothesis that education is exogenous. This would lead us to use the original OLS estimate of an 11% return per year of education.