Instrumental Variables in Simple Regression
- Section 15.1 of Heiss (2020)
- Consider the simple linear regression:
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
The OLS estimator is:
$$ \beta^{OLS}_1 = \frac{cov(x, y)}{var(x)} $$
If the regressor $x$ is correlated with the error term $\varepsilon$, then the OLS estimator is biased.
Given a valid instrumental variable $z$, the IV estimator becomes:
$$ \beta^{IV}_1 = \frac{cov(z, y)}{cov(z,x)} $$
Implementing IV in R
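As a quick check of the two covariance formulas above, the following is a minimal simulation sketch. The data-generating process (a true slope of 2 and an unobserved factor u that enters both x and the error term) and all variable names are assumptions chosen for illustration; they are not taken from Heiss (2020).

```r
# Simulated data (assumed design, for illustration only)
set.seed(123)
n   <- 10000
z   <- rnorm(n)                 # instrument: shifts x, unrelated to eps
u   <- rnorm(n)                 # unobserved factor that causes endogeneity
x   <- z + u                    # regressor is correlated with u
eps <- 0.5 * u + rnorm(n)       # error term also depends on u
y   <- 1 + 2 * x + eps          # true slope is 2

b1.ols <- cov(x, y) / var(x)    # OLS slope: absorbs cov(x, eps) / var(x)
b1.iv  <- cov(z, y) / cov(z, x) # IV slope: uses only the variation in x due to z
c(OLS = b1.ols, IV = b1.iv)
```

Under this assumed design, cov(x, eps) / var(x) = 0.5 / 2 = 0.25, so the OLS slope should settle near 2.25 in large samples while the IV slope should stay close to 2.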
Example 15.1: Returns to Education for Married Women (Wooldridge, 2019)
- We will use the mroz dataset from the wooldridge package to estimate the following model:
$$ \log(wage) = \beta_0 + \beta_1 educ + \varepsilon $$
- For comparison, we first estimate the model by OLS:
data(mroz, package="wooldridge") # loading the dataset
mroz = mroz[!is.na(mroz$wage),] # dropping missing wage observations
reg.ols = lm(lwage ~ educ, mroz) # OLS regression
round( summary(reg.ols)$coef, 5 )
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.18520 0.18523 -0.99984 0.31795
## educ 0.10865 0.01440 7.54513 0.00000
Using ivreg()
- To estimate an instrumental-variables regression, we can use the ivreg() function from the AER package.
- After specifying the endogenous regressor educ, we add the instrument on the right-hand side of |. In this example, the father’s education (fatheduc) is used as the instrument:
library(AER) # loading the package that includes ivreg
reg.iv = ivreg(lwage ~ educ | fatheduc, data=mroz) # IV regression
round( summary(reg.iv)$coef, 5 )
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.44110 0.44610 0.98880 0.32332
## educ 0.05917 0.03514 1.68385 0.09294
## attr(,"df")
## [1] 426
## attr(,"nobs")
## [1] 428
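A valid instrument must be correlated with the endogenous regressor. One informal way to check this relevance condition, sketched here with base R only, is to regress educ on fatheduc and look at the slope and its t statistic. The object name stage1 is an arbitrary choice, and this check says nothing about the exogeneity of the instrument.

```r
data(mroz, package = "wooldridge")       # same dataset as above
mroz <- mroz[!is.na(mroz$wage), ]        # keep women with observed wages

# First-stage regression: endogenous regressor on the instrument
stage1 <- lm(educ ~ fatheduc, data = mroz)
round(summary(stage1)$coef, 5)
```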
By-hand derivation
(1) IV estimate $\beta^{IV}_1$
- The simple-regression formula above is useful for intuition.
- In practice, the multivariate IV setup is more relevant in econometrics, so the full matrix derivation is developed in the next section.
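The covariance-ratio formula can be evaluated directly on the mroz data. The sketch below uses only base R functions; the object names b1.iv, b0.iv and b1.ols are illustrative, and the intercept is recovered from the sample means, which holds when the model includes a constant and has a single instrument.

```r
data(mroz, package = "wooldridge")       # same dataset as above
mroz <- mroz[!is.na(mroz$wage), ]        # keep women with observed wages

# IV slope: cov(z, y) / cov(z, x), with z = fatheduc, x = educ, y = lwage
b1.iv <- with(mroz, cov(fatheduc, lwage) / cov(fatheduc, educ))
# IV intercept from the sample means
b0.iv <- with(mroz, mean(lwage) - b1.iv * mean(educ))
# OLS slope for comparison: cov(x, y) / var(x)
b1.ols <- with(mroz, cov(educ, lwage) / var(educ))

round(c(b0.iv = b0.iv, b1.iv = b1.iv, b1.ols = b1.ols), 5)
```

The two slopes should match the educ coefficients reported by ivreg() and lm() above (about 0.05917 and 0.10865).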
Instrumental Variables in Multiple Regression
- Section 15.2 of Heiss (2020)
- This section extends the IV estimator to the multivariate case, where we combine endogenous and exogenous regressors in matrix form.
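With one instrument per endogenous regressor, the matrix form of the estimator is $(Z'X)^{-1} Z'y$, where exogenous regressors serve as their own instruments. The sketch below assumes a wage equation with exper and expersq as exogenous regressors and fatheduc as the instrument for educ; this specification is an illustrative choice, not necessarily the one used in the source.

```r
data(mroz, package = "wooldridge")
mroz <- mroz[!is.na(mroz$wage), ]

y <- mroz$lwage
# Regressor matrix: constant, endogenous educ, exogenous exper and expersq
X <- cbind(const = 1, educ = mroz$educ,
           exper = mroz$exper, expersq = mroz$expersq)
# Instrument matrix: educ is replaced by fatheduc, the rest is unchanged
Z <- cbind(const = 1, fatheduc = mroz$fatheduc,
           exper = mroz$exper, expersq = mroz$expersq)

# IV estimator (Z'X)^{-1} Z'y, solved without forming the inverse explicitly
b.iv <- solve(t(Z) %*% X, t(Z) %*% y)
round(b.iv, 5)
```

The same point estimates should come out of ivreg(lwage ~ educ + exper + expersq | fatheduc + exper + expersq, data = mroz), since all exogenous regressors must be repeated after the |.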
Testing Regressor Exogeneity
- Section 15.4 of Heiss (2020)
- Once we allow for endogeneity, an important empirical question is whether OLS and IV differ enough to justify treating a regressor as endogenous.
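One common way to carry out such a comparison is a regression-based test: add the first-stage residuals to the structural equation and test their coefficient. The sketch below assumes motheduc and fatheduc as instruments and exper and expersq as exogenous regressors; the names stage1, stage2 and v.hat are illustrative.

```r
data(mroz, package = "wooldridge")
mroz <- mroz[!is.na(mroz$wage), ]

# First stage: endogenous regressor on all exogenous variables and instruments
stage1 <- lm(educ ~ exper + expersq + motheduc + fatheduc, data = mroz)
mroz$v.hat <- resid(stage1)              # first-stage residuals

# Structural equation augmented with the first-stage residuals
stage2 <- lm(lwage ~ educ + exper + expersq + v.hat, data = mroz)
round(summary(stage2)$coef, 5)
```

A statistically significant coefficient on v.hat is evidence against the exogeneity of educ; an insignificant one means the data do not reject treating educ as exogenous.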
Testing Overidentifying Restrictions
- Section 15.5 of Heiss (2020)
- When the model has more instruments than endogenous regressors, we can test whether the extra instruments are jointly consistent with the exogeneity assumptions.
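A standard version of this test regresses the IV residuals on all exogenous variables and compares $n R^2$ with a $\chi^2_q$ distribution, where $q$ is the number of extra instruments. The sketch below assumes two instruments (motheduc, fatheduc) for one endogenous regressor, so $q = 1$, and computes the 2SLS coefficients by hand to stay within base R; all object names are illustrative.

```r
data(mroz, package = "wooldridge")
mroz <- mroz[!is.na(mroz$wage), ]

# 2SLS point estimates via two lm() calls
stage1 <- lm(educ ~ exper + expersq + motheduc + fatheduc, data = mroz)
mroz$educ.hat <- fitted(stage1)
stage2 <- lm(lwage ~ educ.hat + exper + expersq, data = mroz)
b <- coef(stage2)                        # order: const, educ, exper, expersq

# IV residuals use the actual educ, not the fitted values
X <- cbind(1, mroz$educ, mroz$exper, mroz$expersq)
mroz$res.iv <- as.numeric(mroz$lwage - X %*% b)

# Auxiliary regression of the residuals on all exogenous variables
aux  <- lm(res.iv ~ exper + expersq + motheduc + fatheduc, data = mroz)
stat <- nobs(aux) * summary(aux)$r.squared   # n * R^2
pval <- 1 - pchisq(stat, df = 1)             # q = 2 instruments - 1 endogenous
c(statistic = stat, p.value = pval)
```

A small p-value would cast doubt on the exogeneity of at least one instrument; a large one means the restrictions are not rejected.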
Two-Stage Least Squares
- Section 15.3 of Heiss (2020)
- Two-stage least squares (2SLS) is the standard operational form of IV estimation in multivariate models and will be the main focus of the following section.
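The two stages can be reproduced with two lm() calls, which makes the mechanics explicit. The sketch below assumes the same illustrative wage equation, with motheduc and fatheduc as instruments for educ. The point estimates are the 2SLS estimates, but the standard errors printed for the second stage are not valid, because they ignore that educ.hat is itself estimated.

```r
data(mroz, package = "wooldridge")
mroz <- mroz[!is.na(mroz$wage), ]

# Stage 1: regress the endogenous regressor on instruments and exogenous variables
stage1 <- lm(educ ~ exper + expersq + motheduc + fatheduc, data = mroz)
mroz$educ.hat <- fitted(stage1)

# Stage 2: replace educ with its first-stage fitted values
stage2 <- lm(lwage ~ educ.hat + exper + expersq, data = mroz)
b.2sls <- coef(stage2)
round(b.2sls, 5)
```

For correct standard errors, the same model can be passed to ivreg() as lwage ~ educ + exper + expersq | exper + expersq + motheduc + fatheduc.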
Simultaneous Equations Models
- Chapter 16 of Heiss (2020)
- Simultaneous-equations settings provide a classic motivation for IV methods, since endogenous variables are determined jointly within the system.
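A small simulated market illustrates the point. The demand and supply equations, their parameter values, and the supply shifter z below are all assumptions made for this sketch. Price and quantity are determined jointly, so regressing quantity on price by OLS does not recover the demand slope, while using z as an instrument for price does.

```r
# Assumed system (illustration only):
#   demand: q = 10 - 1.0 * p + u
#   supply: q =  2 + 0.5 * p + z + v
set.seed(42)
n <- 10000
z <- rnorm(n)                    # observed supply shifter, excluded from demand
u <- rnorm(n)                    # demand shock
v <- rnorm(n)                    # supply shock

# Equilibrium price and quantity from equating demand and supply
p <- (8 + u - z - v) / 1.5
q <- 10 - 1.0 * p + u

b.ols <- cov(p, q) / var(p)      # inconsistent: p is correlated with u
b.iv  <- cov(z, q) / cov(z, p)   # consistent for the demand slope (-1)
c(OLS = b.ols, IV = b.iv)
```

Under these assumed parameters the OLS slope converges to about -0.5 rather than -1, because the equilibrium price responds to the demand shock.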