---
title: "ETC1010 Lab 7"
author: "David T. Frazier"
date: "09/09/2017"
output:
  html_document: default
  pdf_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(ggplot2)
library(quantreg)
```

# OLS and Least Absolute Deviation (LAD) Regression

In this lab we will build and estimate linear models using the method of least absolute deviations (LAD). We will then compare these results with those of standard OLS. This will require writing your own functions in R, pulling in the required data, and passing both to the `optim()` function in `R`.

Given a sample $(y_i,x_i),\; i=1,...,n$, from the population linear model
$$Y=\beta_0 +\beta_1 X_1 +\beta_2 X_2 +\cdots +\beta_p X_p +\epsilon,$$
the LAD regression estimates are obtained by solving the minimization problem:
$$\min_{b_0,...,b_p}\sum_{i=1}^{n}|y_i-b_0-b_1 x_{1i}-\cdots-b_p x_{pi}|.$$

## Lab Exercise

In this week's lab we will analyze the relationship between income and food expenditures for a sample of Belgian households.

a. Load the dataset "engel.csv". Visualize the relationship between income (x-coordinate) and foodexp (y-coordinate).

b. Using `optim()`, fit the linear regression model $$foodexp_i = \beta_0+\beta_1 income_i+\beta_2 income_i^2+\epsilon_i$$ using the objective function $$f(b)={\sum_{i=1}^{n}(foodexp_{i}-b_0-b_1income_i-b_2income_i^2)^2}.$$ Take as your starting values for the optimization `c(0,0,0)`.

c. Now, fit the model using the `lm()` function. Compare these results with those obtained from your optimization program.

d. Change the starting values in `optim()` to be `c(8,0,0)` and re-run the program. Compare the results with those obtained from the `lm()` function.

e. Questions c. and d. demonstrate that optimization is sometimes sensitive to the choice of *initial values*.
This is because `optim()` is what we call a local optimization method: the program seeks to optimize the objective function in a neighborhood of the starting value. In general, how could you attempt to solve this problem in practice?

f. What shape of objective function would `optim()` work well with? Can you list some examples?

g. Plot the residuals from `lm()` against the fitted values and comment on this relationship. Do there appear to be any visual outliers? HINT: use `?lm` to see all of the options for `lm()`.

h. OLS is sensitive to outliers since the objective function $f(b)$ is a polynomial (quadratic) in the residuals. A potential means of alleviating this issue is to instead estimate the regression model $$foodexp_i = \beta_0+\beta_1 income_i+\beta_2 income_i^2+\epsilon_i$$ using least absolute deviations (LAD). Why might LAD be better behaved in the presence of outliers than OLS?

i. Using `optim()`, fit the linear regression model $foodexp_i = \beta_0+\beta_1 income_i+\beta_2 income_i^2+\epsilon_i$ using LAD. Compare the estimated coefficients with those obtained via OLS.

j. Should we be concerned with the starting value issue in `optim()` in the case of LAD regression?

k. Implement one of the methods you discussed in part e. to determine if the estimated regression coefficients change for different starting values.
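
The mechanics behind parts b., i. and k. can be sketched on simulated data. This is a minimal illustration, not the lab solution. The data are generated here as a stand-in for "engel.csv", and the helper names `ssr`, `sad`, `starts`, `runs` and `best` are hypothetical. Using `method = "BFGS"` for the smooth least-squares objective is a choice made for this sketch only; the LAD runs use the default Nelder-Mead method of `optim()`, which needs no derivatives.

```r
# Simulated stand-in for the lab data (NOT engel.csv): a quadratic trend
# with heavy-tailed noise. The scale and coefficients are made up.
set.seed(1010)
n <- 200
income  <- runif(n, 0, 4)
foodexp <- 1 + 2 * income - 0.3 * income^2 + 0.3 * rt(n, df = 3)

# OLS objective: sum of squared residuals for b = (b0, b1, b2)
ssr <- function(b, y, x) sum((y - b[1] - b[2] * x - b[3] * x^2)^2)

# LAD objective: sum of absolute residuals for the same model
sad <- function(b, y, x) sum(abs(y - b[1] - b[2] * x - b[3] * x^2))

# Least squares via optim(); extra arguments y and x are passed on to ssr()
fit_ols <- optim(c(0, 0, 0), ssr, y = foodexp, x = income, method = "BFGS")

# Reference fit from lm() for comparison
fit_lm <- lm(foodexp ~ income + I(income^2))

# Multi-start LAD: run optim() from several starting values (including the
# OLS estimates) and keep the run with the smallest objective value
starts <- list(c(0, 0, 0), c(8, 0, 0), fit_ols$par)
runs   <- lapply(starts, function(s) optim(s, sad, y = foodexp, x = income))
values <- vapply(runs, function(r) r$value, numeric(1))
best   <- runs[[which.min(values)]]

# Side-by-side view of the estimates
print(rbind(lm = coef(fit_lm), optim_ols = fit_ols$par, optim_lad = best$par))
print(values)  # objective value reached from each starting point
```

Comparing the entries of `values` shows whether different starting points reach the same objective value. Since LAD is median regression, `rq()` from the `quantreg` package with `tau = 0.5` can serve as an independent cross-check on the `optim()` result.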