Applied Econometrics Exam 1 (100 points)

1. [70 points] You wish to predict the sale price of single-family residences in Massachusetts using

property features (commonly called a “hedonic pricing model”). You collect price and property features

data on properties sold in the state for the year 2010 and obtain the following regression:

Price_i = 14407.60 - 759.92*houseage_i + 0.24*lotsize_i + 354.35*bldarea_i + 12015.61*rooms_i + µ_i
          (6433.23)  (89.67)             (0.115)          (265.39)           (8516.47)

Observations = 2691    R² = 0.49    F = 65.10

Where:

houseage = age of the house (in years)

lotsize = total square feet of the land

bldarea = total square feet of the interior of the house

rooms = total number of rooms in the house

A. [6 points] How would you categorize, or label, this dataset? Defend your answer.

B. [6 points] What is the interpretation of the constant term in this regression? Why is it included?

C. [8 points] How do we interpret the coefficient on lotsize? Why is the coefficient on lotsize

nominally small if we expect it to have a large impact on the price of a house?

D. [6 points] What is the predicted price of a house that is 7 years old, with a lot size of 800 square

feet, a building interior of 400 square feet, and 5 rooms? Will this predicted price be close to the

actual price? Why or why not?
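The arithmetic behind part D can be sketched in a few lines of Python. This is only an illustration of plugging values into the fitted equation; the names `coefficients`, `house`, and `predict_price` are hypothetical and not part of the exam.

```python
# Coefficients copied from the estimated regression in Question 1.
coefficients = {
    "constant": 14407.60,
    "houseage": -759.92,
    "lotsize": 0.24,
    "bldarea": 354.35,
    "rooms": 12015.61,
}

# Property described in part D; "constant" is set to 1 so the intercept
# enters the sum like any other term.
house = {"constant": 1, "houseage": 7, "lotsize": 800, "bldarea": 400, "rooms": 5}


def predict_price(coefs, features):
    """Return the fitted value: the sum of coefficient * feature value."""
    return sum(coefs[name] * features[name] for name in coefs)


predicted_price = predict_price(coefficients, house)
print(f"Predicted price: ${predicted_price:,.2f}")
```

The fitted value is a point prediction only; it says nothing on its own about how far an individual sale price may fall from it.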

E. [6 points] Explain what it means that R² = 0.49. What is one good reason and one

bad reason to use R² as a measure of the “goodness of fit” of a regression?

F. [10 points] Test the significance of each independent variable in the model using α = .05. Are

these findings expected? Why or why not, and what could explain your findings?

G. [8 points] Construct a 95% confidence interval for the coefficient on houseage in the model above. What is this

measure telling us? How will consistency in your OLS estimation affect your confidence

intervals?
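The mechanics of parts F and G can be sketched as follows. Two assumptions are made here and are not stated in the exam: the parenthesized values under the regression are read as standard errors in regressor order, and with 2691 observations the standard normal critical value is used in place of the exact t critical value. The function names are hypothetical.

```python
from statistics import NormalDist

# (coefficient, standard error) pairs. Reading the parenthesized values as
# standard errors in regressor order is an assumption about the layout.
estimates = {
    "houseage": (-759.92, 89.67),
    "lotsize": (0.24, 0.115),
    "bldarea": (354.35, 265.39),
    "rooms": (12015.61, 8516.47),
}

ALPHA = 0.05
# Large-sample assumption: two-sided critical value from the standard normal.
critical_value = NormalDist().inv_cdf(1 - ALPHA / 2)


def t_statistic(coef, se):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / se


def confidence_interval(coef, se, crit):
    """Two-sided interval: coefficient plus or minus crit * standard error."""
    margin = crit * se
    return coef - margin, coef + margin


t_stats = {name: t_statistic(c, s) for name, (c, s) in estimates.items()}
significant = {name: abs(t) > critical_value for name, t in t_stats.items()}
ci_low, ci_high = confidence_interval(*estimates["houseage"], critical_value)

for name, t in t_stats.items():
    verdict = "reject H0" if significant[name] else "fail to reject H0"
    print(f"{name:10s} t = {t:7.3f}  ({verdict})")
print(f"95% CI for houseage: ({ci_low:.2f}, {ci_high:.2f})")
```

Using the exact t distribution with n - k - 1 degrees of freedom would change the critical value only slightly at this sample size.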

H. [10 points] After thinking about your model further, you wish to add median income as a variable

in your regression. You collect data on the median income of each census tract in Massachusetts

in the year 2010, and match that to your housing data. Assuming you believe that your original

form of the model suffered from omitted variable bias, in what direction would you expect your

estimates to change with the inclusion of median income? Defend your answers.

I. [10 points] Suppose you ran the same model as above, only using log(price) instead of price, and

obtained an R² of 0.54 and an F-statistic of 68.17. Based on this information, are we able to say

which version (level or log) of the model is better? Explain why or why not.

2. [10 points] The following questions are based on airline data collected from routes in the U.S. for the

year 2000. You are interested in examining the determinants of ticket prices.

You decide to generate a scatter diagram to visually assess if the average one-way fare for a route is

related to the distance of that route.

[Scatter diagram: avg. one-way fare, $ (vertical axis, 0 to 600) against distance, in miles (horizontal axis, 0 to 3000)]

Based on this scatter diagram, would OLS be BLUE if we ran a regression of the average one-way fare for

a route on the distance of that route? Explain why or why not and if this would affect any estimated

coefficients.

3. [20 points] Suppose that 5 years ago, a new job training program was introduced in Massachusetts

that accepted 1000 applicants. For $200, participants would undergo a 6-week training program that

would teach them about basic computer skills, resume building, interviewing, and job searching. Now, 5

years later, you wish to survey some of these participants to see if the training has helped them.

You create a simple survey asking participants how much they thought the job training helped them in

their professional life on a scale of 1 to 5 (with 5 being most helpful and 1 being least helpful), how much

they currently earn per month, and how many jobs they are currently working.

You mail out 100 surveys to known participants, and after waiting several weeks you receive 32

responses. You wish to use the income and number-of-jobs responses to predict how favorably a participant viewed the training (using the 1 through 5 scale above). If you estimate your model using

OLS, do you believe that any of our Classical Linear Model assumptions would be violated? If so, explain

which ones, why they are violated, and what potential problems that could pose for your estimation.
