# San Diego Stata Assignment 2

Stata Assignment 2 Econ 120B Spring 2020 · Xinwei Ma
Department of Economics, UCSD
• The deadline for submitting this Stata Assignment is May 31 (Sunday), 11:59pm. No late submissions will be
accepted.
• This Stata Assignment will be graded on five scales: 0%, 25%, 50%, 75%, and 100%. If your do-file does not run, we
will subtract 25%.
/*******************************************************************************
ECON 120B, Spring 2020
Stata Assignment 2
Name:
PID:
*******************************************************************************/
clear all // clear the environment/memory
set more off
sysuse nlsw88 // load the built-in dataset nlsw88
• nlsw88 is a built-in dataset that comes with Stata. It is an extract from the 1988 round of the National Longitudinal
Survey of Mature and Young Women. The following is a summary of the variables in this dataset.
| Variable | Description |
| --- | --- |
| idcode | survey id |
| age | age |
| race | race; takes three values: white, black, or other |
| married | = 1 if currently married, = 0 otherwise |
| never_married | = 1 if never married, = 0 otherwise |
| south | = 1 if lives in southern states, = 0 otherwise |
| smsa | = 1 if lives in a standard metropolitan statistical area, = 0 otherwise |
| c_city | = 1 if lives in a central city, = 0 otherwise |
| industry | industry; use tab industry to see the categories |
| occupation | occupation; use tab occupation to see the categories |
| union | = 1 if in a union, = 0 otherwise |
| wage | hourly wage, measured in \$ |
| hours | hours worked per week |
| ttl_exp | total work experience, measured in years |
| tenure | current job tenure, measured in years |
https://www.bls.gov/nls/orginal-cohorts/mature-and-young-women.htm
1. In this exercise you will run multiple regressions to study how education, experience, and job tenure affect women’s
wage. First, consider the following regression model:
$$\text{lnwage}_i = \beta_0 + \beta_1 \text{grade}_i + \beta_2 \text{ttl\_exp}_i + \beta_3 \text{tenure}_i + u_i$$
To estimate this model, generate a new variable called lnwage, which is the natural logarithm of wage times 100. Note
that a one-unit change in lnwage corresponds approximately to a 1% change in wage.
(a) Use the regress command to estimate the OLS coefficients. What is the percentage change in wage when education
increases by one year? How about job tenure?
(b) To test the null hypothesis that β2 = 3, what is the t-statistic? What is the p-value? Will you reject the null
hypothesis at the 10% significance level?
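The steps in 1(a) and 1(b) can be sketched as follows. This is a minimal sketch, not a model answer: it assumes the default (homoskedasticity-only) standard errors, since the handout does not say whether to add vce(robust), and the scalar names tstat and pval are hypothetical.

```stata
clear all
sysuse nlsw88

* Log wage scaled by 100, so a one-unit change is roughly a 1% change in wage
generate lnwage = 100 * ln(wage)

* Baseline model: lnwage on education, total experience, and job tenure
regress lnwage grade ttl_exp tenure

* t-statistic for H0: beta_2 = 3, from the stored coefficient and standard error
scalar tstat = (_b[ttl_exp] - 3) / _se[ttl_exp]

* Two-sided p-value from the t distribution with the residual degrees of freedom
scalar pval = 2 * ttail(e(df_r), abs(scalar(tstat)))
display "t = " scalar(tstat) "   p-value = " scalar(pval)

* The same null checked with test, which reports the equivalent F = t^2
test ttl_exp = 3
```

With a single restriction, the F-statistic reported by test equals the square of the t-statistic, so both routes give the same p-value.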
Next, to study if education has a quadratic effect on lnwage, consider the following regression model:
$$\text{lnwage}_i = \beta_0 + \beta_1 \text{grade}_i + \beta_2 \text{ttl\_exp}_i + \beta_3 \text{tenure}_i + \beta_4 \text{grade}_i^2 + u_i$$
To estimate this model, generate a new variable which equals $\text{grade}^2$. Use the regress command to estimate the OLS
coefficients.
(c) What is the value of $\hat{\beta}_1$? What is the 95% confidence interval?
(d) What is the value of $\hat{\beta}_4$? Is it statistically significant at the 10% level?
(e) For someone with 12 years of education, what is the percentage change in wage if she receives an additional year
of education?
(f) To test the null hypothesis that β1 = β4 = 0, what is the F-statistic? How many restrictions are in this
hypothesis? What is the p-value? Will you reject the null hypothesis at the 5% level?
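The quadratic specification in 1(c)–(f) can be sketched as below. This is an illustration under stated assumptions: the variable name grade2 is hypothetical, default standard errors are used, and the joint hypothesis is tested with the test command, which reports an F-statistic. For 1(e), moving from 12 to 13 years of education changes predicted lnwage by β1(13 − 12) + β4(13² − 12²) = β1 + 25β4.

```stata
clear all
sysuse nlsw88
generate lnwage = 100 * ln(wage)

* Squared education term (grade2 is a hypothetical name)
generate grade2 = grade^2

* Quadratic-in-education model; regress reports 95% confidence intervals by default
regress lnwage grade ttl_exp tenure grade2

* Predicted change in lnwage when grade goes from 12 to 13:
* beta_1*(13 - 12) + beta_4*(13^2 - 12^2) = beta_1 + 25*beta_4
lincom grade + 25*grade2

* Joint test of H0: beta_1 = beta_4 = 0 (two restrictions)
test grade grade2
```

lincom also reports a standard error and confidence interval for the combined effect, which a manual calculation from the two coefficients would not give directly.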
2. In this exercise, you will run multiple regressions with binary and categorical variables.
(a) Use the command tabulate to show the categories of the variable occupation and their frequencies. What is the
relative frequency of the category Sales? Please report a number between 0 and 1.
(b) Use the same command, this time specifying the option nolabel, to display the numeric values corresponding
to the different categories of occupation. Which numeric value corresponds to the label Sales?
(c) Use the command summarize with an if qualifier to compute the sample mean of wage for workers with Sales
occupation. What is the average wage for workers with Sales occupation?
(d) Use the command regress wage i.occupation to run a regression with binary variables for every occupation
category. (Adding i. to a categorical variable will automatically generate a binary variable for each category.)
The occupation with numeric value 1 is used as the base group. Given the regression results, what is the average
wage for workers with Sales occupation? How does your answer compare to part 2(c)?
(e) Which occupation has the highest average wage? How much is it?
(f) Use a command similar to the one in 2(d), this time to study the average hours for each occupation. Which occupation
works the longest hours per week? How many hours on average?
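Steps 2(a)–(f) can be sketched as follows. This is a rough illustration, not the required solution: rather than hard-coding the numeric code for Sales, it looks the code up from the value label attached to occupation, and the names occlab, sales, mean_sales, and reg_sales are hypothetical.

```stata
clear all
sysuse nlsw88

* (a)-(b): frequencies by label, then the same table with numeric codes
tabulate occupation
tabulate occupation, nolabel

* Look up the numeric code behind the label "Sales" instead of hard-coding it
local occlab : value label occupation
local sales = "Sales":`occlab'

* (c): sample mean of wage among Sales workers
summarize wage if occupation == `sales'
scalar mean_sales = r(mean)

* (d): one indicator per occupation, lowest numeric code as the base group
regress wage i.occupation

* Group mean = intercept + that group's coefficient
lincom _cons + `sales'.occupation
scalar reg_sales = r(estimate)

* (f): same specification with hours as the outcome
regress hours i.occupation
```

Because the regression includes an indicator for every category other than the base group, the intercept plus a category's coefficient reproduces that category's sample mean, which is why the lincom result should match the summarize result from part (c).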
Next, we follow a similar procedure as in steps 2(a)–(d) to study the wage gap among different races.
(g) Use the command regress wage i.race to run a regression with binary variables for every race category. What
is the average wage for white women?
(h) What is the wage gap between white and black women? What is the 95% confidence interval for this wage gap?
(i) Generate three binary variables for categories in race to run a saturated regression instead of 2(g). What is the
average wage for white women? How does your result compare to 2(g)?
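One way to set up 2(g)–(i) is sketched below. It assumes that the saturated regression in 2(i) is run without a constant, so that all three indicators can enter at once; the handout does not spell this out. The stub race_ (and hence the names race_1, race_2, race_3) is hypothetical.

```stata
clear all
sysuse nlsw88

* (g)-(h): indicators for each race category, lowest numeric code as the base group;
* each coefficient is the gap relative to the base group, with its 95% CI
regress wage i.race

* (i): one indicator per category, created by the generate() option of tabulate
tabulate race, generate(race_)

* Saturated regression without a constant: each coefficient is a group mean
regress wage race_1 race_2 race_3, noconstant
```

In the version with a constant, the intercept is the base group's mean and the other coefficients are differences from it; in the no-constant version every coefficient is a group mean, so the two sets of results describe the same fitted values.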
3. In this exercise, you will run multiple regressions with interaction terms. First, consider the following regression with
an interaction between the two binary regressors, collgrad and union:
$$\text{wage}_i = \beta_0 + \beta_1 \text{collgrad}_i + \beta_2 \text{union}_i + \beta_3 (\text{collgrad}_i \times \text{union}_i) + u_i$$
To estimate this model, generate an interaction term between collgrad and union. Use the regress command to
estimate the OLS coefficients.
(a) What is the base category in this model? What is the average wage for workers in this base category?
(b) What is the difference in average wage for non-college graduates in a union and non-college graduates not in a
union? Please report a positive number.
(c) What is the difference in average wage for college graduates in a union and college graduates not in a union?
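The interaction model can be sketched as follows. This is a minimal sketch under assumptions: wage is taken as the dependent variable because the questions ask about average wage, and the names collgrad_union and gap_college are hypothetical.

```stata
clear all
sysuse nlsw88

* Interaction of the two binary regressors (collgrad_union is a hypothetical name)
generate collgrad_union = collgrad * union

* Base category: collgrad == 0 and union == 0, whose mean wage is the intercept
regress wage collgrad union collgrad_union

* (b): union gap among non-college graduates is the coefficient on union
lincom union

* (c): union gap among college graduates adds the interaction coefficient
lincom union + collgrad_union
scalar gap_college = r(estimate)
```

Since the two binary regressors and their interaction define four groups and the model has four coefficients, the fitted values are the four group means, and each lincom result equals a difference between two of those means.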