Statistics R Project 1

I. Online you will find the file salary1.csv.

Column 1: sl: The three-month salary of the subject in dollars (Y).

Column 2: yd: The number of years since the subject earned their highest degree (X1) (i.e., years of experience).

Column 3: dg: The highest degree earned (doctorate, masters) of the subject (X2).

(a) Write down the estimated linear regression model.

(b) Interpret b1 and b2 in terms of the problem.

(c) Predict the 3-month salary of a subject who has 10 years of experience and has earned their doctorate.

(d) Find the 95% confidence interval for β1 only, and interpret it in terms of the problem.

(e) Find the simultaneous/family-wise/overall 95% confidence intervals for β1, β2, using Bonferroni’s multiplier.

(f) Create two prediction intervals for the 3-month salary at the following values of X1, X2:

(5, masters), (10, doctorate)

with overall/simultaneous/family-wise level 90%. Use Scheffé’s multiplier.
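
For reference, the quantities requested in (a) through (f) are commonly written as below. This is only a sketch in one common textbook notation, which is an assumption and not taken from the assignment: n is the number of subjects, p = 3 is the number of regression coefficients, s{b_k} is the standard error of b_k, MSE is the mean squared error, g is the number of simultaneous statements, and X2 is assumed to be coded as an indicator (1 = doctorate, 0 = masters; your software may choose the opposite baseline).

```latex
% Assumed coding: X_2 = 1 for doctorate, 0 for masters (check your software's baseline level).
\begin{align*}
\text{Model (a):}\quad & \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 \\
\text{Single interval (d):}\quad & b_1 \pm t\!\left(1 - \tfrac{\alpha}{2};\, n - p\right) s\{b_1\},
    \quad \alpha = 0.05 \\
\text{Bonferroni (e):}\quad & b_k \pm t\!\left(1 - \tfrac{\alpha}{2g};\, n - p\right) s\{b_k\},
    \quad k = 1, 2,\; g = 2,\; \alpha = 0.05 \\
\text{Scheff\'e prediction (f):}\quad & \hat{Y}_h \pm S \, s\{\text{pred}\},
    \quad S^2 = g \, F(1 - \alpha;\, g,\, n - p),\; g = 2,\; \alpha = 0.10 \\
& s^2\{\text{pred}\} = \text{MSE} + s^2\{\hat{Y}_h\}
\end{align*}
```

With the assumed coding, b2 is the estimated difference in mean three-month salary between doctorate and masters holders with the same years of experience. If the baseline level is reversed, the sign of b2 flips.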

II. Part II: Use the file salary2.csv.

Column 1: sl: The three-month salary of the subject in dollars (Y).

Column 2: yd: The number of years since the subject earned their highest degree (X1) (i.e., years of experience).

Column 3: dg: The highest degree earned (doctorate, masters) of the subject (X2).

Column 4: sx: The gender of the subject (male, female) (X3).

Column 5: rk: The rank of the subject (assistant, associate, full) (X4).

(a) Test to see if X4 can be dropped from the model, comparing to the full model with X1, X2, X3, X4. Specify the null and alternative hypotheses in terms of the β’s, the value of FS, and the corresponding p-value. State your conclusion for α = 0.01.

(b) Test to see if both X2 and X3 can be dropped from the model, comparing to the full model with X1, X2, X3, X4. Specify the null and alternative hypotheses in terms of the β’s, the value of FS, and the corresponding p-value. State your conclusion for α = 0.01.

(c) Based on your observations from (b) and (d), fit the “best” model and write down its estimated linear equation.

(d) What is the additional reduction in error we expect to see when we add X4 to a model with only X1 in it already? Interpret this value.

(e) What is the additional reduction in error we expect to see when we add X2, X3 to a model with X1, X4 in it already? Interpret this value.

(f) Do the above values agree with your “best” model from part (c)? Explain.
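
For reference, the tests in (a) and (b) and the quantities in (d) and (e) can be sketched as below. The notation is an assumption and not taken from the assignment: SSE_F and SSE_R are the error sums of squares of the full and reduced models, and df_F and df_R are their error degrees of freedom. The sketch also assumes that "additional reduction in error" means the coefficient of partial determination. If your course instead means the extra sum of squares, use the numerator alone.

```latex
% Assumed notation: F = full model, R = reduced model (the model with the tested terms dropped).
\begin{align*}
\text{General linear test (a), (b):}\quad
  & F_S = \frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F},
    \qquad p\text{-value} = P\!\left( F(df_R - df_F,\; df_F) > F_S \right) \\
\text{Partial } R^2 \text{ for (d):}\quad
  & R^2\{X_4 \mid X_1\} = \frac{SSE(X_1) - SSE(X_1, X_4)}{SSE(X_1)} \\
\text{Partial } R^2 \text{ for (e):}\quad
  & R^2\{X_2, X_3 \mid X_1, X_4\}
    = \frac{SSE(X_1, X_4) - SSE(X_1, X_2, X_3, X_4)}{SSE(X_1, X_4)}
\end{align*}
```

Because rank (X4) has three levels, it would normally enter the model as two indicator variables, so the null hypothesis in (a) sets two β’s to zero at once and df_R − df_F = 2. In (b), X2 and X3 each contribute one indicator, so df_R − df_F is also 2.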