# Res Wknd – Group Project – Working with Data using RStudio

Background: This course is all about data visualization. However, we must first understand the data that we are using to create the visualizations. For this assignment, each group will be given its own unique dataset to work with. That same dataset will be used for both Part 1 and Part 2 of this assignment.

Part 1 – Data Analysis with RStudio

Provide screen shots that show the analysis of your dataset. For each screen shot, please show comment lines that describe what the next line(s) of code should achieve, the code in proper R syntax, and the computed results that R produces.

Watch the video included in this week’s Residency material to learn the simple commands to conduct basic data analysis with RStudio.

Use RStudio to generate results: create screen shots and then paste them into an MS Word document with the basic data analysis of your dataset. Remember to use a comment line (#) that explains each R instruction. Example: (# sets the working directory). Commands: setwd, dim, head, tail, str (structure), summary, cor, transform, subset.
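As a rough sketch of the loading and inspection commands listed above, the example below builds a small toy dataset, saves it as a CSV, and reads it back. The column names echo the MROZ variables, but the values, the file name `toy_dataset.csv`, and the use of a temporary folder are assumptions made only for illustration. Your group's own file and working directory will differ.

```r
# Toy data standing in for a group's dataset (values are made up for illustration).
toy <- data.frame(
  inlf  = c("yes", "yes", "no", "yes", "no", "yes"),
  hours = c(1610, 1656, 0, 456, 0, 1568),
  age   = c(32, 30, 35, 34, 31, 54),
  educ  = c(12, 12, 12, 12, 14, 12)
)

# Write the toy data to a temporary CSV so read.csv has a file to load.
write.csv(toy, file.path(tempdir(), "toy_dataset.csv"), row.names = FALSE)

# Set the working directory (here: the temporary folder holding the CSV).
old_wd <- setwd(tempdir())

# Load the dataset into a data frame.
dataset <- read.csv("toy_dataset.csv")

# Show the number of rows and columns.
dim(dataset)

# Show the first three and last three rows.
head(dataset, 3)
tail(dataset, 3)

# Show the structure: each column's name, type, and first values.
str(dataset)

# Show summary statistics for every column.
summary(dataset)

# Restore the previous working directory.
setwd(old_wd)
```

Note that `str` is the function that prints a dataset's structure. R also has a function named `structure`, but it serves a different purpose.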

• First, set your working directory (command – setwd, OR use the drop-down from the RStudio Session tab).
• Load your dataset into RStudio and examine its structure – read.csv, OR select your object file from the RStudio Files pane. Other commands to use: dim, head, tail, str (structure), and summary (provide comment lines, the R code, and results as screen shot #1)
• View your original dataset – examine each field/grouping in the data – decide whether each field is: “categorical” or “continuous” data (add this also to screen shot #1)
• Create a correlation of stats for the dataset. The cor command requires numeric input, so categorical fields must be coded 0/1 instead of no/yes, and fields must be numeric instead of string. Hint: it might be necessary to transform some fields. If so, create a new version of your dataset with these transformations, then run the correlation on the transformed data – commands: transform and cor (provide as screen shot #2)
• What are the Min, Max, Median, and Mean of a continuous value field in your data? (provide also as screen shot #2)
• What are the correlation values between all fields in your dataset? (provide as screen shot #3)
• Create a subset of the dataset with at least two fields in your dataset – commands: subset, cor (provide also as screen shot #3)
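The transformation, summary, correlation, and subset steps above might look roughly like the following. The data frame is a made-up toy example, and the names `dataset_num` and `two_fields` are hypothetical. The yes/no recoding is one possible approach, not the only one.

```r
# Toy data (made-up values) with one yes/no categorical field.
dataset <- data.frame(
  inlf  = c("yes", "yes", "no", "yes", "no", "yes"),
  hours = c(1610, 1656, 0, 456, 0, 1568),
  age   = c(32, 30, 35, 34, 31, 54),
  educ  = c(12, 12, 12, 12, 14, 12)
)

# Recode the yes/no field as 1/0 so that every column is numeric.
dataset_num <- transform(dataset, inlf = ifelse(inlf == "yes", 1, 0))

# Min, max, median, and mean of one continuous field.
min(dataset_num$hours)
max(dataset_num$hours)
median(dataset_num$hours)
mean(dataset_num$hours)

# Correlation matrix between all (now numeric) fields.
cor(dataset_num)

# Keep only two fields, then correlate just those two.
two_fields <- subset(dataset_num, select = c(hours, age))
cor(two_fields)
```

Calling `cor` on the untransformed toy data would fail because the `inlf` column holds text, which is why the recoded copy is created first.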

These three (3) screen shots containing the required data details should be placed in a MS Word document and labeled as Part 1 – Dataset Analysis.

Part 2 – Data Visualizing with RStudio

Background: As we have learned, a lot of thought goes into the design of a visualization. In this examination of your data and its visualization, we review how data types influence the choice of graphing.

Provide screen shots that show graphs and charts of your dataset. (Do NOT use ggplot2 or other R package features.) For each screen shot, please show comment lines that describe what the next line(s) of code should achieve, the code in proper R syntax, and the computed results that R produces.

Review Kirk chapter 4 and Res Wknd slide hand-outs to learn the data type requirements for each graph type.

Also use this R Tutorial page: https://www.tutorialspoint.com/r/index.htm for reference on RStudio commands for creating graphs and charts.

Use RStudio to create graphs and charts – create screen shots and then paste to your MS word document showing visuals of your dataset. Use commands (pie, barplot, hist, boxplot, plot).

Graphs to Produce:

Pie Chart:

• Create a pie chart that shows relationships of certain fields/groupings of your dataset – see professor for details. Use command: pie(x) – (provide as screen shot #4)
• Label the fields/columns as appropriate – see professor for details. Use command pie(x, labels = ). (provide also as screen shot #4)
• Title the pie chart (a name you choose). Use command pie(x, labels = , main = ). (provide as screen shot #5)
• Color the pie chart using the rainbow option. Use command pie(x, labels = , main = , col = ). (provide also as screen shot #5)
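The four pie chart steps above can be sketched as follows, using base R only. The counts, the slice names, and the chart title are invented for illustration. Your professor will specify the actual fields to use.

```r
# Made-up counts for three groups (illustrative only).
x <- c(12, 7, 5)
group_names <- c("No kids", "One kid", "Two or more")

# Basic pie chart of the counts.
pie(x)

# Same chart with a label on each slice.
pie(x, labels = group_names)

# Add a title with the main argument.
pie(x, labels = group_names, main = "Households by number of kids")

# Color the slices with the rainbow palette, one color per slice.
pie(x, labels = group_names, main = "Households by number of kids",
    col = rainbow(length(x)))
```

Passing `length(x)` to `rainbow` keeps the number of colors equal to the number of slices if the data changes.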

Bar Plot:

• Create a bar plot that shows relationships of certain fields/groupings of your dataset – use the same fields/columns as before. First, create a matrix (H) and assign the values for each field/column to (H). Use command barplot(H). (provide as screen shot #4)
• Label the x and y axes as (see professor). Use command barplot(H, xlab = , ylab = ). (provide also as screen shot #4)
• Label the x and y axes with names (see professor). Use command barplot(H, xlab = , ylab = ). (provide also as screen shot #4).
• Title the bar plot (a name you choose). Use command barplot(H, xlab = , ylab = , main = ). (provide as screen shot #5).
• Color the bars in the bar plot any color you wish. Use command barplot(H, xlab = , ylab = , main = , col = ). (provide also as screen shot #5).
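A minimal sketch of the bar plot steps above, assuming a one-row matrix so that each column becomes one bar. The values, the bar names (added with `names.arg`, which the steps above do not mention), the axis labels, and the title are all placeholders.

```r
# Made-up values for four groups, stored as a one-row matrix (illustrative only).
H <- matrix(c(12, 7, 5, 9), nrow = 1)
bar_names <- c("Group A", "Group B", "Group C", "Group D")

# Basic bar plot: one bar per matrix column.
barplot(H)

# Label the x and y axes, and name each bar.
barplot(H, names.arg = bar_names, xlab = "Group", ylab = "Count")

# Add a title with the main argument.
barplot(H, names.arg = bar_names, xlab = "Group", ylab = "Count",
        main = "Counts by group")

# Color the bars.
barplot(H, names.arg = bar_names, xlab = "Group", ylab = "Count",
        main = "Counts by group", col = "steelblue")
```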

Histogram:

• Create a histogram that shows the frequency of values of chosen fields/columns of your dataset – use the same fields/columns as before. First, create a vector (v) that holds the values of the field/column. Then use function hist(v). (provide as screen shot #6)
• Label the x and y axes (same as previous bar plot). Use function hist(v, xlab = , xlim = , ylab = , ylim = ). (provide also as screen shot #6)
• Title the histogram (same as previous bar plot). Assign your title to a variable, title <- "histogram name", then use function hist(v, main = title, xlab = , xlim = , ylab = , ylim = ). (provide as screen shot #7)
• Give the histogram any color you wish. Note: all bars should be the same color. Use function hist(v, main = title, xlab = , col = , xlim = , ylab = , ylim = ). (provide also as screen shot #7)
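The histogram steps above could be written roughly as below. The vector holds invented ages, and the axis limits, labels, title text, and color are assumptions chosen to suit this toy data.

```r
# Made-up ages for twelve respondents (illustrative only).
v <- c(32, 30, 35, 34, 31, 54, 37, 54, 48, 39, 33, 42)

# Basic histogram of the values.
hist(v)

# Label the axes and set their ranges.
hist(v, xlab = "Age (years)", xlim = c(30, 60),
     ylab = "Frequency", ylim = c(0, 8))

# Store the title in a variable, then pass the variable (no quotes) to main.
title <- "Distribution of age"
hist(v, main = title, xlab = "Age (years)", xlim = c(30, 60),
     ylab = "Frequency", ylim = c(0, 8))

# A single color value gives every bar the same color.
hist(v, main = title, xlab = "Age (years)", col = "lightblue",
     xlim = c(30, 60), ylab = "Frequency", ylim = c(0, 8))
```

Writing `main = "title"` with quotes would print the literal word title, so the variable is passed without quotes.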

Box Plot:

• Create a box plot that shows a measure of the distribution of values across chosen fields/columns of your dataset – use the same fields/columns as before. First, create a vector (v) that holds the values of the field/column. Then use function boxplot(v). (provide as screen shot #8)
• Label the x and y axes (same as previous histogram). Use function boxplot(v, xlab = , ylab = ). (provide also as screen shot #8)
• Title the box plot (a name you choose). Use function boxplot(v, main = , xlab = , ylab = ). (provide also as screen shot #8)
• Color the box plot any color you wish. Use function boxplot(v, main = , xlab = , ylab = , col = ). (provide also as screen shot #8)
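As an illustration of the box plot steps above, the sketch below reuses a made-up vector of ages. The labels, title, and color are placeholders rather than required values.

```r
# Made-up ages for twelve respondents (illustrative only).
v <- c(32, 30, 35, 34, 31, 54, 37, 54, 48, 39, 33, 42)

# Basic box plot of the values.
boxplot(v)

# Label the x and y axes.
boxplot(v, xlab = "All respondents", ylab = "Age (years)")

# Add a title with the main argument.
boxplot(v, main = "Spread of age", xlab = "All respondents",
        ylab = "Age (years)")

# Color the box.
boxplot(v, main = "Spread of age", xlab = "All respondents",
        ylab = "Age (years)", col = "orange")
```

The thick line inside the box marks the median, and the box edges mark the lower and upper quartiles (hinges).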

Scatter Plot:

• Create a scatter plot that shows many points of fields/columns of your dataset plotted in a Cartesian plane – use the same fields/columns as before. First, create two variables for fields: (hw) for the horizontal coordinate and (vw) for the vertical coordinate. Then use function plot(hw, vw), which takes the horizontal values first. (provide as screen shot #9).
• Create a scatter plot of just two of the fields/columns of  your dataset
• Choose one field/column of your dataset and plot it with a label on the x axis
• Add a label to the y axis of this same field/column
• Add a title to the scatter plot (as you choose)
• Color the scatter any color you wish (your choice).
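The scatter plot steps above might be sketched like this. The two vectors hold invented education and wage values, and the labels, title, and color are assumptions for illustration only.

```r
# Made-up values for two fields (illustrative only).
hw <- c(12, 12, 12, 14, 16, 17, 12, 10)                  # horizontal: years of schooling
vw <- c(3.35, 1.39, 4.55, 1.10, 4.59, 4.74, 8.33, 9.62)  # vertical: hourly wage

# Basic scatter plot: horizontal values first, vertical values second.
plot(hw, vw)

# Label the x axis.
plot(hw, vw, xlab = "Years of schooling")

# Label the y axis as well.
plot(hw, vw, xlab = "Years of schooling", ylab = "Hourly wage")

# Add a title with the main argument.
plot(hw, vw, xlab = "Years of schooling", ylab = "Hourly wage",
     main = "Wage versus schooling")

# Color the points.
plot(hw, vw, xlab = "Years of schooling", ylab = "Hourly wage",
     main = "Wage versus schooling", col = "darkgreen")
```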

These screen shots containing graphs and charts of your data should also be placed in the same MS Word document and labeled as Part 2 – Dataset Visualizing with RStudio.

You should have one MS Word document that shows both Part 1 and Part 2 of this assignment. Your deliverable includes both parts of this assignment; it also includes your cover page in APA style showing: Title of this project, Group color and list of members, University’s name, Course name, Course number, Professor’s name, and Date. Although this work is done in your Group, each learner must post an individual copy to iLearn for grading.

When you are ready to post, click the Res Wknd Group Project assignment link, then either click the “Write Submission” link and directly paste your complete document (containing Part 1 and Part 2) into this assignment box, OR browse your computer and add the entire Microsoft Word document as an attachment (Mac users, please remember to append the “.docx” extension to the filename).

MROZ Variable Description

MROZ.DES

inlf hours kidslt6 kidsge6 age educ wage repwage hushrs husage huseduc huswage faminc mtr motheduc fatheduc unem city exper nwifeinc lwage expersq

Variable description for mroz datafile

Obs: 753

1. inlf =1 if in labor force, 1975

2. hours hours worked, 1975

3. kidslt6 # kids < 6 years

4. kidsge6 # kids 6-18

5. age woman’s age in yrs

6. educ years of schooling

7. wage estimated wage from earns., hours

8. repwage reported wage at interview in 1976

9. hushrs hours worked by husband, 1975

10. husage husband’s age

11. huseduc husband’s years of schooling

12. huswage husband’s hourly wage, 1975

13. faminc family income, 1975

14. mtr fed. marginal tax rate facing woman

15. motheduc mother’s years of schooling

16. fatheduc father’s years of schooling

17. unem unem. rate in county of resid.

18. city =1 if live in SMSA

19. exper actual labor mkt exper

20. nwifeinc (faminc - wage*hours)/1000

21. lwage log(wage)

22. expersq exper^2
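Variables 20 to 22 are derived from other fields by the formulas given in the list above. A small sketch, using made-up rows and the hypothetical name `mroz_toy`, shows how those formulas could be reproduced in R. It does not use the real 753 observations.

```r
# Made-up rows with the four source fields (illustrative only, not real MROZ data).
mroz_toy <- data.frame(
  faminc = c(16310, 21800, 21040),
  wage   = c(3.35, 1.39, 4.55),
  hours  = c(1610, 1656, 1980),
  exper  = c(14, 5, 15)
)

# Add the three derived fields using the formulas from the variable list.
mroz_toy <- transform(
  mroz_toy,
  nwifeinc = (faminc - wage * hours) / 1000,  # non-wife income, in thousands
  lwage    = log(wage),                       # natural log of wage
  expersq  = exper^2                          # experience squared
)

# Show the result.
mroz_toy
```

Because these fields are exact functions of other columns, they will correlate strongly with their source fields when you run cor on the full dataset.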