Lab 1 Instructions

In this week’s lab, the main goal is to get started working with R, set up your project directory for the labs, create your first R notebook, and submit your first report.

Exercise 1

Create a project for this unit, in the directory. (Start of each lab by opening this project. Generally it is a good idea NOT TO SAVE THE WORKSPACE when you close a project for the day.)

  • File -> New Project -> Existing Directory

Exercise 2

Open the lab1.Rmd file provided with the instructions.

Exercise 3

Look at the text in the lab1.Rmd document. YOU DON’T NEED TO TURN IN ANY ANSWERS TO THESE QUESTIONS.

  • What lines are R code?
  • How does knitr know that this is code to be run?
  • Using the RStudio IDE, work out how to run a chunk of code. Run this chunk, and then run the next chunk.
  • Using the RStudio IDE, how do you run just one line of R code?
  • Using the RStudio IDE, how do you highlight and run multiple lines of code?
  • What happens if you try to run a line that starts with “```{r}”? Or try to run a line of regular text from the document?
  • Using the RStudio IDE, knit the document into a Word document.

Exercise 4

Read in the subset of Australian scores of the 2012 PISA test results, with the code given below. Make a histogram of the variable PV1MATH. These are math test scores for each student.

library(tidyverse)
PISA_oz_sub <- read_csv("PISA_oz_sub.csv")
ggplot(PISA_oz_sub, aes(x=PV1MATH)) +
  geom_histogram(binwidth=25) 

Describe the distribution of math scores. Discuss min, max, centre and shape.

Exercise 5

Make a bar chart of the number of televisions in the house.

library(forcats)
PISA_oz_sub <- PISA_oz_sub %>%
  filter(!is.na(ST27Q02)) %>%
  mutate(ST27Q02=fct_relevel(ST27Q02, 
      c("None", "One", "Two", "Three or more")))
ggplot(PISA_oz_sub, aes(x=ST27Q02)) + geom_bar()

  1. Most households in the study have how many TVs?
  2. What happens to the plot if you neglect to do the first data transformation line especially the mutate function? Explain why this pre-processing is important.

Exercise 6

Make a side-by-side boxplot of the math scores by number of TVs, after removing the “None” category.

PISA_oz_sub %>% 
  filter(ST27Q02 != "None") %>%
  ggplot(aes(x=ST27Q02, y=PV1MATH)) + geom_boxplot()

a. Why should the “None” category be removed? b. Explain the relationship between TVs in the household and math scores? Is it helpful for math scores to have more TVs in the household?

Exercise 7

Modify the code to show reading scores (PV1READ) against the number of TVs. Explain what you learn about the relationship between these two variables.

Exercise 8

Is the relationship between the number of books in the household and math scores similar to that of TVs? Make the appropriate plot to help you answer the question.

Exercise 9

This is a count of the number of students responding to the question about classroom management and whether students listen.

PISA_oz_sub %>% 
  #filter(!is.na(ST85Q01)) %>%
  mutate(ST85Q01=fct_relevel(ST85Q01, 
      c("Strongly disagree", "Disagree", 
        "Agree", "Strongly agree"))) %>%
  group_by(ST85Q01) %>% 
  tally()
# A tibble: 5 x 2
            ST85Q01     n
             <fctr> <int>
1 Strongly disagree    12
2          Disagree    63
3             Agree   351
4    Strongly agree   211
5                NA   335
  1. Would you report that students say that they mostly listen during class?
  2. What line in the code does the counting?

Exercise 10

Make a mosaic plot of the classroom management responses, students listen, teacher keeps class orderly.

library(ggmosaic)
PISA_oz_sub_tbl <- PISA_oz_sub %>% 
  filter(!is.na(ST85Q01)) %>%
  filter(!is.na(ST85Q02)) %>%
  mutate(ST85Q01=fct_relevel(ST85Q01, 
      c("Strongly disagree", "Disagree", "Agree", "Strongly agree")), 
      ST85Q02=fct_relevel(ST85Q02, 
      c("Strongly disagree", "Disagree", "Agree", "Strongly agree"))) 

PISA_oz_sub_tbl %>%
  group_by(ST85Q01, ST85Q02) %>%
  tally()
# A tibble: 14 x 3
# Groups:   ST85Q01 [?]
             ST85Q01           ST85Q02     n
              <fctr>            <fctr> <int>
 1 Strongly disagree Strongly disagree     8
 2 Strongly disagree          Disagree     3
 3 Strongly disagree             Agree     1
 4          Disagree Strongly disagree     6
 5          Disagree          Disagree    44
 6          Disagree             Agree    12
 7          Disagree    Strongly agree     1
 8             Agree Strongly disagree     4
 9             Agree          Disagree    74
10             Agree             Agree   256
11             Agree    Strongly agree    14
12    Strongly agree          Disagree     4
13    Strongly agree             Agree    68
14    Strongly agree    Strongly agree   139

ggplot(PISA_oz_sub_tbl) +
  geom_mosaic(aes(x=product(ST85Q01), fill=ST85Q02))

Explain the relationship between the two variables. Do students who strongly agree with one, tend to strongly agree with the other statement?

Exercise 11

Your turn to read the data dictionary, and come up with 5 more questions to ask about this data. Make the plots to answer these questions.