In this week’s lab, the main goal is to get started working with R, set up your project directory for the labs, create your first R notebook, and submit your first report.
Create a project for this unit, in the directory. (Start each lab by opening this project. It is generally a good idea NOT TO SAVE THE WORKSPACE when you close a project for the day.)
Open the lab1.Rmd file provided with the instructions.
Look at the text in the lab1.Rmd document. YOU DON’T NEED TO TURN IN ANY ANSWERS TO THESE QUESTIONS.
How does knitr know that this is code to be run?
Knit the document into a Word document.
Read in the subset of Australian scores from the 2012 PISA test results, with the code given below. Make a histogram of the variable PV1MATH. These are math test scores for each student.
library(tidyverse)
PISA_oz_sub <- read_csv("PISA_oz_sub.csv")
ggplot(PISA_oz_sub, aes(x=PV1MATH)) +
  geom_histogram(binwidth=25)
Describe the distribution of math scores. Discuss min, max, centre and shape.
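A numerical summary can back up what you read off the histogram. The sketch below uses a small made-up vector called scores (hypothetical values, not taken from the PISA data) and base R functions only, so it is an illustration of the idea rather than the expected answer.

```r
# Hypothetical scores for illustration only; these are NOT taken from PISA_oz_sub.
scores <- c(310, 420, 455, 480, 500, 505, 530, 560, 610, 700)

# Collect the quantities the question asks about: min, max and two measures of centre.
score_summary <- c(
  min    = min(scores),
  median = median(scores),
  mean   = mean(scores),
  max    = max(scores)
)
print(score_summary)

# A mean above the median hints at a longer right tail; below, a longer left tail.
skew_hint <- mean(scores) - median(scores)
print(skew_hint)
```

With the real data, scores could be replaced by PISA_oz_sub$PV1MATH (adding na.rm = TRUE to each call if the column contains missing values). Shape is still best judged from the histogram itself.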
Make a bar chart of the number of televisions in the house.
library(forcats)
PISA_oz_sub <- PISA_oz_sub %>%
  filter(!is.na(ST27Q02)) %>%
  mutate(ST27Q02=fct_relevel(ST27Q02,
    c("None", "One", "Two", "Three or more")))
ggplot(PISA_oz_sub, aes(x=ST27Q02)) + geom_bar()
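The effect of setting the level order can be seen on a toy vector. This is a minimal sketch using base R's factor() as a stand-in for fct_relevel; the tv vector is hypothetical and not drawn from the PISA data.

```r
# Hypothetical responses for illustration only.
tv <- c("Two", "One", "Three or more", "None", "Two", "One")

# Without explicit levels, factor() sorts the categories alphabetically.
default_order <- levels(factor(tv))

# Supplying levels fixes the order that tables and plots will use.
tv_ordered <- factor(tv, levels = c("None", "One", "Two", "Three or more"))
natural_order <- levels(tv_ordered)

print(default_order)
print(natural_order)
print(table(tv_ordered))
```

In the alphabetical ordering, "Three or more" lands before "Two", which would put the bars of the chart out of their natural sequence.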
What does the mutate function do here? Explain why this pre-processing is important.
Make a side-by-side boxplot of the math scores by number of TVs, after removing the “None” category.
PISA_oz_sub %>%
  filter(ST27Q02 != "None") %>%
  ggplot(aes(x=ST27Q02, y=PV1MATH)) + geom_boxplot()
a. Why should the “None” category be removed?
b. Explain the relationship between TVs in the household and math scores. Is it helpful for math scores to have more TVs in the household?
Modify the code to show reading scores (PV1READ) against the number of TVs. Explain what you learn about the relationship between these two variables.
Is the relationship between the number of books in the household and math scores similar to that of TVs? Make the appropriate plot to help you answer the question.
The code below counts the students giving each response to the classroom management question on whether students listen.
PISA_oz_sub %>%
  # filter(!is.na(ST85Q01)) %>%
  mutate(ST85Q01=fct_relevel(ST85Q01,
    c("Strongly disagree", "Disagree",
      "Agree", "Strongly agree"))) %>%
  group_by(ST85Q01) %>%
  tally()
# A tibble: 5 x 2
  ST85Q01               n
  <fctr>            <int>
1 Strongly disagree    12
2 Disagree             63
3 Agree               351
4 Strongly agree      211
5 NA                  335
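The NA row in the output above appears because the filter line is commented out. As a rough illustration of the difference, the base R sketch below counts a hypothetical vector resp (made-up values, not PISA data) with and without its missing entries.

```r
# Hypothetical responses for illustration only; NA marks a skipped question.
resp <- c("Agree", NA, "Disagree", "Agree", NA, "Strongly agree")

# useNA = "ifany" keeps missing values as their own category,
# similar to the NA row produced by tally().
with_na <- table(resp, useNA = "ifany")

# Dropping NA first mirrors what filter(!is.na(...)) would do.
without_na <- table(resp[!is.na(resp)])

print(with_na)
print(without_na)
```

Whether to keep the missing category depends on the question being asked: it matters when non-response is itself of interest, and it distorts proportions when it is not.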
Make a mosaic plot of the two classroom management responses: students listen (ST85Q01) and teacher keeps class orderly (ST85Q02).
library(ggmosaic)
PISA_oz_sub_tbl <- PISA_oz_sub %>%
  filter(!is.na(ST85Q01)) %>%
  filter(!is.na(ST85Q02)) %>%
  mutate(ST85Q01=fct_relevel(ST85Q01,
      c("Strongly disagree", "Disagree", "Agree", "Strongly agree")),
    ST85Q02=fct_relevel(ST85Q02,
      c("Strongly disagree", "Disagree", "Agree", "Strongly agree")))
PISA_oz_sub_tbl %>%
  group_by(ST85Q01, ST85Q02) %>%
  tally()
# A tibble: 14 x 3
# Groups:   ST85Q01 [?]
   ST85Q01           ST85Q02               n
   <fctr>            <fctr>            <int>
 1 Strongly disagree Strongly disagree     8
 2 Strongly disagree Disagree              3
 3 Strongly disagree Agree                 1
 4 Disagree          Strongly disagree     6
 5 Disagree          Disagree             44
 6 Disagree          Agree                12
 7 Disagree          Strongly agree        1
 8 Agree             Strongly disagree     4
 9 Agree             Disagree             74
10 Agree             Agree               256
11 Agree             Strongly agree       14
12 Strongly agree    Disagree              4
13 Strongly agree    Agree                68
14 Strongly agree    Strongly agree      139
ggplot(PISA_oz_sub_tbl) +
  geom_mosaic(aes(x=product(ST85Q01), fill=ST85Q02))
Explain the relationship between the two variables. Do students who strongly agree with one statement tend to strongly agree with the other?
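One way to check what the mosaic plot suggests is to turn the tally into row proportions. The sketch below re-enters the 14 counts from the tally output above by hand into a matrix (the two combinations absent from the tally are assumed to be 0) and uses base R's prop.table; the object names are illustrative.

```r
lv <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")

# Counts copied from the tally above; rows are ST85Q01, columns are ST85Q02.
# Combinations missing from the tally are entered as 0.
counts <- matrix(
  c(8,  3,   1,   0,
    6, 44,  12,   1,
    4, 74, 256,  14,
    0,  4,  68, 139),
  nrow = 4, byrow = TRUE,
  dimnames = list(ST85Q01 = lv, ST85Q02 = lv)
)

# Row proportions: the distribution of ST85Q02 within each level of ST85Q01.
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 2))
```

Each row of row_props sums to 1, so the diagonal entries show how often a student gives the same response to both statements.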
Now it is your turn: read the data dictionary and come up with five more questions to ask about this data. Make the plots to answer these questions.