In this week’s lab, the main goal is to learn how to tidy a data set. On the due date, turn in your Rmd file and the HTML output.
Open your project for this class. Make sure all your work is done relative to this project.
Open the lab2.Rmd file provided with the instructions. You can add your answers to the questions in this document.
These are warm-ups for the later questions. We will work through the examples in the lecture notes to make sure that they all run for you and that you know what each function does.
library(tidyverse)
genes <- read_csv("genes.csv")
gtidy <- genes %>%
  gather(variable, expr, -id) %>%
  separate(variable, c("trt", "leftover"), "-") %>%
  separate(leftover, c("time", "rep"), "\\.") %>%
  mutate(trt = sub("W", "", trt)) %>%
  mutate(rep = sub("R", "", rep))
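To see what each step of this pipeline contributes, it can help to run it on a tiny made-up table. The sketch below assumes the column names in genes.csv follow a pattern such as WM-6.R1 (treatment, time, replicate); the toy values and the id labels are invented for illustration and are not taken from the real file.

```r
library(tidyr)
library(dplyr)

# Toy stand-in for genes.csv: column names are ASSUMED to look like
# "<treatment>-<time>.<replicate>"; all values are made up.
toy <- tibble(
  id = c("gene1", "gene2"),
  `WM-6.R1` = c(2.1, 0.5),
  `WM-6.R2` = c(2.3, 0.7),
  `WL-12.R1` = c(1.8, 0.9)
)

toy_tidy <- toy %>%
  gather(variable, expr, -id) %>%                     # wide to long: one row per id/column pair
  separate(variable, c("trt", "leftover"), "-") %>%   # "WM-6.R1" becomes "WM" and "6.R1"
  separate(leftover, c("time", "rep"), "\\.") %>%     # "6.R1" becomes "6" and "R1"
  mutate(trt = sub("W", "", trt)) %>%                 # drop the leading "W"
  mutate(rep = sub("R", "", rep))                     # drop the leading "R"

toy_tidy
```

Running the pipeline one line at a time (ending the chain after each `%>%` step) shows how the table changes shape at every stage.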
What does read.fwf do? What does the argument c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31)) do in the function?
What is the difference between [] and ()? What does [,c(1,2,3,4,seq(5,128,4))] do in the second line?
What does the spread function do here?
melbtemp <- read.fwf("ASN00086282.dly",
  c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31)), fill = TRUE)
melbtemp <- melbtemp[, c(1, 2, 3, 4, seq(5, 128, 4))]
colnames(melbtemp) <- c("id", "year", "month", "var", paste0("V", 1:31))
melbtemp <- melbtemp %>%
  gather(day, value, V1:V31) %>%
  mutate(day = sub("V", "", day)) %>%
  mutate(value = ifelse(value == -9999, NA, value)) %>%
  filter(var %in% c("PRCP", "TMAX", "TMIN")) %>%
  spread(var, value) %>%
  mutate(PRCP = PRCP / 10, TMAX = TMAX / 10, TMIN = TMIN / 10)
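One way to start on the questions about the width and index vectors is to inspect them in isolation, without reading the file at all. The snippet below is only an exploration aid; the names widths and keep are introduced here for illustration and do not appear in the lab code.

```r
# Field widths passed to read.fwf: four header fields, then 31 daily
# groups of four fields each (widths 5, 1, 1, 1).
widths <- c(11, 4, 2, 4, rep(c(5, 1, 1, 1), 31))
length(widths)  # number of columns read.fwf will create
sum(widths)     # number of characters consumed from each line

# Column positions kept in the second line: the four header fields plus
# the first field of every daily group.
keep <- c(1, 2, 3, 4, seq(5, 128, 4))
length(keep)    # compare with the number of names assigned by colnames()
head(keep, 8)
```

Comparing length(keep) with the length of the vector given to colnames() is a quick check that the subsetting and the renaming agree.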
41% Of Fliers Think You’re Rude If You Recline Your Seat. In the following table, V1 is a response to the question “Is it rude to recline your seat on a plane?”, and V2 is the response to the question “Do you ever recline your seat when you fly?”.
fly_tbl <- read_csv("fly_tbl.csv")
library(knitr)
kable(fly_tbl)
| V1 | V2:Always | V2:Usually | V2:About half the time | V2:Once in a while | V2:Never |
|---|---|---|---|---|---|
| No, not rude at all | 124 | 145 | 82 | 116 | 35 |
| Yes, somewhat rude | 9 | 27 | 35 | 129 | 81 |
| Yes, very rude | 3 | 3 | NA | 11 | 54 |
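A cross-tabulation like this stores the levels of V2 in the column names. As a hedged sketch of one possible tidy form, the code below rebuilds two of the columns by hand (values copied from the table above) and gathers them; it assumes the column names in fly_tbl.csv match the printed header, and the name count is chosen here for illustration.

```r
library(tidyr)
library(dplyr)

# Two columns copied from the printed table; names ASSUMED to match the file.
fly_small <- tibble(
  V1 = c("No, not rude at all", "Yes, somewhat rude", "Yes, very rude"),
  `V2:Always` = c(124, 9, 3),
  `V2:Usually` = c(145, 27, 3)
)

fly_long <- fly_small %>%
  gather(V2, count, -V1) %>%        # one row per V1/V2 combination
  mutate(V2 = sub("V2:", "", V2))   # strip the "V2:" prefix from the labels

fly_long
```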
For the data set, rates.csv,
rates <- read_csv("rates.csv")
head(rates)
# A tibble: 6 x 169
date AED AFN ALL AMD ANG AOA ARS
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2017-06-20 3.673014 68.1380 119.4394 481.2024 1.783103 165.9165 16.14300
2 2017-06-21 3.673014 68.0805 118.8077 479.9032 1.776957 165.9165 16.21644
3 2017-06-22 3.673014 68.1400 118.6834 480.8324 1.782248 165.9165 16.14900
4 2017-06-23 3.673014 68.1047 118.0250 478.8100 1.775643 165.9165 16.17800
5 2017-06-24 3.673014 68.1047 118.0250 478.8100 1.775643 165.9165 16.17800
6 2017-06-25 3.673014 67.9365 118.0800 478.7900 1.775558 165.9165 16.10150
# ... with 161 more variables: AUD <dbl>, AWG <dbl>, AZN <dbl>, BAM <dbl>,
# BBD <int>, BDT <dbl>, BGN <dbl>, BHD <dbl>, BIF <dbl>, BMD <int>,
# BND <dbl>, BOB <dbl>, BRL <dbl>, BSD <int>, BTC <dbl>, BTN <dbl>,
# BWP <dbl>, BYN <dbl>, BZD <dbl>, CAD <dbl>, CDF <dbl>, CHF <dbl>,
# CLF <dbl>, CLP <dbl>, CNH <dbl>, CNY <dbl>, COP <dbl>, CRC <dbl>,
# CUC <int>, CUP <dbl>, CVE <dbl>, CZK <dbl>, DJF <dbl>, DKK <dbl>,
# DOP <dbl>, DZD <dbl>, EGP <dbl>, ERN <dbl>, ETB <dbl>, EUR <dbl>,
# FJD <dbl>, FKP <dbl>, GBP <dbl>, GEL <dbl>, GGP <dbl>, GHS <dbl>,
# GIP <dbl>, GMD <dbl>, GNF <dbl>, GTQ <dbl>, GYD <dbl>, HKD <dbl>,
# HNL <dbl>, HRK <dbl>, HTG <dbl>, HUF <dbl>, IDR <dbl>, ILS <dbl>,
# IMP <dbl>, INR <dbl>, IQD <dbl>, IRR <dbl>, ISK <dbl>, JEP <dbl>,
# JMD <dbl>, JOD <dbl>, JPY <dbl>, KES <dbl>, KGS <dbl>, KHR <dbl>,
# KMF <dbl>, KPW <dbl>, KRW <dbl>, KWD <dbl>, KYD <dbl>, KZT <dbl>,
# LAK <dbl>, LBP <dbl>, LKR <dbl>, LRD <dbl>, LSL <dbl>, LYD <dbl>,
# MAD <dbl>, MDL <dbl>, MGA <dbl>, MKD <dbl>, MMK <dbl>, MNT <dbl>,
# MOP <dbl>, MRO <dbl>, MUR <dbl>, MVR <dbl>, MWK <dbl>, MXN <dbl>,
# MYR <dbl>, MZN <dbl>, NAD <dbl>, NGN <dbl>, NIO <dbl>, NOK <dbl>, ...
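With one column per currency, the table is 169 columns wide. A common first step with data in this shape is to gather the currency columns into key-value pairs. The sketch below does this on a two-currency excerpt typed in from the output above; the column names currency and rate are hypothetical choices, not names from the lab.

```r
library(tidyr)
library(dplyr)

# Small excerpt of the printed rates table (first two rows, two currencies).
rates_small <- tibble(
  date = as.Date(c("2017-06-20", "2017-06-21")),
  AED = c(3.673014, 3.673014),
  AFN = c(68.1380, 68.0805)
)

# Gather every column except date into long form.
rates_long <- rates_small %>%
  gather(currency, rate, -date)

rates_long
```

The same gather() call applied to the full rates table would need no changes, because -date selects every currency column regardless of how many there are.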
Read in the Billboard top 100 music data, which contains N’Sync and Backstreet Boys songs that entered the Billboard charts in the year 2000.
billboard <- read_csv("billboard.csv")
1-76? What are the variables?
This data was pulled from https://www.whaleshark.org in 2013. It lists verified encounters with whale sharks across the globe.
whalesharks <- read_csv("whaleshark-encounters.csv")
Marked Individual. Which individual has the most sightings? How many unmarked individuals are recorded in the database?
library(maps)
library(ggthemes)
world_map <- map_data("world")
ggplot(world_map) +
  geom_polygon(aes(x = long, y = lat, group = group),
               fill = "grey90", colour = "white") +
  theme_map() +
  geom_point(data = whalesharks, aes(x = Longitude, y = Latitude),
             colour = "salmon", alpha = 0.5)
ggplot(world_map) +
  geom_polygon(aes(x = long, y = lat, group = group),
               fill = "grey90", colour = "white") +
  theme_map() +
  geom_point(data = filter(whalesharks, !is.na(Sex)),
             aes(x = Longitude, y = Latitude, colour = Sex),
             alpha = 0.5)
The file budapest.csv has a subset of web click-through data related to hotel searches for Budapest. Each line in this data corresponds to a summary of a person looking for a hotel on the Expedia website. For these questions, the answers don’t require you to code, but to map out what operations you need to perform on the data.
budapest <- read_csv("budapest.csv")
3406).