Instructions

Exercise

I love a sunburnt country

A land of sweeping plains

Of ragged mountain ranges

Of droughts and flooding rains

(From the poem My Country by Dorothea MacKellar)

In this last week there have been many news stories about an ongoing dry period in large parts of Australia. In the west of NSW, large parts of Queensland, and even the Gippsland region of Victoria it is reported that there has been insufficient rain for many months and this is severely affecting many farmers. This assignment is designed to examine what is happening using publicly available data.

Data collection

This is what I have done to get the data to this point. The Global Historical Climate Network maintained by NOAA curates weather records for stations across the globe.

  • You can get the list of stations and their latitude and longitude from the ghcnd-stations.txt file from the raw files site https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/. Australian stations have “ASN” prefixing the station id. There are 17219 recording stations across Australia. Some of them are on Antartica, remote islands, and possibly on Naval ships because the locations of these few are far from the mainland.
library(tidyverse)
stns <- read_fwf("https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt", 
   fwf_widths(c(11, 9, 10, 7, 3, 30, 4, 4, 6),
   c("stn", "lat", "lon", "elev", "state", "name", "flg1", "flg2", "wmo")))
stns_oz <- stns[grep("ASN", stns$stn),]
keep <- stns_oz %>% 
  filter(lat > -43, lat < -12, lon > 113.6, lon < 153.6)
library(ggmap)
oz <- get_map(location=c(lon=133.8807, lat=-23.6980), zoom=4)
ggmap(oz) + 
  geom_point(data=stns_oz, aes(x=lon, y=lat), 
              alpha=0.5, colour="orange", size=0.5)
  • The data is stored in a single file for each station. The Australian Bureau of Meteorolgy links to this site for Australian data, and you can download station by station from their site, but this is a very inefficient interface. We can get multiple files read by scripting it in R. (You could pay BOM to download your preferred summaries.) I pulled weather data from stations around Victoria as a first pass, and checked the precipitation of the most recent records. It was clear quickly that a lot of stations have not had their data updated recently on this database. So I needed a way to get just the data from stations that had measurements for this year, all the way through to August. The web page with the list of files available for all stations has a “modified date” for each file, so I extracted this information and used it to select only stations in Australia that had data modified this month.
library(rvest)
library(lubridate)
x <- read_html("https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/all/?C=M;O=A")
tb <- x %>% html_table()

asn <- read_csv("data/asn.csv")
asn <- asn %>%
  select(-junk) %>%
  mutate(datetime = dmy_hm(datetime)) %>%
  mutate(date = date(datetime))
asn_aug <- asn %>% filter(date > ymd("20180731"))
asn_aug <- asn_aug %>% mutate(station=substr(station, 1, 11))
stns_aug <- stns_oz %>% filter(stn %in% asn_aug$station)
ggmap(oz) + 
  geom_point(data=stns_aug, aes(x=lon, y=lat), 
              alpha=0.5, colour="orange")
  • The next step is to go file by file, and pull the data for these stations, combine it into a single data file. This is the same as what we did for the Melbourne weather station during a past class.
oz_weather <- NULL
for (i in 1:nrow(asn_aug)) {
  cat(i, "\n")
  filenm <- paste0("https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/all/",
                   asn_aug$station[i],".dly")
   x <- read_fwf(filenm, 
   col_positions=fwf_widths(c(11, 4, 2, 4, 
        rep(c(5, 1, 1, 1), 31)), 
        col_names = c("station", "year", "month",
              "variable", paste0("X", 5:128))))
   oz_weather <- bind_rows(oz_weather, x)
}
oz_weather <- oz_weather %>% filter(year > 1950)
oz_weather <- oz_weather %>% filter(variable == "PRCP")
oz_weather <- oz_weather[,c(1:4, seq(5,128,4))] 
write_csv(oz_weather, path="data/oz_weather.csv")

Your don’t need to repeat what I have done. Start by reading in the data that is already created.

library(tidyverse)
oz_weather <- read_csv("data/oz_weather.csv")

Your tasks

There is some help with the code required for this assignment. Where you see ??? in the code, is where you need to fill in to make it work.

  1. Use your web surfing skills and find an article on the CURRENT drought in Australia. Report the link to the article and write a few sentences summarising its main points.
  2. Read in the raw data and put it into tidy form. The code is provided, with a few spots where you need to fill in the functions.
oz_weather_long <- oz_weather %>% 
  gather(day, prcp, X5:X125) %>% 
  mutate(day = sub("X", "", day)) %>%
  mutate(day = as.numeric(day)%/%4) %>%
  mutate(day = as.integer(day)) %>% 
  mutate(prcp=ifelse(prcp==???, NA, prcp)) %>%
  select(-variable) %>%
  mutate(prcp=prcp/???) %>%
  mutate(month=as.numeric(???))
a. What value is used in the raw data to indicate a missing value?
b. Why do the precipitation values divided by something?
  1. Compute the monthly precipitation for each year. (You need to sum up the precipitation for each month, for each year.) When working with precipitation it is important to summarise using totals. This is different from working with temperature, where we would typically summarise using means. Why should you use totals to summarise precipitation?
oz_prcp_mthyr <- oz_weather_long %>%
  filter(!(year == ??? & month == ???)) %>%
  group_by(???, ???, ???) %>%
  summarise(prcp_tot = sum(???, na.rm=T)) %>%
  ungroup()
  1. Make a line plot of monthly precipitation by month, grouping by station, for 2018. Overlay a smoother. Is there generally a decreasing trend in rainfall this year across the country?
oz_prcp_mthyr %>% 
  filter(year == ???) %>%
  ggplot(aes(x=???, y=???)) + 
  geom_line(aes(group=???), alpha=0.2) +
  geom_???(se=FALSE, alpha=1)
  1. Compute the long term average monthly precipitation for each station. (Use your previous summary and then average the values by month.) This is going to be a baseline for comparing precipitation this year.
oz_prcp_longterm <- oz_prcp_mthyr %>%
  filter(??? < 2018) %>%
  group_by(???, ???) %>%
  summarise(prcp_av = mean(???))
ggplot(oz_prcp_longterm, aes(x=???, y=???)) + 
  geom_line(aes(group=???), alpha=0.2) 
  1. Now we want to look at the general pattern of rainfall this year, whether it is higher of lower than expected. Compute the relative change between monthly precipitation of the first 7 months of this year, 2018, and the long term monthly average precipitation, for each station. Plot this difference on a map of Australia. (Make sure you use an appropriate colour scale.) To calculate relative change, use this formula:

\[Relative~change~ = \frac{T_{2018}-T_{longterm}}{T_{longterm}}\]

(While you have this map, make it interactive with plotly, so that the station id pops up on mouseover. Find a station in the west of NSW or southwest of Qld, or even in Gippsland. Keep the id for this station for the next question.)

stns_oz <- read_csv("data/stns_oz.csv")
oz_prcp_rel_change <- oz_prcp_mthyr %>%
  filter(??? == 2018) %>%
  left_join(oz_prcp_longterm, by=c(???, ???)) %>%
  mutate(relch = (??? - ???)/???)
oz_prcp_rel_change <- oz_prcp_rel_change %>%
  left_join(stns_oz, by=c(???=???))
library(ggmap)
oz <- get_map(location=c(lon=???, lat=???), zoom=???)
ggmap(oz) + 
  geom_???(data=oz_prcp_rel_change, aes(x=???, y=???, colour=???)) +
  facet_wrap(~???, ncol=4) +
  scale_colour_gradient2(???, limits=c(-1,1))
  1. This is another way to look at how this year’s rainfall compares with previous years. For each month, compute the percentile of this year’s rainfall. That is, count the number of years since 1950 that this month had lower precipitation than this year. Look at the values for the station you identified in the previous question. Write a sentence or two describing the rainfall pattern for this year at your chosen station, relative to the same month in other years. Make a line plot of precipitation by month for all years, and overlay this year’s in a highlight colour, like orange. Describe the relative amount of rain for this year based on this plot.
oz_prcp_2018 <- oz_prcp_mthyr %>%
  filter(??? == 2018) %>%
  rename(prcp_tot18 = ???)
oz_prcp_2018_pctl <- oz_prcp_mthyr %>%
  filter(year < ???, month < 8) %>%
  left_join(oz_prcp_2018, by=c(???, ???)) %>%
  group_by(???, ???) %>%
  summarise(nless=length(???[???t < prcp_tot18]), n=length(???)) %>%
  mutate(pct = ???/n) %>%
  ungroup()
  1. Find another historical long period of well below normal rainfall for the first half of the year, for this station (or another). Do a web search to see if there were any news stories about that drought period.
mystn <- oz_prcp_mthyr %>% filter(station==???) %>%
  arrange(year, month)
mystn_dryyrs <- mystn %>%
  group_by(???) %>%
  summarise(s = sum(???[1:7]))
mystn_dryyrs %>% arrange(s)

Grading

Points for the assignment will be based on: