Exercise

The Atlas of Living Australia data for the area around Monash’s Clayton campus, Melbourne, has been downloaded for you. We are going to take a look at what wildlife has been sighted around campus.

library(tidyverse)  # provides read_csv() and the %>% pipe
monash <- read_csv("data/monash_species.csv")

  1. (1pt) What were the earliest and latest dates on which wildlife was sighted in the data provided?

  2. (1pt) How many different species have been sighted? 316

  3. (1pt) What species is the most commonly sighted around Monash?

  4. It’s a bit surprising to see sightings dated in the 1800s. Let’s look at these in more detail.

    1. (1pt) Count the number of sightings by day, and make a plot of count by date. What year do you think Monash University was established (based on this data)? Why? Check your guess using Google.
    2. (1pt) Subset the measurements recorded prior to 1950. Who was the collector?
    3. (1pt) Of these historically observed species, are any not seen around campus any more?
  5. (1pt) We are going to create a subset, especially for you to analyse, consisting of a random sample of 4 of the species commonly seen in recent years. Using the code provided, do the following:

Subset the data to species seen from 1950 onwards. Count the number of sightings of each species. Randomly sample 4 from the ones that have been sighted at least 100 times. List the scientific names of your four species.

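library(lubridate)  # provides year()

# Count sightings per species from 1950 onwards, keep those seen
# at least 100 times, then draw four of them at random
myspecies <- monash %>%
  filter(year(`Event Date - parsed`) >= 1950) %>%
  count(`Scientific Name`, sort=TRUE) %>%
  filter(n >= 100) %>%
  sample_n(4)

# Keep only the sightings from 1950 onwards of the four sampled species
mysample <- monash %>%
  filter(year(`Event Date - parsed`) >= 1950) %>%
  filter(`Scientific Name` %in% myspecies$`Scientific Name`)
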
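
Because `sample_n()` draws at random, rerunning the code above will generally give a different set of four species each time. A minimal sketch of one way to keep the draw stable is shown here, assuming you are free to fix a seed. The toy `counts` table and the seed value 2024 are made up for illustration and are not part of the provided data.

```r
library(dplyr)

# Hypothetical species counts standing in for the real count() output
counts <- data.frame(
  `Scientific Name` = paste("Species", LETTERS[1:8]),
  n = c(450, 320, 150, 120, 100, 99, 40, 3),
  check.names = FALSE
)

# Fix the random seed so the same four species are drawn on every run
set.seed(2024)  # arbitrary value, chosen only for this sketch
draw1 <- counts %>% filter(n >= 100) %>% sample_n(4)

# Resetting the same seed reproduces the same sample
set.seed(2024)
draw2 <- counts %>% filter(n >= 100) %>% sample_n(4)

identical(draw1, draw2)
```

Placing a `set.seed()` call immediately before the sampling step in your own document should make the four species, and everything that depends on them, the same each time the file is compiled.
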
  1. (2 pts) Make a map of campus, and plot the locations of species sightings, coloured and faceted by species. Write a few sentences describing the distribution of the species - use the vernacular name for any species that has one.

  2. (2 pts) Aggregate the sightings for each species, by month. Make a plot of number of sightings by month. Write a sentence or two discussing the relative frequency of sightings by month of the year.

  3. (2 pts) Aggregate by hour of the day. Make a line plot of frequency of sighting by hour. What are the most common times of day to see these species?

  4. (3 pts) Find the species description on Wikipedia. Read in the text descriptions for each of your species, using web scraping (example code is below). Conduct a text analysis to determine which words most distinguish the four species from one another.

  5. (2 pts) Now expand your subset again, to include the 25 most common species.
    1. Compute the frequency of sighting by hour of the day.
    2. Standardize the hourly counts for each species by dividing by that species’ maximum hourly count. (This will put the counts for each species in the range 0 to 1, that is, it converts each count to a proportion of the count in that species’ busiest hour.)
    3. Spread the data to have hour in the columns, and species in the rows, and the proportion in the cells.
    4. Compute the Euclidean distance between species, that is the distance between proportions in each hour.
    5. Convert the distances to a binary matrix (for example, 1 where the distance falls below a cutoff you choose, and 0 otherwise), and use this to produce a network map of the species. This indicates which species are more commonly seen at similar times of the day.
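
The five steps above can be sketched on a toy data set. This is only an illustration under stated assumptions: the `sightings` table, its column names `species` and `hour`, and the cutoff of 1 are all made up, and the sketch uses base R rather than the tidyverse verbs (such as spreading) that the exercise asks you to use on the real data.

```r
# Toy sightings: one row per sighting, with hypothetical column names
sightings <- data.frame(
  species = c(rep("A", 6), rep("B", 6), rep("C", 4)),
  hour    = c(6, 6, 7, 7, 7, 18,
              6, 7, 7, 7, 8, 18,
              20, 21, 21, 22)
)

# Steps 1 and 3: count sightings per species and hour,
# with species in the rows and hours in the columns
counts <- unclass(table(sightings$species, sightings$hour))

# Step 2: divide each row by its maximum, so every species peaks at 1
scaled <- counts / apply(counts, 1, max)

# Step 4: Euclidean distance between the species' hourly profiles
d <- as.matrix(dist(scaled, method = "euclidean"))

# Step 5: binary adjacency matrix; the cutoff is an arbitrary choice
cutoff <- 1
adjacency <- (d < cutoff) * 1
diag(adjacency) <- 0  # a species is not linked to itself

adjacency
```

In this toy example species A and B are both seen mostly in the early morning, so they end up linked, while the evening species C is linked to neither. A matrix like `adjacency` can then be handed to a graph package to draw the network, for instance via `graph_from_adjacency_matrix()` in igraph. The choice of cutoff changes how many links appear, so it is worth trying a few values.
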

Grading

Two points are reserved for work that is easy to compile, spell-checked, and neatly presented.