Federal elections are held in Australia approximately every three years, with the exact timing at the pleasure of the party in government. It is rumoured that the next election will be in May 2019. The ads have already started on TV (“Australians aren’t going to cop it” is the catchphrase of one minor party), robocalls have begun, ads are appearing on Facebook, and voting preference polling is in full swing. This assignment asks you to work with several publicly available datasets on the election. The goal is to practise your tidying and wrangling skills, with the added side effect of learning about the Australian political landscape.



About the data


  1. Download the AESID data. This is a multi-sheet Excel file: the first sheet gives an overview of how the data was collected, and the second describes the contents of the other sheets. Take a look at sheet 1.5.
     1. Make a sketch of what’s in this sheet, in particular mapping out the variables and the different types of aggregation. How many variables are there in this sheet? List them. How often was the survey conducted? Did the researchers adjust percentages to account for differences between sample and population demographics? Write a few sentences about the data in this sheet, in particular how it might help us answer the question “What is the trend in the ways people of different ages discuss politics?”

     2. Turn the data into long form, with year in a single column.

     3. Remove the rows corresponding to overall aggregates.

     4. Rename the first column to “communication”.

     5. Separate the second column into two columns, one called “variable” and the other called “level”. (Hint: you will need to use sep=":" as an argument.)

     6. Convert year into a numeric variable.

     7. Subset the data to keep only the rows corresponding to the variable “age”.

     8. Make a line plot showing the percentage by year, with separate lines coloured by age group and faceted by communication. Write a paragraph on what you learn from this plot.

     9. There appears to be a big drop after 1998 in the percentage of people reporting that they talked about politics or tried to persuade others how to vote. Using your internet searching skills, research plausible reasons for this drop, and describe what you learn.

  2. Download the FPP data and read it into R.
     1. Take a look at the structure of the data file. Is this data in tidy format? Explain your answer.

     2. Count the number of candidates in each electorate. What is the largest number of candidates in any electorate? What is the smallest? What is the average number of candidates per electorate?

     3. Filter the data to the candidates who won their electorate. Summarise the number of electorates won by each party. Write a sentence describing what you learn.

     4. Compute a new variable giving each candidate’s percentage of the first preference vote in their electorate. Filter the data to the winning candidates. Make a plot of the distribution of the first preference percentage with which each winner won. (Hint: a histogram, density plot or boxplot would be appropriate here.) Write a sentence describing the distribution.

     5. Make a list of the electorates where the winner did not receive the most first preference votes. Include the Division name, the winning candidate’s name, their first preference percentage and the highest first preference percentage in the electorate. Write a sentence explaining how a candidate can win an electorate without receiving the largest number of first preference votes.

     6. How many different parties contested the election? Which party received the fewest votes across the country? How many electorates were contested by the party with the fewest votes?

     7. Write an explanation of preferential voting as it is used in Australia, and of how first preferences factor into the final result.
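For steps 2 to 7 of the AESID task, R users would typically reach for tidyr’s gather() (or pivot_longer() in newer tidyr releases), dplyr’s filter() and rename(), tidyr’s separate() and base R’s as.numeric(). The minimal sketch below spells out what those steps do in plain Python. It assumes a hypothetical layout for sheet 1.5 with made-up percentages: an unnamed first column, a second column called "group" holding labels such as "Age: 18-24", a "Total" aggregate row, and one column per survey year. The real sheet’s labels and column names will need checking.

```python
# Hypothetical mock-up of sheet 1.5 with MADE-UP percentages: the row labels,
# the "group" column name and the "Label: level" format are all assumptions.
wide_rows = [
    # "" is the first column (assumed unnamed), then "group", then one column per year
    {"": "Talked about politics", "group": "Age: 18-24", "1993": 30.0, "1998": 25.0},
    {"": "Talked about politics", "group": "Age: 25-34", "1993": 35.0, "1998": 28.0},
    {"": "Talked about politics", "group": "Sex: Female", "1993": 32.0, "1998": 27.0},
    {"": "Talked about politics", "group": "Total", "1993": 33.0, "1998": 27.5},
]
YEAR_COLUMNS = ["1993", "1998"]


def tidy_aesid(rows, year_columns):
    """Reshape the wide mock-up into one row per (communication, group, year)."""
    long_rows = []
    for row in rows:
        # Step 3: drop overall aggregates (assumed to be labelled "Total").
        if row["group"] == "Total":
            continue
        # Step 5: split "Age: 18-24" into variable "Age" and level "18-24".
        variable, level = (part.strip() for part in row["group"].split(":", 1))
        for year in year_columns:
            # Step 2: one output row per year gives the long form.
            long_rows.append({
                "communication": row[""],  # Step 4: first column renamed
                "variable": variable,
                "level": level,
                "year": int(year),  # Step 6: year as a number
                "percent": row[year],
            })
    return long_rows


tidy = tidy_aesid(wide_rows, YEAR_COLUMNS)
# Step 7: keep only the rows for the age variable.
age_only = [r for r in tidy if r["variable"] == "Age"]
```

Dropping the "Total" rows before splitting matters here because those labels contain no ":" to split on; in R, separate() would instead fill the missing piece with NA and warn.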
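Steps 2, 3 and 6 of the FPP task are all group-and-count operations. In R these would use dplyr’s count(), filter(), group_by() and summarise(). The following sketch shows the same logic in plain Python. It assumes hypothetical field names (division, party, elected, votes) and a handful of made-up candidate rows rather than the real AEC file, so adapt the names to whatever the downloaded data actually uses.

```python
from collections import Counter, defaultdict
from statistics import mean

# HYPOTHETICAL toy rows (one per candidate) with made-up vote counts; the field
# names stand in for the real FPP columns, which will be named differently.
fpp = [
    {"division": "Alpha", "party": "P1", "elected": "Y", "votes": 500},
    {"division": "Alpha", "party": "P2", "elected": "N", "votes": 300},
    {"division": "Alpha", "party": "P3", "elected": "N", "votes": 200},
    {"division": "Beta", "party": "P1", "elected": "N", "votes": 450},
    {"division": "Beta", "party": "P2", "elected": "Y", "votes": 400},
    {"division": "Gamma", "party": "P2", "elected": "Y", "votes": 700},
    {"division": "Gamma", "party": "P4", "elected": "N", "votes": 100},
    {"division": "Gamma", "party": "P3", "elected": "N", "votes": 150},
    {"division": "Gamma", "party": "P1", "elected": "N", "votes": 50},
]

# Step 2: candidates per electorate, then the largest, smallest and mean count.
candidates_per_division = Counter(row["division"] for row in fpp)
most = max(candidates_per_division.values())
least = min(candidates_per_division.values())
average = mean(candidates_per_division.values())

# Step 3: keep only the winners, then count electorates won by each party.
seats_by_party = Counter(row["party"] for row in fpp if row["elected"] == "Y")

# Step 6: total first preference votes per party, and the electorates each contested.
votes_by_party = defaultdict(int)
divisions_by_party = defaultdict(set)
for row in fpp:
    votes_by_party[row["party"]] += row["votes"]
    divisions_by_party[row["party"]].add(row["division"])

n_parties = len(votes_by_party)
weakest_party = min(votes_by_party, key=votes_by_party.get)
weakest_contested = len(divisions_by_party[weakest_party])
```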
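For steps 4 and 5, the key point is that each percentage is taken relative to the total first preference vote in the candidate’s own electorate, not the national total. Step 5 is read here as asking for winners whose first preference share was not the highest in their electorate. Under that reading, the minimal Python sketch below uses hypothetical field names and made-up rows. The list winner_percents is what you would pass to a histogram or density plot.

```python
from collections import defaultdict

# HYPOTHETICAL toy rows with made-up names and votes; field names are assumptions.
fpp = [
    {"division": "Alpha", "candidate": "A. Smith", "elected": "Y", "votes": 500},
    {"division": "Alpha", "candidate": "B. Jones", "elected": "N", "votes": 300},
    {"division": "Alpha", "candidate": "C. Brown", "elected": "N", "votes": 200},
    {"division": "Beta", "candidate": "D. Green", "elected": "N", "votes": 450},
    {"division": "Beta", "candidate": "E. White", "elected": "Y", "votes": 400},
    {"division": "Beta", "candidate": "F. Black", "elected": "N", "votes": 150},
]

# Step 4: each candidate's share of the first preference vote in their division.
division_totals = defaultdict(int)
for row in fpp:
    division_totals[row["division"]] += row["votes"]
for row in fpp:
    row["percent"] = 100 * row["votes"] / division_totals[row["division"]]

# Highest first preference percentage recorded in each division.
top_percent = defaultdict(float)
for row in fpp:
    top_percent[row["division"]] = max(top_percent[row["division"]], row["percent"])

winners = [row for row in fpp if row["elected"] == "Y"]
winner_percents = [row["percent"] for row in winners]  # input for the plot

# Step 5: winners whose first preference share was not the highest in their division.
not_top = [
    (row["division"], row["candidate"], row["percent"], top_percent[row["division"]])
    for row in winners
    if row["percent"] < top_percent[row["division"]]
]
```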
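To support the explanation in step 7, here is a simplified sketch of a full preferential count in a single-member electorate, as used for the House of Representatives. If no candidate has more than half of the formal votes, the candidate with the fewest votes is excluded. Each of their ballots then passes to the next preference still in the count, and this repeats until one candidate has an absolute majority. The function name preferential_count and the toy ballots are hypothetical. The sketch also ignores informal ballots and the AEC’s actual tie-breaking rules.

```python
from collections import Counter


def preferential_count(ballots):
    """Return the winner and the tally from each round of a simplified count.

    Each ballot is a list of candidate names in preference order and is assumed
    to rank every candidate, as full preferential voting requires.
    """
    remaining = {c for ballot in ballots for c in ballot}
    rounds = []
    while True:
        # Start every continuing candidate at zero so that a candidate with
        # no votes can still be excluded.
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            # Each ballot counts for its most preferred continuing candidate.
            tally[next(c for c in ballot if c in remaining)] += 1
        rounds.append(dict(tally))
        leader = max(tally, key=tally.get)
        # An absolute majority means more than half of the ballots.
        if 2 * tally[leader] > len(ballots):
            return leader, rounds
        # Exclude the candidate with the fewest votes; ties are broken
        # alphabetically (a simplification, not the AEC's tie-breaking rule).
        loser = min(remaining, key=lambda c: (tally[c], c))
        remaining.remove(loser)


# Toy electorate of 100 made-up ballots.
ballots = [["C", "B", "A"]] * 20 + [["B", "C", "A"]] * 35 + [["A", "B", "C"]] * 45
winner, rounds = preferential_count(ballots)
```

In this toy example candidate A leads on first preferences (45 to 35), but B wins 55 to 45 once C is excluded and C’s ballots flow to B. This is exactly how a winner can end up with fewer first preference votes than another candidate.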