About the data


  1. Download the AESID data. This is a multi-sheet Excel file. The first sheet gives an overview of how the data was collected, and the second sheet describes the data in the other sheets. Take a look at sheet 1.5.
  1. Make a sketch of what’s in this sheet. In particular, map out the variables and the different types of aggregation. How many variables are there in this sheet? List them. How often was the survey conducted? Did the researchers adjust percentages to account for differences between sample and population demographics? Write a few sentences about the data in this sheet, in particular how this data might help us to answer the question “What is the trend in the ways people of different ages discuss politics?”

There are 6 variables:

year, communication (Discuss politics, Persuade others how to vote), age (18-24, 25-34, 35-44, 45-54, 55-64, 65plus), education (No qualification, Non-tertiary qualification, Tertiary qualification), gender (Female, Male), vote (Greens, Labor Party, Liberal Party, National Party, Other).

Yes, for some of the surveys. The documentation states: “The 1993, 2010, 2013 and 2016 surveys have been weighted to reflect the characteristics of the national electorate.”

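This sheet contains the percentages of survey respondents who report that they discuss politics and who report that they try to persuade others how to vote, for the years 1993-2016, broken down by age, gender, education and voting preference. Because the age breakdown is reported across the survey years, we can follow how each percentage changes over time within each age group, which is exactly what the question about trends by age asks. These are aggregated percentages rather than individual responses, so they describe age groups, not individual people.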

  1. Turn the data into long form, where year is in a column.
aesid_l <- aesid %>% gather(year, pct, `1993`:`2016`) 
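In current versions of tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping step with pivot_longer() follows; the toy aesid tibble is hypothetical, assuming only the layout implied by the code above (columns v1, v2 and one column per survey year), and its values are invented.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy version of sheet 1.5 (values invented for illustration):
# v1 = communication type, v2 = "Variable: level" label (NA for overall rows),
# and one column per survey year
aesid <- tibble(
  v1 = c("Discuss politics", "Discuss politics"),
  v2 = c(NA, "Age: 18-24"),
  `1993` = c(50, 40),
  `2016` = c(45, 30)
)

# pivot_longer() replaces gather(): names_to and values_to play the roles of
# gather()'s key and value arguments
aesid_l <- aesid %>%
  pivot_longer(`1993`:`2016`, names_to = "year", values_to = "pct")
aesid_l
```

As with gather(), year comes out as a character column, which is why it is converted with as.numeric() in a later step.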
  1. Remove the rows corresponding to overall aggregates.
aesid_l <- aesid_l %>% filter(!is.na(v2))
  1. Rename the first column to “communication”.
aesid_l <- aesid_l %>% 
  rename(communication = v1)
  1. Separate the second column into two, one called “variable”, and the second called “level”. (Hint: You will need to use sep=":" as an argument.)
aesid_l <- aesid_l %>% 
  separate(v2, c("variable", "level"), sep=":")
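If the labels in v2 have the form “Age: 18-24” (an assumption about the sheet), splitting on ":" leaves a leading space in level, so a later filter such as level == "18-24" would match nothing. A small sketch with hypothetical labels, trimming the space with stringr::str_trim():

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical labels with a space after the colon (an assumption about the sheet)
aesid_l <- tibble(v2 = c("Age: 18-24", "Gender: Female"))

aesid_l <- aesid_l %>%
  separate(v2, c("variable", "level"), sep = ":") %>%
  # separate() splits exactly at ":", so the space stays at the start of level;
  # str_trim() removes leading and trailing whitespace
  mutate(level = str_trim(level))
aesid_l
```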
  1. Convert year into a numeric variable.
aesid_l <- aesid_l %>% 
  mutate(year = as.numeric(year)) 
  1. Subset to have only rows corresponding to the variable “age”.
aesid_l <- aesid_l %>% 
  filter(variable == "Age")
  1. Make a line plot showing the percentage by year, with separate lines coloured by age group, and faceted by communication. Write a paragraph on what you learn from this plot.
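The plotting code is not shown here; one way to draw this plot with ggplot2 is sketched below. It assumes the long-form data has columns communication, level (the age group), year and pct, as produced by the steps above, and it uses a small invented data set so that the example runs on its own.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-form data with the columns built in the steps above
# (values invented for illustration)
aesid_l <- tibble(
  communication = rep(c("Discuss politics", "Persuade others how to vote"), each = 4),
  level = rep(rep(c("18-24", "65plus"), each = 2), times = 2),
  year  = rep(c(1993, 2016), times = 4),
  pct   = c(60, 50, 70, 65, 30, 20, 15, 10)
)

# One line per age group (colour), one panel per type of communication
p <- ggplot(aesid_l, aes(x = year, y = pct, colour = level)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ communication) +
  labs(x = "Year", y = "Percentage (%)", colour = "Age group")
p
```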

More people discuss politics than try to persuade others how to vote. Between 1998 and 2001 there was a big drop in the percentage of people doing either, especially in the youngest age group (18-24). The youngest group (18-24) also has the highest percentage of people who report trying to persuade others how to vote.

  1. It seems that there was a big drop in people reporting talking about politics, or trying to persuade others how to vote, after 1998. Using your internet searching skills, do some research to find out plausible reasons for the big drop in these percentages, and describe what you have learned.

The election was held 6 months earlier than required by law. There was a big swing against the incumbent government, which was re-elected even though it did not win the popular vote; this was the biggest discrepancy between the popular vote and the electoral tally in the history of Australian politics. The election was called immediately after the government announced its plan to introduce a Goods and Services Tax (GST), a policy that was unpopular with many voters.

Source: https://en.wikipedia.org/wiki/1998_Australian_federal_election#Background

  1. Download the FPP data and read it into R.
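A minimal sketch of reading the file with readr::read_csv(), with str() giving a first look at its structure. The file contents, the column names and the title line above the header are all assumptions for illustration, so the example first writes a tiny hypothetical file to a temporary path; with the real download, point path at your file and drop skip = 1 if the first line already holds the column names.

```r
library(readr)

# Hypothetical stand-in for the downloaded file: an assumed title line, then the
# column names, then one row per candidate (all values invented)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Hypothetical title line",
  "DivisionNm,Surname,PartyNm,Elected,TotalVotes",
  "Division A,Smith,Party X,Y,100",
  "Division A,Jones,Party Y,N,80"
), path)

# skip = 1 ignores the title line so the second line becomes the header
fpp <- read_csv(path, skip = 1)
str(fpp)
```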
  1. Take a look at the structure of the data file. Is this data in tidy format? Explain your answer.

Yes, the data is in tidy format. Each row is an observation on one candidate in the federal election, and each column is a variable measured on that candidate.

  1. Count the number of candidates in each electorate. What is the largest number of candidates in any electorate? What is the smallest? What is the average number of candidates per electorate?

The largest number of candidates in any electorate is 12.

The smallest number of candidates in any electorate is 4.

The average number of candidates per electorate is 7.6.
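These answers can be computed by counting rows per electorate and then summarising the counts. This sketch assumes the electorate name is in a column called DivisionNm (an assumption, since the column names are not listed here) and uses invented toy data:

```r
library(dplyr)

# Hypothetical candidate-level data, one row per candidate (invented values);
# DivisionNm is an assumed name for the electorate column
fpp <- tibble(
  DivisionNm = c("A", "A", "A", "B", "B", "B", "B", "B"),
  PartyNm    = c("P1", "P2", "P3", "P1", "P2", "P3", "P4", "P5")
)

# count() gives one row per electorate, with n = number of candidates
n_cand <- fpp %>% count(DivisionNm)

# Largest, smallest and average number of candidates per electorate
cand_summary <- n_cand %>%
  summarise(most = max(n), least = min(n), average = mean(n))
cand_summary
```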

  1. Filter the data on the candidates who won the electorate. Summarise the number of electorates won by each party. Write a sentence describing what you learn.
| PartyNm | n |
|---|---|
| Liberal | 45 |
| Australian Labor Party | 43 |
| Labor | 24 |
| Liberal National Party of Queensland | 21 |
| The Nationals | 10 |
| Australian Labor Party (Northern Territory) Branch | 2 |
| Independent | 2 |
| Katter’s Australian Party | 1 |
| Nick Xenophon Team | 1 |
| The Greens | 1 |

The Liberal and Labor parties won a fairly even share of the electorates. Several affiliated parties, listed under variations of the major party names, each won a small number of electorates, and a handful of electorates were won by unaffiliated minor parties and independents.
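A sketch of the filtering and counting step, assuming winners are flagged by a column Elected coded "Y"/"N" (an assumption; only PartyNm appears in the output above). The toy data is invented:

```r
library(dplyr)

# Hypothetical candidate-level data (invented); Elected is an assumed
# "Y"/"N" flag marking the winner of each electorate
fpp <- tibble(
  DivisionNm = c("A", "A", "B", "B", "C", "C"),
  PartyNm    = c("P1", "P2", "P1", "P2", "P1", "P2"),
  Elected    = c("Y", "N", "Y", "N", "N", "Y")
)

# Keep one row per electorate: the winning candidate
winners <- fpp %>% filter(Elected == "Y")

# Number of electorates won by each party, largest first
seats <- winners %>% count(PartyNm, sort = TRUE)
seats
```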

  1. Compute a new variable which is the percentage of the vote earned by each candidate. Filter the data on the candidates who won the electorate. Make a plot of the distribution of the percentage of the first-preference vote that each winner received. (Hint: a histogram, density plot or boxplot would be appropriate here.) Write a sentence describing the distribution.
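A sketch of this step, assuming hypothetical columns DivisionNm, Elected (coded "Y"/"N") and TotalVotes (each candidate's total first-preference votes). The percentage has to be computed within each electorate before filtering to the winners; otherwise the denominator would include only the winners' own votes.

```r
library(dplyr)
library(ggplot2)

# Hypothetical candidate-level data (invented values; assumed column names)
fpp <- tibble(
  DivisionNm = c("A", "A", "A", "B", "B"),
  Elected    = c("Y", "N", "N", "N", "Y"),
  TotalVotes = c(500, 300, 200, 400, 600)
)

winners <- fpp %>%
  group_by(DivisionNm) %>%
  # each candidate's share of the electorate's first-preference votes, in %
  mutate(pct = 100 * TotalVotes / sum(TotalVotes)) %>%
  ungroup() %>%
  filter(Elected == "Y")

# Distribution of the winners' first-preference percentages
p <- ggplot(winners, aes(x = pct)) +
  geom_histogram(binwidth = 5) +
  labs(x = "Winner's first-preference vote (%)", y = "Number of electorates")
p
```

With the real data, a winner's percentage below 50 means the seat was decided on preferences rather than on first preferences alone.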