Background

Federal elections are held in Australia approximately every three years, with the exact timing at the pleasure of the party in government. It is rumoured that the next election will be in May 2019. The ads have already started on TV (“Australians aren’t going to cop it” is the catch phrase of one minor party), robocalls have begun, ads are appearing on Facebook, and voting preference polling is in full swing. This assignment asks you to work with several publicly available data sets on the election, with the goal of practising tidying and wrangling skills and the added side effect of learning about the Australian political landscape.

Instructions

• This is a team assignment. You need to list the members of the team at the top of the report as the authors. You will be asked to report on the efforts of your team members on this assignment. If all members contribute substantially to the submission, you can elect not to report. If you feel that one of the team members didn’t provide a sufficient contribution, please tell us using the form that will be provided, and we will determine an appropriate resolution, which may be a reduction in marks for the assignment.
• Team work has pros and cons, but the end result should be a positive impact on your learning. The best way to approach group work, for you as an individual, is to do as much of the assignment as you can yourself, as soon as the assignment instructions are released. Then work with your team members to see if you can solve each other’s difficulties. Bring any problems that you can’t solve together to a consultation, well before the assignment is due. Final polishing should include comparing your explanations to produce the clearest and best-written paragraphs. Team work is common in the workforce, and being able to navigate working with others is an important skill to develop.
• You need to write a report with answers to each of the questions. Spell check using the RStudio spell-checker before submission.
• R code should be hidden in the final report, unless it is specifically requested.
• To make it a little easier for you, a skeleton of R code is provided in the Rmd file. Wherever you see ???, something is missing and you will need to fill it in with the appropriate function, argument or operator. You will also need to rearrange the code as necessary to do the calculations needed.
• Original work is expected. Any material used from external sources needs to be acknowledged.
• Turn in both the HTML output file and the Rmd file. That is, two files need to be submitted. They will be submitted using the ED system, and we will show you how to do this in Thursday’s class in week 3.

Marks

• The assignment is worth a total of 20 points.
• 5 points of the score from the assignment will be given by another team, who will give you full marks if they can compile your report, and get the same answers as you, and find your explanations of the plots understandable and informative. Peer assessment will happen in the class period when the report is submitted.
• 5 points will be reserved for readability, clearly written R code, and appropriate citing of external sources.
• Accuracy and completeness of answers, and clarity of explanations will be the basis for the remaining 10 points.

Exercise

1. Download the AESID data. This is a multi-sheet Excel file. The first sheet gives an overview of how the data was collected. The second sheet provides details on the data in the other sheets. Take a look at sheet 1.5.
1. Make a sketch of what’s in this sheet. In particular, map out the variables and the different types of aggregations. How many variables are there in this sheet? List them. How often was the survey conducted? Did the researchers adjust percentages to account for differences between sample and population demographics? Write a few sentences about what data is in this sheet, and in particular how this data might help us to answer the question “What is the trend in the ways people of different ages discuss politics?”

2. Turn the data into long form, where year is in a column.

3. Remove the rows corresponding to overall aggregates.

4. Rename the first column to “communication”.

5. Separate the second column into two, one called “variable”, and the second called “level”. (Hint: You will need to use sep=":" as an argument.)

6. Convert year into a numeric variable.

7. Subset to have only rows corresponding to the variable “age”.

8. Make a line plot showing the percentage by year, with a separate coloured line for each age group, faceted by communication. Write a paragraph on what you learn from this plot.

9. It seems that there was a big drop in people reporting talking about politics, or trying to persuade others how to vote, after 1998. Using your internet searching skills, do some research to find out plausible reasons for the big drop in these percentages, and describe what you have learned.
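
The wrangling steps 2 to 8 above can be sketched on a small made-up table. This is only an illustration under stated assumptions: the column names (`activity`, `group`), the label `Total` for the overall aggregate, and all the numbers are hypothetical stand-ins, and the real sheet 1.5 will need its own names and its own read-in step. It uses `pivot_longer()` from tidyr; the older `gather()` would serve the same purpose.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the sheet: one row per activity and group,
# one column per survey year. Names and values are invented.
toy <- tibble(
  activity = rep("Discussed politics", 4),
  group    = c("Total", "age: 18-24", "age: 65+", "gender: Female"),
  `1996`   = c(50, 40, 55, 45),
  `1998`   = c(52, 42, 57, 47)
)

toy_long <- toy %>%
  # Step 2: long form, with year in a column
  pivot_longer(cols = c(`1996`, `1998`),
               names_to = "year", values_to = "percent") %>%
  # Step 3: drop the overall aggregate rows (assumed to be labelled "Total")
  filter(group != "Total") %>%
  # Step 4: rename the first column
  rename(communication = activity) %>%
  # Step 5: split the second column at the colon
  separate(group, into = c("variable", "level"), sep = ":") %>%
  # Step 6: year as a number; also trim the space left after the colon
  mutate(level = trimws(level), year = as.numeric(year)) %>%
  # Step 7: keep only the age rows
  filter(variable == "age")

# Step 8: one coloured line per age group, one panel per communication type
p <- ggplot(toy_long, aes(x = year, y = percent, colour = level)) +
  geom_line() +
  facet_wrap(~communication)
```

Printing `p` draws the plot. With the real data the same pipeline shape should apply, but the filter conditions and column names have to be checked against what is actually in the sheet.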

2. Download the FPP data, and read the data into R.
1. Take a look at the structure of the data file. Is this data in tidy format? Explain your answer.

2. Count the number of candidates in each electorate. What is the largest number of candidates in any electorate? What is the smallest? What is the average number of candidates per electorate?

3. Filter the data on the candidates who won the electorate. Summarise the number of electorates won by each party. Write a sentence describing what you learn.

4. Compute a new variable which is the percentage of the vote in their electorate earned by each candidate. Filter the data to the candidates who won their electorate. Make a plot of the distribution of the first preference percentage that each winner received. (Hint: a histogram, density plot or boxplot would be appropriate plots here.) Write a sentence describing the distribution.

5. Make a list of the electorates where the winner did not receive the most first preference votes. Include the division name, the candidate name, their first preference percentage, and the highest first preference percentage in the electorate. Write a sentence explaining how a candidate can win an electorate without receiving the largest number of first preference votes.

6. How many different parties contested the election? Which party received the fewest votes across the country? How many electorates were contested by the party with the fewest votes?

7. Write an explanation of preferential voting as it is used in Australia, and how first preferences factor into the final result.
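
The counting and percentage calculations in this second exercise can be sketched on a toy first preference table. Everything here is an assumption made for illustration: the column names (`division`, `candidate`, `party`, `votes`, `elected`), the `"Y"`/`"N"` coding of the winner, and the vote counts are invented, and the real FPP file will have its own names and coding.

```r
library(dplyr)

# Hypothetical first preference data: one row per candidate per electorate.
toy_fpp <- tibble(
  division  = c("A", "A", "A", "B", "B"),
  candidate = c("Ng", "Smith", "Lee", "Jones", "Khan"),
  party     = c("P1", "P2", "P3", "P1", "P2"),
  votes     = c(400, 450, 150, 700, 300),
  elected   = c("Y", "N", "N", "Y", "N")
)

# Number of candidates per electorate, then the extremes and the average
n_candidates <- toy_fpp %>%
  count(division, name = "n_candidates")

candidate_summary <- n_candidates %>%
  summarise(most    = max(n_candidates),
            least   = min(n_candidates),
            average = mean(n_candidates))

# Percentage of the first preference vote within each electorate,
# plus the highest percentage achieved in that electorate
with_pct <- toy_fpp %>%
  group_by(division) %>%
  mutate(pct = 100 * votes / sum(votes),
         top_pct = max(pct)) %>%
  ungroup()

# Winners only, and seats won by each party
winners <- with_pct %>% filter(elected == "Y")
seats   <- winners %>% count(party, name = "seats")

# Winners who did not top the first preference count in their electorate
behind <- winners %>%
  filter(pct < top_pct) %>%
  select(division, candidate, pct, top_pct)
```

In this toy table the winner in division A holds 40% of first preferences while another candidate holds 45%, so `behind` contains that one row. That is the situation item 5 asks you to find in the real data and then explain.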