# Instructions

• This is a team assignment.
• You need to write a report with answers to each of the questions.
• Turn in both the HTML output file and the Rmd file.
• The assignment is worth 20 points in total. Five points will be awarded by another team, who will give you full marks if they can compile your report, get the same answers as you, and find your explanations of the plots understandable and informative. Five points will be for individual effort. The remaining 10 points will be for the final group report.

# Exercise

This assignment explores the data provided on Melbourne house prices by Anthony Pino. The goal is to examine whether housing prices have cooled in Melbourne, and to help Anthony decide whether it is time to buy a two-bedroom apartment in Northcote.

1. Make a map of Melbourne showing the locations of the properties.

2. Here we are going to examine the prices of 2-bedroom flats in Northcote.
    1. Filter the data to focus only on the records for Northcote units. Make a plot of Price by Date, faceted by number of bedrooms. The main thing to learn from this plot is that there are many missing values for number of bedrooms.
    2. Impute the missing values, based on the regression method (covered in class). Make sure your predicted value is an integer. Re-make the plot of Price by Date, faceted by number of bedrooms.
    3. Write a description of what you learn from the plot, particularly about the trend of 2-bedroom unit prices in Northcote.
3. Focusing on 2-bedroom units, we are going to explore the trend in prices for each suburb.
    1. You will need to impute the Bedroom2 variable, in the same way as in the previous question.
    2. Fit a linear model to each suburb (many models approach). Collect the model estimates, and also the model fit statistics. Make a plot of intercept vs slope. Using plotly, which suburb has had the largest increase in prices, and which has had the biggest decrease?
    3. Summarise the $R^2$ for the model fits for all the suburbs. Which suburbs have the worst fitting models? Plot Price vs Date for the suburb with the best fitting model. Is the best fitting model a good fit?
    4. Write a paragraph on what you have learned about the trend in property prices across Melbourne.
4. Still focusing on apartments (units), examine the results of the auctions, using the Method variable, across suburbs. This variable records the result of the auction: whether the property sold or not. It may be that in recent months a higher proportion of properties didn’t sell. This would put downward pressure on prices.
    1. Compute the counts of the levels of Method, ignoring the suburbs.
    2. The categories PI (passed in) and VB (vendor bid) indicate the property did not sell. Compute the proportion of properties in these two categories for each suburb, for each month since 2016.
    3. Plot the proportions against year/month (make a new integer variable, time, with 1 being the first month of the data in 2016 and each later month incrementing time by 1). Add a smoother to show the trend in these proportions. Does it look like there is an increase in units that aren’t selling?
    4. Explain why the data was aggregated to month before computing the proportions.
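
The time variable and the monthly proportions described in this question can be sketched as below. This is an illustrative plain-Python version of the logic only: it assumes the first month of data is January 2016 (check this against the actual data), and the names `month_index` and `unsold_proportions`, the `"S"` code for a sold property, and the toy records are assumptions for the example. The report would do the equivalent in R.

```python
from collections import Counter
from datetime import date

# Method codes that indicate the property did not sell (from the question).
UNSOLD = {"PI", "VB"}


def month_index(d, start_year=2016, start_month=1):
    """Integer month counter: 1 for the start month, 2 for the next, etc.

    Assumption: the data start in January 2016 unless told otherwise.
    """
    return (d.year - start_year) * 12 + (d.month - start_month) + 1


def unsold_proportions(sales):
    """Proportion of unsold properties per (suburb, time) cell.

    `sales` is an iterable of (suburb, sale_date, method) tuples.
    """
    totals, unsold = Counter(), Counter()
    for suburb, sold_on, method in sales:
        key = (suburb, month_index(sold_on))
        totals[key] += 1
        if method in UNSOLD:
            unsold[key] += 1
    return {key: unsold[key] / totals[key] for key in totals}


# Hypothetical toy records, not taken from the assignment data.
toy_auctions = [
    ("Northcote", date(2016, 1, 9), "S"),
    ("Northcote", date(2016, 1, 23), "PI"),
    ("Northcote", date(2016, 2, 6), "VB"),
    ("Northcote", date(2016, 2, 20), "VB"),
]
props = unsold_proportions(toy_auctions)
```

Each (suburb, time) cell then holds one proportion, which is the quantity to plot against time with a smoother.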
5. Fit the best model for Price that you can, for houses around Monash University.
    1. Impute the missing values for Bathroom (similarly to Bedroom2).
    2. Subset the data to these suburbs: “Notting Hill”, “Glen Waverley”, “Clayton”, “Clayton South”, “Oakleigh East”, “Huntingdale”, “Mount Waverley”.
    3. Make a scatterplot of Price vs Date by Bedroom2 and Bathroom, with a linear model overlaid. What do you notice? Only some combinations of bedrooms and bathrooms are common. Subset your data to houses with 3-4 bedrooms and 1-2 bathrooms.
    4. Using date, rooms, bedroom, bathroom, car and landsize, build your best model for price. There are some missing values for Car and Landsize, which may be important to impute. Think about interactions as well as main effects. (There are too many missing values to use BuildingArea and YearBuilt. The other variables in the data don’t make sense to use.)