STAT 19000: Project 13 — Fall 2021
Motivation: It is always important to stay fresh and continue to hone and improve your skills. For example, games and events like https://adventofcode.com/ are a great way to keep thinking and learning. Plus, you can solve the puzzles with any language you want! It can be a fun way to learn a new programming language.
Proper Preparation Prevents Poor Performance.
In this project we will continue to wade through data, with a special focus on the apply suite of functions, building your own functions, and graphics.
Context: This is the last project of the semester! Many of you will have already finished your 10 projects, but for those who have not, this should be a fun and straightforward way to keep practicing.
Scope: R
Dataset(s)
The following questions will use the following dataset:
- /anvil/projects/tdm/data/iowa_liquor_sales/iowa_liquor_sales_cleaner.txt
Questions
Question 1
Run the lines of code below from Project 12 to read in the data and create the year and month columns.
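library(data.table)
library(lubridate)
# read the cleaned Iowa liquor sales data
liquor <- fread('/anvil/projects/tdm/data/iowa_liquor_sales/iowa_liquor_sales_cleaner.txt')
# parse the month/day/year Date strings into Date objects
liquor$date <- mdy(liquor$Date)
# extract the year and month as separate columns
liquor$year <- year(liquor$date)
liquor$month <- month(liquor$date)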
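If mdy(), year(), and month() are unfamiliar, the short sketch below runs them on two made-up date strings. It assumes, as the mdy() call above implies, that the Date column stores dates in month/day/year order; the strings themselves are hypothetical.

```r
library(lubridate)

# Hypothetical example strings in month/day/year order
dates <- mdy(c("03/15/2021", "12/01/2020"))  # parsed into Date objects
yrs <- year(dates)   # 2021 2020
mos <- month(dates)  # 3 12
```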
Run the code below to get a better understanding of the columns State Bottle Cost and State Bottle Retail.
head(liquor[,c("State Bottle Cost", "State Bottle Retail")])
typeof(liquor$`State Bottle Cost`)
typeof(liquor$`State Bottle Retail`)
Create two new columns, cost and retail, that are numeric versions of State Bottle Cost and State Bottle Retail, respectively.
Once you have those two new columns, create a column called profit that contains the profit for each sale. Which sale had the highest profit?
There are many ways to solve this question. The relevant topics below list functions you may find useful in some possible solutions.
Relevant topics: gsub, substr, nchar, as.numeric, which.max
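One possible approach is sketched below on a tiny made-up data frame, not the real data. It assumes the two columns hold strings such as "$4.50" with a leading dollar sign (check the head() output above to confirm the actual format), shows two ways to strip that sign before calling as.numeric(), and uses which.max() to locate the row with the largest value. The retail minus cost difference is only a per-bottle illustration; how you compute the profit for a whole sale is up to you.

```r
# Hypothetical toy data; assumes prices are stored as text such as "$4.50"
prices <- data.frame(
  `State Bottle Cost` = c("$4.50", "$10.00", "$7.25"),
  `State Bottle Retail` = c("$6.75", "$15.00", "$10.88"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Option 1: gsub removes every "$" and "," (inside [ ] both are literal characters)
prices$cost <- as.numeric(gsub("[$,]", "", prices$`State Bottle Cost`))
prices$retail <- as.numeric(gsub("[$,]", "", prices$`State Bottle Retail`))

# Option 2: substr/nchar keeps everything after the first character
x <- prices$`State Bottle Cost`
cost2 <- as.numeric(substr(x, 2, nchar(x)))

# which.max gives the index of the largest value (the first one if there are ties)
diffPerBottle <- prices$retail - prices$cost
best <- which.max(diffPerBottle)
prices[best, ]
```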
- Code used to solve this problem.
- Output from running the code.
- The date, vendor name, number of bottles sold, and profit for the sale with the highest profit.
Question 2
We want to provide useful information based on a Vendor Number to help in the decision-making process.
Create a function called createDashboard that takes two arguments, a specific Vendor Number and the liquor data frame, and produces a plot of the average profit per year for that Vendor Number.
Relevant topics: tapply, plot, mean
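The sketch below shows one way tapply() and mean() can combine with plot() inside a function that takes a vendor number and a data frame. The data frame and its values are invented, the helper name plotAvgProfit is hypothetical, and it assumes you already have a numeric profit column as in Question 1.

```r
# Hypothetical toy data standing in for the liquor data frame
toy <- data.frame(
  `Vendor Number` = c(255, 255, 255, 300, 255),
  year = c(2019, 2019, 2020, 2020, 2021),
  profit = c(10, 20, 30, 99, 50),
  check.names = FALSE
)

# Hypothetical helper: plot the average profit per year for one vendor
plotAvgProfit <- function(vendor, df) {
  # which() drops rows where the comparison is NA
  sub <- df[which(df$`Vendor Number` == vendor), ]
  # tapply splits profit by year and applies mean() to each group
  avg <- tapply(sub$profit, sub$year, mean, na.rm = TRUE)
  plot(as.numeric(names(avg)), avg, type = "b",
       xlab = "Year", ylab = "Average profit",
       main = paste("Average profit per year, vendor", vendor))
  invisible(avg)  # return the averages without printing them
}

res <- plotAvgProfit(255, toy)
```

Base graphics functions such as plot() draw on the current device and return NULL, so returning the averages invisibly is simply a convenient way to inspect them.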
- Code used to solve this problem.
- Output from running the code.
- The results of running createDashboard(255, liquor).
Question 3
Modify your createDashboard function so that it uses the liquor data frame as the default value of its data frame argument, in case the user forgets to pass a data frame to the function.
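In R, a default value is written directly in the function signature and is evaluated lazily, only when the argument is first used inside the body. The sketch below uses a made-up data frame named toy_liquor (a hypothetical name chosen so it does not overwrite your real liquor object) and a hypothetical countSales() function; in your own function the default would be df = liquor.

```r
# Hypothetical toy data; named toy_liquor so it does not replace your real liquor object
toy_liquor <- data.frame(
  `Vendor Number` = c(255, 255, 300),
  profit = c(1, 2, 3),
  check.names = FALSE
)

# df falls back to toy_liquor when the caller does not supply a data frame
countSales <- function(vendor, df = toy_liquor) {
  sum(df$`Vendor Number` == vendor, na.rm = TRUE)
}

countSales(255)                   # uses the default data frame: 2
countSales(255, toy_liquor[1, ])  # uses the data frame passed in: 1
```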
We are going to start adding more plots to your function. Run the code below before the code that builds your plots; it arranges multiple plots in a single figure.
par(mfrow=c(1, 2))
Note that we are creating a dashboard in this question with 1 row and 2 columns.
Add a bar plot to your dashboard that shows the total volume sold using Bottle Volume (ml).
Make sure to add titles to your plots.
Relevant topics: table, barplot
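As a rough sketch of how table(), barplot(), and par(mfrow = c(1, 2)) fit together, the example below counts sales per bottle size in a made-up data frame. It shows one reading of the task, counting how many sales fall at each Bottle Volume (ml) value; weighting by the number of bottles sold would be another. The second panel is only a placeholder.

```r
# Hypothetical toy data standing in for the liquor data frame
toy <- data.frame(
  `Bottle Volume (ml)` = c(750, 750, 1000, 375, 750, 1000),
  check.names = FALSE
)

# par() returns the previous settings, so they can be restored afterwards
old <- par(mfrow = c(1, 2))  # 1 row, 2 columns

counts <- table(toy$`Bottle Volume (ml)`)  # number of sales at each bottle size
barplot(counts, main = "Sales by bottle volume",
        xlab = "Bottle Volume (ml)", ylab = "Number of sales")

plot(1:3, main = "Placeholder second panel")

par(old)  # restore the previous layout
```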
- Code used to solve this problem.
- Output from running the code.
- The results of running createDashboard(255).
Question 4
Change par(mfrow=c(1, 2)) to par(mfrow=c(2, 2)) so we can fit 2 more plots in our dashboard.
Create a plot that shows the average number of bottles sold per month.
Optional: Modify the mar argument of par() to reduce the margins between the plots in our dashboard.
Relevant topics: tapply, plot, mean
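The sketch below applies tapply() and mean() to a made-up data frame to get the average number of bottles sold per month, inside a 2 by 2 layout with smaller margins. The column name Bottles Sold is an assumption about the dataset, so check names(liquor) for the exact spelling. The mar argument gives the bottom, left, top, and right margins in lines of text; R's default is c(5.1, 4.1, 4.1, 2.1).

```r
# Hypothetical toy data; the "Bottles Sold" column name is an assumption
toy <- data.frame(
  month = c(1, 1, 2, 3, 3, 3),
  `Bottles Sold` = c(12, 6, 4, 2, 4, 6),
  check.names = FALSE
)

# 2 x 2 layout with tighter margins (bottom, left, top, right, in lines of text)
old <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))

avgBottles <- tapply(toy$`Bottles Sold`, toy$month, mean)  # mean per month
plot(as.numeric(names(avgBottles)), avgBottles, type = "b",
     xlab = "Month", ylab = "Average bottles sold",
     main = "Average bottles sold per month")

par(old)  # restore the previous layout and margins
```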
- Code used to solve this problem.
- Output from running the code.
- The results of running createDashboard(255).
Question 5
Add one more plot to complete our dashboard. Write 1-2 sentences explaining why you chose that plot.
Optional: Add, remove, and/or modify the dashboard to contain information you find relevant. Make sure to document why you are making the changes.
Relevant topics: tapply, plot, mean
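To see how four panels fill a 2 by 2 layout, the sketch below draws four small plots from made-up data. The boxplot in the last panel is just one possible choice for the final plot, not a required one; it shows the spread of profit within each year, which the averages in the other panels hide.

```r
# Hypothetical toy data for a 2 x 2 layout sketch
toy <- data.frame(
  year = rep(2019:2021, each = 4),
  month = rep(c(1, 4, 7, 10), times = 3),
  profit = c(5, 7, 6, 9, 8, 6, 7, 10, 9, 11, 8, 12)
)

old <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
# mfrow fills panels row by row: top-left, top-right, bottom-left, bottom-right
byYear <- tapply(toy$profit, toy$year, mean)
plot(as.numeric(names(byYear)), byYear, type = "b",
     xlab = "Year", ylab = "Mean profit", main = "Panel 1")
barplot(table(toy$month), main = "Panel 2", xlab = "Month")
byMonth <- tapply(toy$profit, toy$month, mean)
plot(as.numeric(names(byMonth)), byMonth, type = "b",
     xlab = "Month", ylab = "Mean profit", main = "Panel 3")
# One possible fourth panel: the spread of profit within each year
boxplot(profit ~ year, data = toy, main = "Panel 4",
        xlab = "Year", ylab = "Profit")
par(old)  # restore the previous layout and margins
```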
- Code used to solve this problem.
- Output from running the code.
- The results of running createDashboard(255).
Question 6 (optional, 0 pts)
patchwork is a very cool R package that provides a simple and intuitive way to combine multiple ggplot plots into a single graphic. See the patchwork documentation for details.
Rewrite your createDashboard function to use patchwork and ggplot.
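A minimal sketch of the patchwork idea, assuming the ggplot2 and patchwork packages are installed: two ggplot objects built from made-up data are placed side by side with the | operator, and plot_annotation() adds an overall title. patchwork also uses / to stack plots vertically, so (p1 | p2) / (p3 | p4) would give a 2 by 2 grid.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data standing in for one vendor's sales
toy <- data.frame(
  year = rep(2019:2021, each = 2),
  volume = c(750, 1000, 750, 750, 375, 1000),
  profit = c(5, 7, 8, 6, 9, 11)
)

# Average profit per year as a small data frame for ggplot
avgYear <- aggregate(profit ~ year, data = toy, FUN = mean)

p1 <- ggplot(avgYear, aes(x = year, y = profit)) +
  geom_line() +
  geom_point() +
  labs(title = "Average profit per year", x = "Year", y = "Average profit")

p2 <- ggplot(toy, aes(x = factor(volume))) +
  geom_bar() +
  labs(title = "Sales by bottle volume", x = "Bottle Volume (ml)", y = "Count")

# | places plots side by side; plot_annotation() adds a shared title
combined <- (p1 | p2) + plot_annotation(title = "Toy dashboard")
print(combined)
```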
- Code used to solve this problem.
- Output from running the code.
Question 7 (optional, 0 pts)
Use your createDashboard function to compare 2 vendors. You can save the dashboard to a PDF file using the code below.
pdf(file = "myFilename.pdf", # The directory and name you want to save the file in
width = 8, # The width of the plot in inches
height = 8) # The height of the plot in inches
createDashboard(255)
dev.off()
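To compare two vendors in one file, one option is to call the dashboard function once per vendor between pdf() and dev.off(). Since pdf() defaults to onefile = TRUE, both dashboards are written to the same file. The sketch below uses a hypothetical miniDashboard() function and made-up data in place of your createDashboard() and liquor, and writes to a temporary file rather than myFilename.pdf.

```r
# Hypothetical stand-in for createDashboard: a 1 x 2 layout for one vendor
miniDashboard <- function(vendor, df) {
  sub <- df[which(df$vendor == vendor), ]
  old <- par(mfrow = c(1, 2))
  on.exit(par(old))  # restore the layout when the function returns
  plot(sub$year, sub$profit, type = "b",
       main = paste("Vendor", vendor, "profit"))
  barplot(table(sub$volume), main = paste("Vendor", vendor, "volumes"))
}

# Hypothetical toy data for two vendors
toy <- data.frame(
  vendor = c(255, 255, 300, 300),
  year = c(2019, 2020, 2019, 2020),
  profit = c(5, 7, 3, 4),
  volume = c(750, 1000, 750, 750)
)

out <- tempfile(fileext = ".pdf")
pdf(file = out, width = 8, height = 8)
# both dashboards are written to the same file (onefile = TRUE is the default)
for (v in c(255, 300)) miniDashboard(v, toy)
dev.off()
```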
- Code used to solve this problem.
- Output from running the code.
Please make sure to double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended that you download your submission after submitting it to make sure that what you think you submitted is what you actually submitted.