Do the problems below. You should turn in a .pdf file that you created from a .Rmd file. You need to show me all of the R code you use to solve the problems, and you need to write your final answers in complete sentences. There are only 3 problems, so you can spend a lot of time using your R & RMarkdown skills and resources to make the answers look very nice. It would also be nice if each problem read as a mini data analysis project instead of numbered answers to questions, but this is not required.
This assignment is due on Wednesday, June 27th at 9:00am. You can get the .Rmd that creates this assignment page here to help you get started.
Note: I don’t expect you to know how to do everything right away. Consult the slides, the R cheatsheets, the R help files, each other, Google, etc. This assignment was designed to use a little bit of most slide sets we’ve done so far.
The dataset starbucks in the openintro package contains nutritional information on 77 Starbucks food items. Spend some time reading the help file of this dataset. For this problem, you will explore the relationship between the calories and carbohydrate grams in these items.
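Exploring a relationship like this one usually comes down to fitting a line with lm() and then checking the residuals. The sketch below does that on made-up numbers, not the real starbucks values. The data frame toy, its column names, and the choice of calories as the response are all assumptions made for illustration, and it uses base R plotting rather than ggplot2 to stay self-contained.

```r
# Made-up data standing in for two numeric columns (NOT the real starbucks values).
toy <- data.frame(
  carb     = c(10, 20, 30, 40, 50, 60),
  calories = c(95, 170, 260, 330, 410, 500)
)

# Fit a simple linear regression; treating calories as the response is an assumption.
fit <- lm(calories ~ carb, data = toy)
coef(fit)  # estimated intercept and slope

# Collect fitted values and residuals for a residual plot.
diagnostics <- data.frame(fitted = fitted(fit), resid = resid(fit))

# Residuals against fitted values, with a dashed reference line at zero.
plot(diagnostics$fitted, diagnostics$resid,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

A residual plot with no visible pattern around the zero line is usually read as a sign that a straight line is a reasonable description of the data.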
lm() function.
ggplot2 function fortify can help a lot with this. Describe what you see in the residual plot. Does the model look like a good fit?
The openintro package contains a dataset called absenteeism that consists of data on 146 schoolchildren in a rural area of Australia. Spend some time reading the help file of this dataset. We are interested in seeing if the ethnicity (aboriginal or not), sex (male or female), and learning ability (average or slow) of the children affect the number of days they are absent from school.
Eth, Sex, and Lrn variables to binary variables. One way to do this is with the function ifelse(). You should construct them so that
Eth = 1 if the student is not aboriginal and Eth = 0 if the student is aboriginal;
Sex = 1 if the student is male and Sex = 0 if the student is female;
Lrn = 1 if the student is a slow learner and Lrn = 0 if the student is an average learner.
Days as the dependent variable and the three variables mentioned in (1) as explanatory variables.
library(tidyverse)
newdata <- tibble(Eth = c(1,1,1,0,0),
                  Sex = c(0,1,0,1,0),
                  Lrn = c(0,0,1,1,0))
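The recoding and regression steps in this problem can be sketched on made-up records. The toy data frame, its values, and the category labels ("A"/"N", "F"/"M", "AL"/"SL") are assumptions, so check the help file for the labels the real absenteeism data uses. Passing a data frame like newdata to predict() is also an assumption about how it is meant to be used.

```r
# Made-up records; labels and values are hypothetical, not taken from absenteeism.
toy <- data.frame(
  Eth  = c("A", "N", "N", "A", "N", "A", "N", "A"),
  Sex  = c("F", "M", "F", "M", "M", "F", "F", "M"),
  Lrn  = c("AL", "SL", "AL", "SL", "AL", "SL", "SL", "AL"),
  Days = c(12, 5, 3, 20, 4, 25, 8, 14)
)

# Recode each categorical variable to 0/1 with ifelse().
toy$Eth <- ifelse(toy$Eth == "N", 1, 0)   # 1 = not aboriginal
toy$Sex <- ifelse(toy$Sex == "M", 1, 0)   # 1 = male
toy$Lrn <- ifelse(toy$Lrn == "SL", 1, 0)  # 1 = slow learner

# Multiple regression with Days as the dependent variable.
fit <- lm(Days ~ Eth + Sex + Lrn, data = toy)

# Predicted days absent for five hypothetical students.
newdata <- data.frame(Eth = c(1, 1, 1, 0, 0),
                      Sex = c(0, 1, 0, 1, 0),
                      Lrn = c(0, 0, 1, 1, 0))
predicted <- predict(fit, newdata = newdata)
predicted
```

The comparison inside ifelse() works the same way whether the original column is stored as character strings or as a factor.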
The openintro package contains a dataset called orings that contains information on 23 NASA space shuttle launches. Spend some time reading the help file of this dataset.
orings2
library(openintro)
data("orings")
orings2 <- NULL
for (i in 1:nrow(orings)) {
  new <- data.frame(temp = orings$temp[i],                         # for each row in orings,
                    fail = rep(c(1, 0), c(orings$damage[i],        # create 6 new rows:
                                          6 - orings$damage[i])))  # 1 for each launch
  orings2 <- rbind(orings2, new)
}
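To see what the loop above produces, it can help to run the same logic on a tiny made-up table and compare it with a loop-free version. In this sketch the data frame toy and its values are hypothetical; the column names temp and damage simply mirror the ones used in the loop.

```r
# Made-up launch data (NOT the real orings values); names mirror the loop above.
toy <- data.frame(temp = c(53, 66, 70, 81), damage = c(5, 0, 1, 0))

# Loop version: each launch becomes 6 rows of 1s (damaged) and 0s (undamaged).
looped <- NULL
for (i in seq_len(nrow(toy))) {
  new <- data.frame(temp = toy$temp[i],
                    fail = rep(c(1, 0), c(toy$damage[i], 6 - toy$damage[i])))
  looped <- rbind(looped, new)
}

# Loop-free version: repeat each temperature 6 times, then build the
# 1/0 pattern for each launch and concatenate the pieces.
vectorised <- data.frame(
  temp = rep(toy$temp, each = 6),
  fail = unlist(lapply(toy$damage, function(d) rep(c(1, 0), c(d, 6 - d))))
)

# Compare the two results column by column.
all.equal(looped$temp, vectorised$temp)
all.equal(looped$fail, vectorised$fail)
```

Either way, the result has one row per O-ring rather than one row per launch, which is the shape a 0/1 logistic regression expects.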
orings2, fit a logistic regression to the data using the glm function.
53:81). Create a plot showing the observed data (in orings2) as points (\(x\) = temperature, \(y\) = fail) and draw a line through the predicted probabilities at each temperature value from 53-81. Describe what you see in the plot.