Problem 1
The of a vector of numbers \(c(a1, a2, a3, ..., an)\) is given by
\[\text{gmean} = (a_1 \cdot a_2 \cdot a_3 \cdot ... \cdot a_n)^{\frac{1}{n}}\]
Problem 1a
Write a user-defined function “gmean” that computes the geometric mean of a vector of numbers.
gmean <- function(vector) {
vec_len <- length(vector)
ret_val <- prod(vector)^(1 / vec_len)
ret_val
}
Problem 1b
Test your function on the following vectors
pr1a <- c(4,9)
gmean(pr1a)
[1] 6
pr1b <- c(2,8)
gmean(pr1b)
[1] 4
pr1c <- c(5,9)
gmean(pr1c)
[1] 6.708204
pr1d <- c(4,1,1/32)
gmean(pr1d)
[1] 0.5
pr1e <- c(8,10,51.2)
gmean(pr1e)
[1] 16
pr1f <- c(1,3,9,27,81)
gmean(pr1f)
[1] 9
Problem 2
Problem 2a
Construct a random 5x6 matrix with \(n=30\) elements:
A <- matrix(sample(30, replace = TRUE), nrow = 5)
Problem 2b
Use the apply() to find the geometric mean of each column
cat("The mean of each column is [", round(apply(A, MARGIN = 2, FUN = gmean), digits = 3), "] \n")
The mean of each column is [ 11.815 15.161 19.409 12.444 8.621 16.687 ]
Problem 3
Problem 3a
Construct a list containing the following three matrices as elements:
a3 <- matrix(1:9, nrow = 3)
b3 <- matrix(4:15, nrow = 4)
c3 <- matrix(8:13, nrow = 3)
pr3_list <- list(a3, b3, c3)
pr3_list
[[1]]
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
[[2]]
[,1] [,2] [,3]
[1,] 4 8 12
[2,] 5 9 13
[3,] 6 10 14
[4,] 7 11 15
[[3]]
[,1] [,2]
[1,] 8 11
[2,] 9 12
[3,] 10 13
Problem 3b
Use lapply() to extract the 2nd column of each matrix from the list
lapply(pr3_list, FUN = "[", , 2) #Essentially: lapply(data, FUN = "[", rows=all, columns=2)
[[1]]
[1] 4 5 6
[[2]]
[1] 8 9 10 11
[[3]]
[1] 11 12 13
To briefly explain what’s happening here, the function I’m using within lapply is “[”. This function takes in two arguments. It takes in rows, and then columns. So I left an empty space within the commas to include all rows and then specified the second column. This feels a little gross and I can’t help but feel as if I’m missing a far more intuitive solution.
Problem 3c
Use lapply() to extract the 1st row of each matrix from the list
lapply(pr3_list, FUN = "[", 1, ) #Essentially: lapply(data, FUN = "[", rows=1, columns=all)
[[1]]
[1] 1 4 7
[[2]]
[1] 4 8 12
[[3]]
[1] 8 11
Problem 3d
Use sapply() to find the row sums of each matrix in the list.
sapply(pr3_list, rowSums)
[[1]]
[1] 12 15 18
[[2]]
[1] 24 27 30 33
[[3]]
[1] 19 21 23
Problem 3e
Use sapply() to find the geometric mean of each matrix in the list.
cat("The geometric mean of each matrix is [",
round(sapply(pr3_list, gmean), digits=3),
"] \n")
The geometric mean of each matrix is [ 4.147 8.808 10.359 ]
Problem 4
From the Frogger data frame created in the Week 9 lab, compute the mean score and the geometric mean score.
# Create the frogger data frame
frogger <- data.frame(
Player=rep(c("Alice", "Bob", "Connor"), times=c(2,5,3)),
Score=round(rlnorm(n=10,10), -1))
# Compute mean score
mean(frogger$Score)
[1] 33358
# Compute geometric mean score
cat("The geometric mean of the frogger scores is",
gmean(frogger$Score),
"\n")
The geometric mean of the frogger scores is 21771.85
Problem 5
Import the built-in dataset mtcars.
data("mtcars")
Problem 5a
Check the data types and names of the columns. Check for NAs.
colnames(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
[11] "carb"
sapply(mtcars, typeof)
mpg cyl disp hp drat wt qsec vs
"double" "double" "double" "double" "double" "double" "double" "double"
am gear carb
"double" "double" "double"
sapply(mtcars, function(x) sum(is.na(x)))
mpg cyl disp hp drat wt qsec vs am gear carb
0 0 0 0 0 0 0 0 0 0 0
Problem 5b
Split the mpg data based on the number of cylinders.
# with(mtcars, split(mpg, cyl))
split(mtcars$mpg, mtcars$cyl) # I much prefer the syntax on this one.
$`4`
[1] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4
$`6`
[1] 21.0 21.0 21.4 18.1 19.2 17.8 19.7
$`8`
[1] 18.7 14.3 16.4 17.3 15.2 10.4 10.4 14.7 15.5 15.2 13.3 19.2 15.8 15.0
Problem 5c
Use tapply() to compute the mpg based on the number of cylinders.
tapply(mtcars$mpg, mtcars$cyl, mean)
4 6 8
26.66364 19.74286 15.10000
# with(mtcars, tapply(mpg, cyl, mean)) Here just to show alternate solution
Problem 5d
Use tapply() to compute the mean horse power hp basedd on the number of cylinders.
tapply(mtcars$hp, mtcars$cyl, mean)
4 6 8
82.63636 122.28571 209.21429
Problem 5e
Use apply() combined with tapply() to compute the mean of each column based on the number of cylinders
apply(mtcars, MARGIN = 2, function(x) tapply(x, mtcars$cyl, mean))
mpg cyl disp hp drat wt qsec vs
4 26.66364 4 105.1364 82.63636 4.070909 2.285727 19.13727 0.9090909
6 19.74286 6 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286
8 15.10000 8 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000
am gear carb
4 0.7272727 4.090909 1.545455
6 0.4285714 3.857143 3.428571
8 0.1428571 3.285714 3.500000
Problem 6
Write your own Mad Lib that takes in at least 5 input words or numbers, and is at least a few sentences long. (Use a fixed input vector in the .rmd printout that you turn in) You can use a well-known poem, passage, or speech, or make up your own story.
# Inputs: noun, color, noun2, verb, color
x <- list(noun = "velociraptor", color = "lavender", pluralnoun = "puppies", verb = "vibin", verb = "nuzzle", color = "vanilla", verb = "cuddle")
cat("Sometimes that", x[[1]], "looks right at ya. Right into your eyes. And the thing about a", x[[1]], "is he's got lifeless eyes,", x[[2]], "eyes. Like a", x[[3]], "eyes. \n", "When he comes at ya, he doesn't even seem to be", x[[4]], "till he", x[[5]], " ya and those", x[[2]], "eyes roll over white and then ah then you hear that terrible high-pitched screamin'. The ocean turns", x[[6]], "and despite all your poundin' and your hollerin' those", x[[1]], "come in and they", x[[7]], "you to pieces. \n")
Sometimes that velociraptor looks right at ya. Right into your eyes. And the thing about a velociraptor is he's got lifeless eyes, lavender eyes. Like a puppies eyes.
When he comes at ya, he doesn't even seem to be vibin till he nuzzle ya and those lavender eyes roll over white and then ah then you hear that terrible high-pitched screamin'. The ocean turns vanilla and despite all your poundin' and your hollerin' those velociraptor come in and they cuddle you to pieces.
Problem 7
Use the appropriate libridate function to parse each of the following dates. Also, find the weekday for each date.
Problem 7a
d1 <- "February 2, 2021"
d1_x <- mdy(d1)
d1_x
[1] "2021-02-02"
wday(d1_x, label=TRUE)
[1] Tue
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
Problem 7b
d2 <- "4/1/2025" #American date. Goal is April 1st
d2_x <- mdy(d2)
d2_x
[1] "2025-04-01"
wday(d2_x, label=TRUE)
[1] Tue
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
month(d2_x, label=TRUE)
[1] Apr
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
Problem 7c
d3 <- "4/1/2025" #European date. Goal is Jan 4th
d3_x <- dmy(d2)
d3_x
[1] "2025-01-04"
wday(d3_x, label=TRUE)
[1] Sat
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
month(d3_x, label=TRUE)
[1] Jan
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
Problem 7d
d4 <- "2019-Jun-30"
d4_x <- ymd(d4)
d4_x
[1] "2019-06-30"
wday(d4_x, label = TRUE)
[1] Sun
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
month(d4_x, label = TRUE)
[1] Jun
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
Problem 7e
d5 <- "11 Nov 2011"
d5_x <- dmy(d5)
d5_x
[1] "2011-11-11"
wday(d5_x, label = TRUE)
[1] Fri
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
month(d5_x, label = TRUE)
[1] Nov
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
Problem 8
Problem 8a
Compute the time since the first moon landing. Covert this output to years.
age <- today() - mdy("7/20/1969")
as.duration(age)
[1] "1667174400s (~52.83 years)"
Problem b
Create the following data frame.
# We're given an example data frame that I need to recreate before I can get started.
Apollo_Mission <- c(11, 12, 14, 15, 16, 17)
Launch_Date <- c("16 July 1969", "14 Nov 1969", "31 Jan 1971", "26 July 1971", "16 Apr 1972", "07 Dec 1972")
Launch_Time <- c("13:32", "16:22", "21:03", "13:34", "17:54", "05:33")
Moon_L_Date <- c("7/20/1969", "11/19/1969", "2/5/1971", "7/30/1972", "4/21/1972", "12/11/1972")
Moon_L_Time <- c("20:17:40", "06:54:35", "09:18:11", "22:16:29", "02:23:35", "19:54:57")
Splashdown_L_Date <- c("Jul 24, 1969", "Nov 24, 1969", "Feb 9, 1971", "Aug 7, 1971", "Apr 29, 1972", "Dec 19, 1972")
Splashdown_L_Time <- c("16:50:35", "20:58:24", "21:05:00", "21:45:53", "19:45:05", "19:24:59")
apollo <- dplyr::tibble(Apollo_Mission, Launch_Date, Launch_Time, Moon_L_Date, Moon_L_Time, Splashdown_L_Date, Splashdown_L_Time)
apollo
# A tibble: 6 × 7
Apollo_Mission Launch_Date Launch_Time Moon_L_Date Moon_L_Time
<dbl> <chr> <chr> <chr> <chr>
1 11 16 July 1969 13:32 7/20/1969 20:17:40
2 12 14 Nov 1969 16:22 11/19/1969 06:54:35
3 14 31 Jan 1971 21:03 2/5/1971 09:18:11
4 15 26 July 1971 13:34 7/30/1972 22:16:29
5 16 16 Apr 1972 17:54 4/21/1972 02:23:35
6 17 07 Dec 1972 05:33 12/11/1972 19:54:57
# … with 2 more variables: Splashdown_L_Date <chr>, Splashdown_L_Time <chr>
Problem c
Convert all the information to standard lubridate dates and times
# unite columns to get date/hour/minute/second format-------------------
apollo <- tidyr::unite(apollo, "Moon_Landing", Moon_L_Date:Moon_L_Time)
apollo <- tidyr::unite(apollo, "Launch", Launch_Date:Launch_Time)
apollo <- tidyr::unite(apollo, "Splashdown", Splashdown_L_Date:Splashdown_L_Time)
# Convert times to something usable-------------------------------------
apollo$Launch <- dmy_hm(apollo$Launch)
apollo$Moon_Landing <- mdy_hms(apollo$Moon_Landing)
apollo$Splashdown <- mdy_hms(apollo$Splashdown)
apollo
# A tibble: 6 × 4
Apollo_Mission Launch Moon_Landing Splashdown
<dbl> <dttm> <dttm> <dttm>
1 11 1969-07-16 13:32:00 1969-07-20 20:17:40 1969-07-24 16:50:35
2 12 1969-11-14 16:22:00 1969-11-19 06:54:35 1969-11-24 20:58:24
3 14 1971-01-31 21:03:00 1971-02-05 09:18:11 1971-02-09 21:05:00
4 15 1971-07-26 13:34:00 1972-07-30 22:16:29 1971-08-07 21:45:53
5 16 1972-04-16 17:54:00 1972-04-21 02:23:35 1972-04-29 19:45:05
6 17 1972-12-07 05:33:00 1972-12-11 19:54:57 1972-12-19 19:24:59
Problem d
Compute the duration of each mission in days. (Splashdown time - Launch time)
mission_duration <- apollo$Splashdown - apollo$Launch
mission_table <- dplyr::tibble(apollo$Apollo_Mission, mission_duration)
mission_table
# A tibble: 6 × 2
`apollo$Apollo_Mission` mission_duration
<dbl> <drtn>
1 11 8.137905 days
2 12 10.191944 days
3 14 9.001389 days
4 15 12.341586 days
5 16 13.077141 days
6 17 12.577766 days
Problem 8e
Compute the average duration of a mission
cat("The average mission has a time difference of", mean(mission_table$mission_duration), "days \n")
The average mission has a time difference of 10.88796 days