Solutions to OpenIntro Statistics, Chapter 3, exercises 3.4, 3.6, 3.8, 3.16, 3.26, 3.28, 3.32, and 3.44.
In triathlons, it is common for racers to be placed into age and gender groups. Friends Leo and Mary both completed the Hermosa Beach Triathlon, where Leo competed in the Men, Ages 30 - 34 group while Mary competed in the Women, Ages 25 - 29 group. Leo completed the race in 1:22:28 (4948 seconds), while Mary completed the race in 1:31:53 (5513 seconds). Obviously Leo finished faster, but they are curious about how they did within their respective groups. Can you help them? Here is some information on the performance of their groups:
Remember: a better performance corresponds to a faster finish.
a. Men’s finishing times: \(M \sim N(4313, 583)\). Women’s finishing times: \(W \sim N(5261, 807)\). b. Leo’s \(z\)-score: \(z_L = \frac{4948 - 4313}{583} = 1.089\). Mary’s \(z\)-score: \(z_M = \frac{5513 - 5261}{807} = 0.312\). A \(z\)-score gives the number of standard deviations an observation lies above or below its group's mean, which makes it possible to compare observations from different groups.
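The two \(z\)-score calculations can be reproduced in R. This is a minimal sketch; the helper name `z_score` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper (not from the original solution): standardize an
# observation x against a group with mean mu and standard deviation sigma
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

z_L <- z_score(4948, mu = 4313, sigma = 583)  # Leo, Men 30-34
z_M <- z_score(5513, mu = 5261, sigma = 807)  # Mary, Women 25-29

# Rounded to three decimals, matching the values above
round(c(Leo = z_L, Mary = z_M), 3)
```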
c. Relative to their groups, Mary did better than Leo because her \(z\)-score was smaller: her time was fewer standard deviations above her group's mean than Leo's was above his, and lower times are better. d. In symbols, we want to know \(P(M \geq 4948)\), the area under the normal curve above Leo’s time. We use pnorm:
1 - pnorm(4948, mean = 4313, sd = 583)
## [1] 0.1380342
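As a cross-check, the same upper-tail area can be found two other ways: by passing `lower.tail = FALSE` to `pnorm` instead of subtracting from 1, or by converting Leo's time to a \(z\)-score and using the standard normal curve (`pnorm` defaults to mean 0 and sd 1). A short sketch:

```r
# Upper-tail area directly, without subtracting from 1
p_tail <- pnorm(4948, mean = 4313, sd = 583, lower.tail = FALSE)

# Same area from Leo's z-score on the standard normal curve
z_L <- (4948 - 4313) / 583
p_std <- 1 - pnorm(z_L)

c(p_tail, p_std)
```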
So, Leo finished faster than 13.8% of the men in his age group. e. In symbols, we want to know \(P(W \geq 5513)\), the area under the normal curve above Mary’s time. We use pnorm:
1 - pnorm(5513, mean = 5261, sd = 807)
## [1] 0.3774186
So, Mary finished faster than 37.7% of the women in her group. f. Partially. The \(z\)-scores would not change, since they depend only on each group's mean and standard deviation, but the probabilities computed in parts d and e relied on the normal model and would change if the distributions were not normal.
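To make part f concrete, here is a sketch under a purely hypothetical assumption: suppose men's times were uniformly distributed instead, with the same mean 4313 and standard deviation 583. A uniform distribution on \([a, b]\) has mean \((a + b)/2\) and standard deviation \((b - a)/\sqrt{12}\), so \(a\) and \(b\) can be solved for. Leo's \(z\)-score is unchanged, but the fraction of men slower than him is not.

```r
mu <- 4313
sigma <- 583

# Hypothetical uniform model with the same mean and SD:
# (b - a) / sqrt(12) = sigma, so the half-width is sigma * sqrt(12) / 2
half_width <- sigma * sqrt(12) / 2
a <- mu - half_width
b <- mu + half_width

z_L <- (4948 - mu) / sigma  # same z-score under either model
p_normal  <- pnorm(4948, mean = mu, sd = sigma, lower.tail = FALSE)
p_uniform <- punif(4948, min = a, max = b, lower.tail = FALSE)

c(z = z_L, normal = p_normal, uniform = p_uniform)
```

Under this assumed uniform model about 18.6% of men would be slower than Leo, compared with 13.8% under the normal model.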
In Exercise 3.4 we saw two distributions for triathlon times: men’s finishing times, \(M \sim N(4313, 583)\), and women’s finishing times, \(W \sim N(5261, 807)\). Times are listed in seconds. Use this information to compute each of the following:
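qnorm(0.05, mean = 4313, sd = 583)
## [1] 3354.05
So the fastest 5% of men finished in under about 3354 seconds.
qnorm(0.90, mean = 5261, sd = 807)
## [1] 6295.212
So the slowest 10% of women took longer than about 6295 seconds.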
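`qnorm` is the inverse of `pnorm`, so each cutoff can be checked by feeding it back into `pnorm`. The same cutoff also follows from the standard normal quantile as mean + \(z\) × sd. A short sketch of both checks:

```r
# Fastest 5% of men: 5th percentile of M ~ N(4313, 583)
cut_men <- qnorm(0.05, mean = 4313, sd = 583)
# Slowest 10% of women: 90th percentile of W ~ N(5261, 807)
cut_women <- qnorm(0.90, mean = 5261, sd = 807)

# Round trip: pnorm should recover the original tail areas
pnorm(cut_men, mean = 4313, sd = 583)                        # should be 0.05
pnorm(cut_women, mean = 5261, sd = 807, lower.tail = FALSE)  # should be 0.10

# Equivalent form: mean + (standard normal quantile) * sd
4313 + qnorm(0.05) * 583
```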
These cutoffs are represented in picture form below:
The Capital Asset Pricing Model (CAPM) is a financial model that assumes returns on a portfolio are normally distributed. Suppose a portfolio has an average annual return of 14.7% (i.e. an average gain of 14.7%) with a standard deviation of 33%. A return of 0% means the value of the portfolio doesn’t change, a negative return means that the portfolio loses money, and a positive return means that the portfolio gains money.
Let \(R\) represent the annual return, in percent. We assume \(R \sim N(14.7, 33)\).
pnorm(0, mean = 14.7, sd = 33)
## [1] 0.3279957
So, in about 32.8% of years, this portfolio will have a negative return.
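"In about 32.8% of years" is a long-run frequency statement. A quick simulation makes it concrete: draw many hypothetical annual returns from \(N(14.7, 33)\) and count how often they are negative. The seed and the number of draws are arbitrary choices for this sketch.

```r
set.seed(1)                 # arbitrary seed, for reproducibility
n_years <- 1e6              # number of simulated years
returns <- rnorm(n_years, mean = 14.7, sd = 33)

# Proportion of simulated years with a negative return
mean(returns < 0)
```

The simulated proportion should land close to the exact value from `pnorm`.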
qnorm(0.85, mean = 14.7, sd = 33)
## [1] 48.9023
So the top 15% of annual returns are about 48.9% or higher.
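Because the question concerns the upper 15%, the same cutoff can be requested directly with `lower.tail = FALSE`, which avoids converting 15% to the 85th percentile by hand:

```r
# Return exceeded in only 15% of years
cutoff <- qnorm(0.15, mean = 14.7, sd = 33, lower.tail = FALSE)
cutoff

# Check: the area above the cutoff is 0.15
pnorm(cutoff, mean = 14.7, sd = 33, lower.tail = FALSE)
```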
SAT scores (out of 2400) are distributed normally with a mean of 1500 and a standard deviation of 300. Suppose a school council awards a certificate of excellence to all students who score at least 1900 on the SAT, and suppose we pick one of the recognized students at random. What is the probability this student’s score will be at least 2100? (The material covered in Section 2.2 would be useful for this question.)
Let \(S\) be a student’s SAT score, with \(S \sim N(1500, 300)\). We want the probability that a student’s score is at least 2100 given that the student’s score is at least 1900. Since every score of at least 2100 is also at least 1900, the event \(S \geq 2100 \cap S \geq 1900\) is simply \(S \geq 2100\). In symbols: \[P(S \geq 2100 \mid S \geq 1900) = \frac{P(S \geq 2100 \cap S \geq 1900)}{P(S \geq 1900)} = \frac{P(S \geq 2100)}{P(S \geq 1900)} \approx \frac{0.02275}{0.09121} \approx 0.2494\]
# P(S >= 2100):
num <- 1-pnorm(2100, 1500, 300)
num
## [1] 0.02275013
# P(S >= 1900):
den <- 1-pnorm(1900, 1500, 300)
den
## [1] 0.09121122
# P(S >= 2100 | S >= 1900):
num/den
## [1] 0.2494225
So, if we randomly select a recognized student, there is about a 24.9% chance that they scored at least 2100.
In a picture, we want the ratio of the purple area to the blue + purple area.
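The conditional probability can also be checked by simulation: generate many scores from \(N(1500, 300)\), keep only the recognized students (score at least 1900), and find the fraction of them who scored at least 2100. The seed and sample size below are arbitrary.

```r
set.seed(2)                             # arbitrary seed
scores <- rnorm(1e6, mean = 1500, sd = 300)

recognized <- scores[scores >= 1900]    # students who earn the certificate
mean(recognized >= 2100)                # share of them scoring at least 2100
```

Restricting to `recognized` is exactly what conditioning on \(S \geq 1900\) means: the denominator becomes the recognized students only.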
The National Vaccine Information Center estimates that 90% of Americans have had chickenpox by the time they reach adulthood.
Using R:
dbinom(97, 100, 0.90)
## [1] 0.005891602
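`dbinom` evaluates the binomial probability formula. Writing the formula out with `choose` gives the same number and shows where it comes from: the number of ways to pick which 97 of the 100 adults had chickenpox, times the probability of any one such arrangement.

```r
n <- 100; k <- 97; p <- 0.90

# Binomial formula: C(n, k) * p^k * (1 - p)^(n - k)
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)

c(by_hand = by_hand, dbinom = dbinom(k, n, p))
```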
Using R:
dbinom(0, 10, .9)
## [1] 1e-10
1 - dbinom(0, 10, .9)
## [1] 1
Using R (two solutions):
# compute each probability with dbinom and then sum them up
sum(dbinom(7:10, 10, .9))
## [1] 0.9872048
# compute the cumulative probability with pbinom and subtract from 1
1-pbinom(6, 10, .9)
## [1] 0.9872048
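The event "at least 7 of the 10 had chickenpox" is the same as "at most 3 of the 10 have not had chickenpox". Each adult has not had chickenpox with probability 0.1, so that count is binomial with \(n = 10\) and \(p = 0.1\), which gives a third route to the same answer:

```r
# Number who have NOT had chickenpox ~ Binomial(10, 0.1)
p_at_most_3_without <- pbinom(3, size = 10, prob = 0.1)

# Matches P(at least 7 of 10 had chickenpox)
p_at_least_7_with <- sum(dbinom(7:10, size = 10, prob = 0.9))

c(p_at_most_3_without, p_at_least_7_with)
```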
We learned in Exercise 3.26 that about 90% of American adults had chickenpox before adulthood. We now consider a random sample of 120 American adults.
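Let \(C_{120}\) be the number of adults in the sample who had chickenpox in childhood, so \(C_{120} \sim \text{Binomial}(120, 0.9)\). Using R:
pbinom(105, 120, .9)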
## [1] 0.2181634
This value is the probability that \(C_{120}\) is less than or equal to 105, whereas part b considered only the probability that \(C_{120}\) equals 105. Since this probability is about 22%, it is not very unusual to see 105 or somewhat fewer cases either.
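The binomial count \(C_{120}\) has mean \(np = 108\) and standard deviation \(\sqrt{np(1-p)} = \sqrt{10.8} \approx 3.29\), so the 22% can also be approximated with a normal curve. The sketch below compares the exact `pbinom` value with a normal approximation; the continuity correction (using 105.5 rather than 105) is a standard adjustment added here for illustration, not part of the original solution.

```r
n <- 120; p <- 0.9
mu    <- n * p                  # expected count: 108
sigma <- sqrt(n * p * (1 - p))  # standard deviation: sqrt(10.8), about 3.29

exact  <- pbinom(105, size = n, prob = p)
# Normal approximation with a continuity correction (105.5 instead of 105)
approx <- pnorm(105.5, mean = mu, sd = sigma)

c(mean = mu, sd = sigma, exact = exact, normal_approx = approx)
```

The two probabilities agree to within about half a percentage point, which reflects how closely the distribution of \(C_{120}\) resembles a normal curve.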
Here is a picture of the full probability distribution of \(C_{120}\). Notice how quickly it drops off below 100, and how closely its shape resembles the normal distribution:
A 2005 Gallup Poll found that 7% of teenagers (ages 13 to 17) suffer from arachnophobia and are extremely afraid of spiders. At a summer camp there are 10 teenagers sleeping in each tent. Assume that these 10 teenagers are independent of each other.
Let \(A_{10}\) be the number of teenagers out of 10 who suffer from arachnophobia.
Using R:
1-dbinom(0, 10, .07)
## [1] 0.5160177
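"At least one" is the complement of "none", so the same number comes from \(1 - (1 - 0.07)^{10}\): the probability that all 10 teenagers in a tent are free of arachnophobia, subtracted from 1.

```r
p <- 0.07   # chance a given teenager has arachnophobia
n <- 10     # teenagers per tent

p_none <- (1 - p)^n            # nobody in the tent has arachnophobia
p_at_least_one <- 1 - p_none
p_at_least_one
```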
Using R:
dbinom(2, 10, .07)
## [1] 0.1233878
Using R:
pbinom(1, 10, .07)
## [1] 0.8482701
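`pbinom(1, 10, .07)` adds up the probabilities that 0 or 1 of the 10 teenagers (\(A_{10}\)) have arachnophobia. Summing the two `dbinom` terms explicitly shows what the cumulative function is doing:

```r
# P(A_10 <= 1) = P(A_10 = 0) + P(A_10 = 1)
terms <- dbinom(0:1, size = 10, prob = 0.07)
terms          # individual probabilities for 0 and 1
sum(terms)     # equals pbinom(1, 10, 0.07)
```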
A very skilled court stenographer makes one typographical error (typo) per hour on average.
Using R:
dpois(4, 1)
## [1] 0.01532831
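For a Poisson distribution the mean and variance both equal the rate \(\lambda\), so with \(\lambda = 1\) typo per hour the standard deviation is 1. The sketch below computes the \(z\)-score of 4 typos and, as an additional check not in the original solution, the probability of 4 or more typos in an hour, a common way to judge how unusual an outcome is.

```r
lambda <- 1                    # average typos per hour
sd_T   <- sqrt(lambda)         # Poisson SD = sqrt(lambda)

z_4 <- (4 - lambda) / sd_T     # standard deviations above the mean
p_4_or_more <- ppois(3, lambda, lower.tail = FALSE)  # P(4 or more typos)

c(z = z_4, p_exactly_4 = dpois(4, lambda), p_4_or_more = p_4_or_more)
```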
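There is only a 1.5% chance of exactly 4 typos in one hour, so it would be considered unusual. Also, a Poisson distribution with mean 1 has standard deviation \(\sqrt{1} = 1\), so 4 is 3 standard deviations above the mean of 1, which makes it very unusual. d. Let \(T\) be the number of typos in an hour, so \(T \sim \text{Poisson}(1)\). We want \(P(T \leq 2)\). Using the Poisson distribution formula: \[P(T \leq 2) = \sum_{t = 0}^2 \frac{1^t \cdot e^{-1}}{t!} = 0.9197\] Using R: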
ppois(2, 1)
## [1] 0.9196986
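The Poisson formula from part d can be evaluated term by term in R to confirm that it matches `ppois`:

```r
lambda <- 1
t_vals <- 0:2   # t = 0, 1, 2

# Poisson pmf: lambda^t * exp(-lambda) / t!
by_formula <- sum(lambda^t_vals * exp(-lambda) / factorial(t_vals))

c(by_formula = by_formula, ppois = ppois(2, lambda))
```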