Here we use a t-test to see whether there is a significant difference between the ages of the males and females. The required function is t.test(). The first argument specifies what we want to test, here as a formula with the outcome on the left of ~ and the grouping variable on the right. We have also included the alternative argument, which sets the alternative hypothesis and so determines whether the test is two-tailed ("two.sided") or one-tailed ("less" or "greater").
t.test(dat$Age ~ dat$Sex, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: dat$Age by dat$Sex
## t = 0.74859, df = 27.977, p-value = 0.4604
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.967981 4.234648
## sample estimates:
## mean in group F mean in group M
## 25.60000 24.46667
The output of this function is a formatted summary of the test, including many of the numbers we may be interested in. While this is informative and easy to interpret, if we wish to run lots of tests, perhaps within a loop, we need to be able to extract the parts of the output we want to record. We can do this by saving the output as a variable and then extracting the various components using the $ operator. Below we try a few examples, extracting the means of the two groups, the p-value and the 95% confidence interval.
tOut <- t.test(dat$Age ~ dat$Sex, alternative = "two.sided")
tOut$estimate
## mean in group F mean in group M
## 25.60000 24.46667
tOut$p.value
## [1] 0.4603517
tOut$conf.int
## [1] -1.967981 4.234648
## attr(,"conf.level")
## [1] 0.95
If we are only interested in a single value, for example just the p-value, we can chain the commands together and print it in one line of code.
t.test(dat$Age ~ dat$Sex, alternative = "two.sided")$p.value
## [1] 0.4603517
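To illustrate why extracting components matters, here is a minimal sketch of running the same test over several variables in a loop and collecting the results in a table. The data frame is simulated and the Height and Weight columns are hypothetical; they are not part of the data set used above.

```r
# Simulated stand-in for 'dat'; Height and Weight are made-up columns.
set.seed(1)
dat <- data.frame(
  Sex    = rep(c("F", "M"), each = 15),
  Age    = round(rnorm(30, mean = 25, sd = 4)),
  Height = rnorm(30, mean = 170, sd = 10),
  Weight = rnorm(30, mean = 70, sd = 12)
)

vars <- c("Age", "Height", "Weight")

# Pre-allocate one row per variable to hold the values we extract.
results <- data.frame(
  variable = vars,
  mean_F   = NA_real_,
  mean_M   = NA_real_,
  p_value  = NA_real_
)

for (i in seq_along(vars)) {
  tOut <- t.test(dat[[vars[i]]] ~ dat$Sex, alternative = "two.sided")
  # Pull out the group means and the p-value with the $ operator.
  results$mean_F[i]  <- tOut$estimate["mean in group F"]
  results$mean_M[i]  <- tOut$estimate["mean in group M"]
  results$p_value[i] <- tOut$p.value
}

results
```

Each pass through the loop overwrites tOut, so only the extracted values stored in results survive.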
This is an example of a two-sample t-test, where we have used the formula style, indicated by ~. Alternatively, we can provide the two vectors we wish to compare as the first two arguments of the function, like this:
t.test(dat$Age[which(dat$Sex == "M")], dat$Age[which(dat$Sex == "F")], alternative = "two.sided")$p.value
## [1] 0.4603517
For a paired test you need to add the argument paired = TRUE. The function assumes that matched elements occupy the same index in each vector. In other words, the first element of the first vector is paired with the first element of the second vector, and so forth. If the vectors are different lengths you will get an error.
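As a small sketch of a paired test, assuming hypothetical before and after measurements on the same eight subjects (these vectors are invented for illustration):

```r
# Hypothetical paired measurements: element i of each vector is the same subject.
before <- c(72, 80, 65, 90, 77, 84, 69, 75)
after  <- c(70, 78, 66, 85, 74, 80, 70, 71)

pOut <- t.test(before, after, paired = TRUE, alternative = "two.sided")
pOut$estimate   # mean of the pairwise differences (before - after)
pOut$p.value

# Vectors of different lengths cannot be paired, so the call fails.
bad <- try(t.test(before, after[-1], paired = TRUE), silent = TRUE)
inherits(bad, "try-error")
```

Wrapping the mismatched call in try() is only there so the example runs to the end; in normal use the error would stop the script.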
For a one-sample test you just provide one vector of data values.
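A one-sample test might look like the following sketch. The ages vector and the hypothesised mean of 25 are made up for illustration; mu is the value of the mean under the null hypothesis and defaults to 0.

```r
# Hypothetical sample of ages.
ages <- c(30, 21, 27, 26, 23, 32, 28, 24, 22, 25)

# Test whether the mean age differs from 25.
oOut <- t.test(ages, mu = 25, alternative = "two.sided")
oOut$estimate   # sample mean
oOut$p.value
```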
If you can assume the two groups have equal variances, you can include the argument var.equal = TRUE. The default value is FALSE, which is why the output above is labelled as a Welch two-sample t-test.
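The effect of var.equal can be seen by running both versions on the same data. This sketch uses simulated vectors x and y, which are assumptions for illustration only.

```r
# Simulated groups of 15 observations each.
set.seed(42)
x <- rnorm(15, mean = 25, sd = 4)
y <- rnorm(15, mean = 24, sd = 4)

welch   <- t.test(x, y)                    # default: var.equal = FALSE
student <- t.test(x, y, var.equal = TRUE)  # pooled-variance t-test

welch$method
student$method

# Degrees of freedom: Welch's approximation is generally not an integer,
# while the pooled test uses n1 + n2 - 2 = 28.
welch$parameter
student$parameter
```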
This formula coding style can also be used in other functions, such as boxplot(). A boxplot is another way to visualize the distribution of ages in each group.
boxplot(dat$Age ~ dat$Sex, ylab = "Age", col = c("pink", "blue"), border = "navy")
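Both functions also accept a data argument, so the formula can refer to column names directly rather than repeating dat$. A minimal sketch, using a simulated stand-in for dat:

```r
# Simulated stand-in for 'dat'.
set.seed(1)
dat <- data.frame(
  Sex = rep(c("F", "M"), each = 15),
  Age = round(rnorm(30, mean = 25, sd = 4))
)

# The same formula works in both functions; columns are looked up in 'dat'.
tOut <- t.test(Age ~ Sex, data = dat, alternative = "two.sided")
bp   <- boxplot(Age ~ Sex, data = dat, ylab = "Age")

tOut$data.name   # how the test labels its input
bp$names         # group labels drawn on the x-axis
```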
Alternatively we may want to use the non-parametric equivalent, the Mann-Whitney test (also known as the Wilcoxon rank sum test). For this the function we require is wilcox.test(). As with t.test(), there are two ways to provide the data we want to test, and we specify the alternative hypothesis in the same way.
wilcox.test(dat$Age ~ dat$Sex, alternative = "two.sided")
## Warning in wilcox.test.default(x = c(30L, 21L, 27L, 26L, 23L, 32L, 28L, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: dat$Age by dat$Sex
## W = 130.5, p-value = 0.4666
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(dat$Age[which(dat$Sex == "M")], dat$Age[which(dat$Sex == "F")], alternative = "two.sided")
## Warning in wilcox.test.default(dat$Age[which(dat$Sex == "M")], dat
## $Age[which(dat$Sex == : cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: dat$Age[which(dat$Sex == "M")] and dat$Age[which(dat$Sex == "F")]
## W = 94.5, p-value = 0.4666
## alternative hypothesis: true location shift is not equal to 0
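The two calls above report different W statistics (130.5 and 94.5) only because the groups are supplied in the opposite order; the p-value is the same.
We can extract p-values in the same way as for the t-test.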
wOut <- wilcox.test(dat$Age ~ dat$Sex, alternative = "two.sided")
## Warning in wilcox.test.default(x = c(30L, 21L, 27L, 26L, 23L, 32L, 28L, :
## cannot compute exact p-value with ties
wOut$p.value
## [1] 0.4665793
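Other components can be extracted in the same way. The sketch below uses two short hypothetical vectors that contain ties, and sets exact = FALSE to request the normal approximation explicitly, which should avoid the warning about ties. It also shows how the W statistic depends on which group is given first.

```r
# Hypothetical ages containing tied values (21, 26 and 27 appear in both groups).
males   <- c(30, 21, 27, 26, 23, 32, 28)
females <- c(24, 21, 29, 26, 35, 31, 27)

# exact = FALSE asks for the normal approximation up front.
wOut <- wilcox.test(males, females, alternative = "two.sided", exact = FALSE)
wOut$statistic
wOut$p.value
wOut$method

# Swapping the groups changes W but not the two-sided p-value.
wSwap <- wilcox.test(females, males, alternative = "two.sided", exact = FALSE)

# The two W values sum to length(males) * length(females).
unname(wOut$statistic + wSwap$statistic)
all.equal(wOut$p.value, wSwap$p.value)
```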