If our outcome is a binary variable, such as CaseControl status, we need a logistic regression model. The command for this is the glm() function, which stands for generalised linear model. We also need to include the family = "binomial" argument; without it, glm() defaults to the gaussian family and fits an ordinary linear model. Again, we use the summary() function to extract the relevant statistics:
model <- glm(dat$CaseControl ~ dat$Test1, family = "binomial")
summary(model)
##
## Call:
## glm(formula = dat$CaseControl ~ dat$Test1, family = "binomial")
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.4209  -1.0276  -0.8209   1.0798   1.5822
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -1.0987     0.8276  -1.328    0.184
## dat$Test1     0.1839     0.1390   1.323    0.186
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 41.455 on 29 degrees of freedom
## Residual deviance: 39.612 on 28 degrees of freedom
## AIC: 43.612
##
## Number of Fisher Scoring iterations: 4
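The same kind of model can also be written with the data argument, and the coefficient table can be pulled out of the summary as a matrix. The sketch below is only an illustration: sim_dat is simulated stand-in data (the real dat is not reproduced here), and the names sim_dat, sim_model and coef_table are hypothetical.

```r
# Simulated stand-in for the tutorial's `dat` (hypothetical values, not the real data)
set.seed(1)
sim_dat <- data.frame(Test1 = rnorm(30, mean = 5, sd = 2))
sim_dat$CaseControl <- rbinom(30, size = 1,
                              prob = plogis(-1 + 0.2 * sim_dat$Test1))

# Same kind of model as above, using the data argument instead of dat$ prefixes
sim_model <- glm(CaseControl ~ Test1, data = sim_dat, family = "binomial")

# coef(summary(...)) returns the coefficient table as a numeric matrix,
# which can be indexed by row name and column name
coef_table <- coef(summary(sim_model))
coef_table["Test1", c("Estimate", "Std. Error", "z value", "Pr(>|z|)")]
```

Using the data argument gives shorter coefficient names (Test1 rather than dat$Test1) and makes it easier to reuse the fitted model with predict().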
The summary() output takes a very similar format to the lm() output. What differences can you notice?
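Instead of residuals we have deviance residuals, and instead of t-statistics we have z-statistics. The residual standard error, R-squared and F-statistic reported by lm() are also replaced by the null deviance, residual deviance and AIC.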
Remember that in the logit model the response variable is the log odds: ln(odds) = ln(p/(1-p)) = a*x1 + b*x2 + ... + z*xn. Therefore, each logistic regression coefficient gives the change in the log odds of the outcome for a one unit increase in the corresponding predictor variable, holding the other predictors constant.
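To make the log odds scale more concrete, here is a small worked sketch using the estimates printed in the summary above. The two Test1 values (5 and 6) are hypothetical and chosen only to show a one unit increase; b0, b1 and x are illustrative names.

```r
# Coefficients copied from the summary(model) output above
b0 <- -1.0987  # intercept
b1 <- 0.1839   # slope for Test1

# Hypothetical Test1 values, one unit apart
x <- c(5, 6)

log_odds <- b0 + b1 * x      # linear predictor, on the log odds scale
odds     <- exp(log_odds)    # odds = p / (1 - p)
prob     <- plogis(log_odds) # inverse logit: p = odds / (1 + odds)

# A one unit increase adds b1 to the log odds,
# which is the same as multiplying the odds by exp(b1)
diff(log_odds)
odds[2] / odds[1]
exp(b1)
```

The last two lines print the same value, which is why exponentiating a coefficient gives an odds ratio. Note that the change in probability for a one unit increase is not constant; it depends on where on the curve you start.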
We can convert these to odds ratios by exponentiating the coefficients, as follows:
exp(coef(model))
## (Intercept)   dat$Test1
##   0.3333151   1.2018885
exp(cbind(OR = coef(model), confint(model)))
## Waiting for profiling to be done...
##                    OR      2.5 %   97.5 %
## (Intercept) 0.3333151 0.05735656 1.585025
## dat$Test1   1.2018885 0.92311488 1.608472
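The "Waiting for profiling to be done..." message indicates that confint() computed profile likelihood intervals here. As a rough cross-check, a Wald interval can be built by hand from the estimate and standard error printed in the summary. This is a sketch under that normal approximation, so the limits will differ slightly from the profile intervals above; est, se and wald_ci are illustrative names.

```r
# Estimate and standard error for dat$Test1, copied from summary(model)
est <- 0.1839
se  <- 0.1390

# Wald 95% interval on the log odds scale, then exponentiated to the odds ratio scale
z       <- qnorm(0.975)
wald_ci <- exp(est + c(-1, 1) * z * se)

c(OR = exp(est), lower = wald_ci[1], upper = wald_ci[2])
```

Both the Wald and the profile interval for dat$Test1 include 1, which is consistent with the non-significant p-value of 0.186 in the summary. With the fitted object available, confint.default(model) returns Wald intervals on the log odds scale directly.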