
Logistic Regression:

If our outcome is a binary variable, such as CaseControl status, we need a logistic regression model. The command for this is the glm() function, which stands for generalised linear model. We also need to include the family = "binomial" argument. Again we use the summary() function to extract the relevant statistics.

model <- glm(dat$CaseControl ~ dat$Test1, family = "binomial")
summary(model)
## 
## Call:
## glm(formula = dat$CaseControl ~ dat$Test1, family = "binomial")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4209  -1.0276  -0.8209   1.0798   1.5822  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -1.0987     0.8276  -1.328    0.184
## dat$Test1     0.1839     0.1390   1.323    0.186
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 41.455  on 29  degrees of freedom
## Residual deviance: 39.612  on 28  degrees of freedom
## AIC: 43.612
## 
## Number of Fisher Scoring iterations: 4

The output has a very similar format to the lm() output. What differences can you notice?

Instead of residuals we have deviance residuals, and instead of t-statistics we have z-statistics.

Remember that in the logit model the response variable is the log odds: ln(odds) = ln(p/(1-p)) = a*x1 + b*x2 + ... + z*xn. Therefore, the logistic regression coefficients give the change in the log odds of the outcome for a one-unit increase in the predictor variable.
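To see this relationship numerically, we can back-transform the linear predictor to a probability with plogis(), the inverse logit. A minimal sketch using the fitted coefficients from model above; the predictor value 5 is just an arbitrary illustrative value of Test1, not one taken from the data:

# log odds for a hypothetical observation with Test1 = 5
log_odds <- coef(model)[1] + coef(model)[2] * 5
# convert the log odds to a probability: exp(x) / (1 + exp(x))
plogis(log_odds)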

We can convert these to odds ratios as follows:

exp(coef(model))
## (Intercept)   dat$Test1 
##   0.3333151   1.2018885
exp(cbind(OR = coef(model), confint(model)))
## Waiting for profiling to be done...
##                    OR      2.5 %   97.5 %
## (Intercept) 0.3333151 0.05735656 1.585025
## dat$Test1   1.2018885 0.92311488 1.608472
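Here the odds ratio for Test1 is about 1.20, so the odds of being a case are multiplied by roughly 1.2 for each one-unit increase in Test1, although the 95% confidence interval (0.92 to 1.61) includes 1, consistent with the non-significant p-value above.

If we want predictions on the probability scale rather than as odds or log odds, predict() can apply the inverse link for us. A brief sketch, assuming dat is still loaded:

# fitted probability of being a case for each observation in dat
predict(model, type = "response")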