Step 1 - Data View

First, check the data in esoph with head. I actually prefer the View function, but its output does not show up in the HTML output.

head(esoph)
##   agegp     alcgp    tobgp ncases ncontrols
## 1 25-34 0-39g/day 0-9g/day      0        40
## 2 25-34 0-39g/day    10-19      0        10
## 3 25-34 0-39g/day    20-29      0         6
## 4 25-34 0-39g/day      30+      0         5
## 5 25-34     40-79 0-9g/day      0        27
## 6 25-34     40-79    10-19      0         7
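Since View is interactive only, a possible alternative that does render is str together with a look at the factor levels. This is a minimal sketch, not part of the original analysis. It relies only on base R and the built-in esoph data, and it shows what unclass returns for an ordered factor, which is relevant to the model in the next step.

```r
# Compact overview of the data frame: column types and first values.
str(esoph)

# The group columns are ordered factors; list the levels of one of them.
print(levels(esoph$alcgp))

# unclass() exposes the integer codes stored behind those levels,
# so unclass(alcgp) in a formula is treated as a numeric score.
print(head(unclass(esoph$alcgp)))
```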

Step 2 - Separate Effects of Factors

cancer_tobacco_alcohol_sep <- glm(cbind(ncases, ncontrols) ~ agegp + unclass(tobgp) + unclass(alcgp), data = esoph, family = binomial())
anova(cancer_tobacco_alcohol_sep)
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: cbind(ncases, ncontrols)
## 
## Terms added sequentially (first to last)
## 
## 
##                Df Deviance Resid. Df Resid. Dev
## NULL                              87    227.241
## agegp           5   88.128        82    139.112
## unclass(tobgp)  1   17.522        81    121.591
## unclass(alcgp)  1   62.314        80     59.277

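Looking at the Deviance column (not Df), the most important factor is age, which reduces the deviance by 88.128. Alcohol use comes next (62.314), followed by tobacco use (17.522). Because the terms are added sequentially, each reduction is conditional on the terms listed above it.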
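To compare the terms more formally, one option is to ask anova for a chi-squared test on each deviance reduction. The sketch below refits the same model so that it runs on its own. The names fit_sep, dev_table and dev_per_df are illustrative, and the exact numbers may differ from the output above depending on the R version and the copy of esoph it ships.

```r
# Refit the additive model from this step so the block is self-contained.
fit_sep <- glm(cbind(ncases, ncontrols) ~ agegp + unclass(tobgp) + unclass(alcgp),
               data = esoph, family = binomial())

# test = "Chisq" adds a p-value column for each sequentially added term.
dev_table <- anova(fit_sep, test = "Chisq")
print(dev_table)

# Deviance reduction per degree of freedom: a rough way to compare terms
# that use different numbers of parameters (the NULL row gives NA).
dev_per_df <- dev_table$Deviance / dev_table$Df
names(dev_per_df) <- rownames(dev_table)
print(round(dev_per_df, 2))
```

Dividing by Df matters here because agegp spends 5 degrees of freedom while each unclass term spends only 1.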

Step 3 - Combined Effect of Factors

cancer_tobacco_alcohol_comb <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp, data = esoph, family = binomial())
anova(cancer_tobacco_alcohol_comb)
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: cbind(ncases, ncontrols)
## 
## Terms added sequentially (first to last)
## 
## 
##             Df Deviance Resid. Df Resid. Dev
## NULL                           87    227.241
## agegp        5   88.128        82    139.112
## tobgp        3   19.085        79    120.028
## alcgp        3   66.054        76     53.973
## tobgp:alcgp  9    6.489        67     47.484

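In this analysis, the tobgp:alcgp interaction reduces the deviance by only 6.489 on 9 degrees of freedom, so there is little evidence that using alcohol and tobacco together has an effect beyond their separate effects. Age still gives the largest deviance reduction (88.128), followed by alcohol (66.054) and tobacco (19.085).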
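A sketch for checking how much the interaction term contributes: first compute the chi-squared tail probability for the interaction line exactly as printed above (6.489 on 9 df), then compare an additive model against the interaction model directly. The names p_interaction, fit_add and fit_int are illustrative, and the refitted numbers may differ from the table above depending on the R version.

```r
# Tail probability for the interaction line as printed in the table above.
p_interaction <- pchisq(6.489, df = 9, lower.tail = FALSE)
print(p_interaction)

# The same comparison from fitted models: main effects only vs. interaction.
fit_add <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
               data = esoph, family = binomial())
fit_int <- update(fit_add, . ~ . + tobgp:alcgp)

# Likelihood-ratio (deviance) test between the two nested models.
print(anova(fit_add, fit_int, test = "Chisq"))
```

A large p-value here would suggest that the simpler additive model describes the data about as well as the model with the interaction.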