Esoph Data

First, the relation between each parameter (age, alcohol consumption, tobacco consumption) and cancer occurrence was analyzed. The plots suggest that all three parameters increase the cancer risk.

library(dplyr)
library(ggplot2)

# Cases per 100 controls in each age group
esoph %>%
  group_by(agegp) %>%
  summarise(per_occur = sum(ncases) * 100 / sum(ncontrols)) %>%
  ggplot(aes(agegp, per_occur)) +
  geom_col(aes(fill = desc(per_occur))) +
  labs(x = "Age", y = "Cancer Observed (%)") + theme_bw() + theme(legend.position = "none")

# Cases per 100 controls in each alcohol consumption group
esoph %>%
  group_by(alcgp) %>%
  summarise(per_occur = sum(ncases) * 100 / sum(ncontrols)) %>%
  ggplot(aes(alcgp, per_occur)) +
  geom_col(aes(fill = desc(per_occur))) +
  labs(x = "Alcohol Consumption (g/day)", y = "Cancer Observed (%)") + theme_bw() + theme(legend.position = "none")

# Cases per 100 controls in each tobacco consumption group
esoph %>%
  group_by(tobgp) %>%
  summarise(per_occur = sum(ncases) * 100 / sum(ncontrols)) %>%
  ggplot(aes(tobgp, per_occur)) +
  geom_col(aes(fill = desc(per_occur))) +
  labs(x = "Tobacco Consumption (g/day)", y = "Cancer Observed (%)") + theme_bw() + theme(legend.position = "none")
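The three pipelines above differ only in the grouping column and the axis label, so they could be folded into one helper. The following is a minimal sketch, not part of the original analysis: the function names `ratio_by` and `plot_ratio` are hypothetical, and it assumes dplyr 1.0 or later and ggplot2 3.0 or later for the `{{ }}` syntax.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: cases per 100 controls for one grouping column
ratio_by <- function(data, group) {
  data %>%
    group_by({{ group }}) %>%
    summarise(per_occur = sum(ncases) * 100 / sum(ncontrols))
}

# Hypothetical helper: bar chart of the ratio, styled like the plots above
plot_ratio <- function(data, group, xlab) {
  ratio_by(data, {{ group }}) %>%
    ggplot(aes({{ group }}, per_occur)) +
    geom_col(aes(fill = desc(per_occur))) +
    labs(x = xlab, y = "Cancer Observed (%)") +
    theme_bw() +
    theme(legend.position = "none")
}

plot_ratio(esoph, agegp, "Age")
plot_ratio(esoph, alcgp, "Alcohol Consumption (g/day)")
plot_ratio(esoph, tobgp, "Tobacco Consumption (g/day)")
```

Keeping the summary step in its own function also makes it possible to inspect the numbers behind a plot, for example with `ratio_by(esoph, alcgp)`.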

According to the age analysis, cancer cases become frequent only from age 45 onward. Therefore, alcohol and tobacco consumption were analyzed together for the age groups of 45 and above. It seems that above 80 g/day of alcohol consumption, cancer occurrence is very high regardless of tobacco consumption.

# Age groups 45 and above: cases per 100 controls for each alcohol/tobacco combination
esoph %>%
  filter(!agegp %in% c("25-34", "35-44")) %>%
  # Pool the remaining age groups so each alcohol/tobacco pair yields exactly one bar
  group_by(alcgp, tobgp) %>%
  summarise(per_occur = sum(ncases) * 100 / sum(ncontrols), .groups = "drop") %>%
  ggplot(aes(alcgp, per_occur)) +
  geom_col(aes(fill = tobgp), position = "dodge") +
  scale_fill_brewer(palette = "Reds") +
  labs(x = "Alcohol Consumption (g/day)", y = "Cancer Observed (%)", fill = "Tobacco Consumption (g/day)") +
  theme_bw()
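To check the claim about high alcohol consumption numerically rather than by eye, the same ratio can be tabulated together with the raw counts behind it. This is a sketch under assumptions: the name `ratio_table` is introduced here for illustration, and the age groups are pooled so that each alcohol/tobacco pair gives one row. Note that `per_occur` is the number of cases per 100 controls, so it can exceed 100, and cells with few controls give unstable values.

```r
library(dplyr)

# Cases, controls and cases per 100 controls for ages 45 and above
ratio_table <- esoph %>%
  filter(!agegp %in% c("25-34", "35-44")) %>%
  group_by(alcgp, tobgp) %>%
  summarise(
    cases     = sum(ncases),
    controls  = sum(ncontrols),
    per_occur = cases * 100 / controls,
    .groups   = "drop"
  ) %>%
  arrange(alcgp, tobgp)

ratio_table
```

Reading the `cases` and `controls` columns next to `per_occur` shows how many observations each bar in the plot rests on.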