Read the data from the CSV file and keep only the rows in which every column is filled in.

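library(dplyr)    # provides %>%, filter(), select() and as_tibble()
library(ggplot2)  # provides ggplot() and geom_text() for the plot further down
youngData <- read.csv("C:/Users/Eren/Desktop/data/data/youth_responses.csv", sep = ",")
youngData <- youngData %>% filter(complete.cases(.)) %>% as_tibble()  # as_tibble() replaces the deprecated tbl_df()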
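
To illustrate what the filter above keeps, here is a minimal sketch on a made-up data frame (the `toy` data and its column names are hypothetical, not taken from the survey). `complete.cases()` returns TRUE for rows with no `NA` in any column. Note that an empty string is not `NA`, so blank text answers are only dropped if they were read in as `NA`.

```r
# Hypothetical toy data: row 2 has a missing Age, row 3 a missing Music score
toy <- data.frame(
  Music = c(5, 4, NA, 3),
  Age   = c(20, NA, 19, 22)
)

# TRUE for rows where every column is non-missing
keep <- complete.cases(toy)

# Keep only the fully answered rows (rows 1 and 4 here)
toyComplete <- toy[keep, ]
print(toyComplete)
```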

I wanted to see how the following interests relate to each other: History, Psychology, Politics, Mathematics, Physics, Economy Management, Biology, Chemistry, Reading, Geography, Foreign languages, Medicine, and Law.

So I selected only these columns (History through Law, excluding PC and Internet).

scienceInterest <- youngData %>% select(History:Law, -PC, -Internet)
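
As a small sketch of how this kind of selection behaves, assuming dplyr is installed, the example below uses a made-up data frame (`toy` and its columns are hypothetical). `History:Law` picks every column positioned from History to Law, and the minus signs then drop individual columns from that range.

```r
library(dplyr)

# Hypothetical toy data; only the column order matters for the range selection
toy <- data.frame(
  Music = 1:3, History = 1:3, Internet = 1:3,
  PC = 1:3, Law = 1:3, Age = 1:3
)

# History:Law selects the contiguous run of columns; -PC and -Internet remove two of them
picked <- toy %>% select(History:Law, -PC, -Internet)
print(names(picked))
```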

Then I applied classical multidimensional scaling with two dimensions, using 1 minus the correlation as the distance between interests, so that interests rated similarly end up close together.

scienceDistance <- 1 - cor(scienceInterest)
scienceMds <- cmdscale(scienceDistance,k=2)
colnames(scienceMds) <- c("x","y")
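
The same steps can be sketched on a tiny made-up example, using only base R (the `toy` ratings below are hypothetical and chosen so that A and B are strongly correlated while C is uncorrelated with both). A correlation of 1 becomes a distance of 0, a correlation of 0 becomes 1, and a correlation of -1 becomes 2, so `cmdscale()` should place A and B close together.

```r
# Hypothetical ratings: A and B mostly move together, C is unrelated to both
toy <- data.frame(
  A = c(1, 2, 3, 4, 5),
  B = c(2, 1, 3, 5, 4),
  C = c(4, 4, 1, 4, 4)
)

# Turn correlations into dissimilarities: high correlation means small distance
toyDistance <- 1 - cor(toy)

# Classical MDS: find 2-D coordinates whose distances approximate toyDistance
toyMds <- cmdscale(toyDistance, k = 2)
colnames(toyMds) <- c("x", "y")

# Pairwise distances between the embedded points; A and B end up closest
embedded <- as.matrix(dist(toyMds))
print(round(embedded, 2))
```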

I plotted the result using the row names as both labels and colors. (The color carries no meaning here; it only makes the plot prettier.)

ggplot(data.frame(scienceMds),aes(x=x,y=y)) + geom_text(aes(label=rownames(scienceMds),color=rownames(scienceMds)),angle=45,size=4)

Comments

As the plot shows, people who like Biology mostly like Medicine and Chemistry as well. A similarly close relationship appears between Psychology and Foreign languages.

Young Survey Data

Eren Demir

May 14, 2018
