Read the data from the CSV file and keep only the rows in which every column is filled in.
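library(dplyr)    # %>%, filter(), select(), as_tibble()
library(ggplot2)  # used for the plot further down
youngData <- read.csv("C:/Users/Eren/Desktop/data/data/youth_responses.csv", sep = ",")
youngData <- youngData %>% filter(complete.cases(.)) %>% as_tibble()  # tbl_df() is deprecated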
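To make the row filter concrete, here is a minimal base R sketch of what complete.cases() does. The data frame `toy` and its values are made up for illustration and are not taken from the survey file.

```r
# Toy data frame (made up for illustration, not from the survey)
toy <- data.frame(
  History = c(1, NA, 3, 5),
  Physics = c(2, 4, NA, 1),
  Law     = c(5, 3, 2, 4)
)

# complete.cases() returns TRUE for rows that have no NA in any column
keep <- complete.cases(toy)

# Keep only the fully answered rows (rows 1 and 4 here)
toyComplete <- toy[keep, ]
nrow(toyComplete)
```

One thing to keep in mind: with its default settings, read.csv() turns blank fields into NA only in numeric and logical columns. A blank in a text column is read as an empty string, so complete.cases() would not drop that row.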
I wanted to see how the following interests relate to each other: History, Psychology, Politics, Mathematics, Physics, Economy Management, Biology, Chemistry, Reading, Geography, Foreign languages, Medicine, and Law.
So I selected only these columns (History through Law, excluding PC and Internet).
scienceInterest <- youngData %>% select(History:Law, -PC, -Internet)
Then I applied multidimensional scaling with 2 dimensions to map the distances between interests, taking 1 minus the correlation as the distance.
scienceDistance <- 1 - cor(scienceInterest)
scienceMds <- cmdscale(scienceDistance,k=2)
colnames(scienceMds) <- c("x","y")
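The same steps can be tried on a small synthetic example to see why 1 - correlation works as a distance. This is only a sketch: the variables `A`, `B` and `C` are simulated (A and B share a common signal, C is independent noise) and have nothing to do with the survey.

```r
set.seed(1)
n <- 500
shared <- rnorm(n)

# A and B share a common signal, C is unrelated (all simulated)
toy <- data.frame(
  A = shared + rnorm(n, sd = 0.7),
  B = shared + rnorm(n, sd = 0.7),
  C = rnorm(n)
)

# Correlation 1 gives distance 0, correlation 0 gives distance 1
toyDistance <- 1 - cor(toy)

# Classical MDS: place the three variables on a plane
toyMds <- cmdscale(toyDistance, k = 2)
colnames(toyMds) <- c("x", "y")

# Euclidean distances between the points on the 2-D map
mapDistance <- as.matrix(dist(toyMds))
round(toyDistance, 2)
round(mapDistance, 2)
```

Here A and B should end up close together on the map and C far from both, mirroring their correlations. Note that with this distance, strongly negatively correlated variables are placed furthest apart (distance up to 2).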
I plotted the result using the row names as both labels and colors. (The color does not mean anything here; it only makes the graph prettier.)
ggplot(data.frame(scienceMds),aes(x=x,y=y)) + geom_text(aes(label=rownames(scienceMds),color=rownames(scienceMds)),angle=45,size=4)
Comments
The plot shows that people who like Biology mostly also like Medicine and Chemistry. Psychology and Foreign languages are similarly close to each other.
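Closeness like this can also be read directly from the distance matrix instead of by eye. The helper below, `nearestNeighbour`, is a hypothetical function written for this illustration, and the matrix `toyDistance` contains made-up numbers, not results from the survey.

```r
# Hypothetical helper: for each item, name the closest other item
nearestNeighbour <- function(distanceMatrix) {
  d <- as.matrix(distanceMatrix)
  diag(d) <- Inf  # ignore each item's zero distance to itself
  setNames(colnames(d)[apply(d, 1, which.min)], rownames(d))
}

# Made-up distance matrix for three items
toyDistance <- matrix(
  c(0.0, 0.2, 0.9,
    0.2, 0.0, 0.7,
    0.9, 0.7, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

nearestNeighbour(toyDistance)
```

On the survey data the call would be nearestNeighbour(scienceDistance), which lists the most correlated partner for each interest.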