OSYM Data Tidy Analysis Exercise (Individual)

load("/cloud/project/osym_data_2017_v2.RData")
osym_data_2017 <- osym_data_2017 %>% mutate(general_quota = as.numeric(general_quota), general_placement=as.numeric(general_placement))
osym_data_2017
## # A tibble: 11,465 x 14
##    program_id university_name  city  faculty_name  program_name  exam_type
##    <chr>      <chr>            <chr> <chr>         <chr>         <chr>    
##  1 100110266  ABANT İZZET BAY… BOLU  Bolu Sağlık … Hemşirelik    YGS_2    
##  2 100110487  ABANT İZZET BAY… BOLU  Bolu Turizm … Gastronomi v… YGS_4    
##  3 100110724  ABANT İZZET BAY… BOLU  Bolu Turizm … Turizm İşlet… YGS_6    
##  4 100130252  ABANT İZZET BAY… BOLU  Bolu Turizm … Turizm İşlet… YGS_6    
##  5 100110433  ABANT İZZET BAY… BOLU  Diş Hekimliğ… Diş Hekimliği MF_3     
##  6 100110609  ABANT İZZET BAY… BOLU  Diş Hekimliğ… Diş Hekimliğ… MF_3     
##  7 100110018  ABANT İZZET BAY… BOLU  Eğitim Fakül… Bilgisayar v… MF_1     
##  8 100110027  ABANT İZZET BAY… BOLU  Eğitim Fakül… Fen Bilgisi … MF_2     
##  9 100110036  ABANT İZZET BAY… BOLU  Eğitim Fakül… İlköğretim M… MF_1     
## 10 100110045  ABANT İZZET BAY… BOLU  Eğitim Fakül… İngilizce Öğ… DİL_1    
## # ... with 11,455 more rows, and 8 more variables: general_quota <dbl>,
## #   general_placement <dbl>, min_score <dbl>, max_score <dbl>,
## #   val_quota <dbl>, val_placement <dbl>, val_min_score <dbl>,
## #   val_max_score <dbl>
  1. Get the highest max_score programs from each exam_type.
osym_data_2017 %>%
  group_by(exam_type) %>%
  summarise(maxscore = max(max_score))
## # A tibble: 19 x 2
##    exam_type maxscore
##    <chr>        <dbl>
##  1 DİL_1         543.
##  2 DİL_2         485.
##  3 DİL_3         490.
##  4 MF            408.
##  5 MF_1          537.
##  6 MF_2          527.
##  7 MF_3          564.
##  8 MF_4          575.
##  9 TM_1          561.
## 10 TM_2          458.
## 11 TM_3          558.
## 12 TS_1          527.
## 13 TS_2          536.
## 14 YGS_1         516.
## 15 YGS_2         473.
## 16 YGS_3         315.
## 17 YGS_4         466.
## 18 YGS_5         419.
## 19 YGS_6         519.
  1. Plot the top 10 programs of İSTANBUL ÜNİVERSİTESİ in terms of total quota in a bar chart.
top_10_iu <- osym_data_2017 %>%
  filter(university_name == "İSTANBUL ÜNİVERSİTESİ") %>%
  arrange(desc(general_quota)) %>%
  slice(1:10)
top_10_iu
## # A tibble: 10 x 14
##    program_id university_name  city   faculty_name  program_name exam_type
##    <chr>      <chr>            <chr>  <chr>         <chr>        <chr>    
##  1 105611289  İSTANBUL ÜNİVER… İSTAN… Açık ve Uzak… Sosyoloji (… TM_3     
##  2 105611298  İSTANBUL ÜNİVER… İSTAN… Açık ve Uzak… İşletme (Aç… TM_1     
##  3 105611183  İSTANBUL ÜNİVER… İSTAN… Açık ve Uzak… Coğrafya (A… TS_1     
##  4 105611305  İSTANBUL ÜNİVER… İSTAN… Açık ve Uzak… Tarih (Açık… TS_2     
##  5 105611314  İSTANBUL ÜNİVER… İSTAN… Açık ve Uzak… İktisat (Aç… TM_1     
##  6 105610501  İSTANBUL ÜNİVER… İSTAN… Hukuk Fakült… Hukuk        TM_3     
##  7 105611174  İSTANBUL ÜNİVER… İSTAN… Açık ve Uzak… Felsefe (Aç… TM_3     
##  8 105630092  İSTANBUL ÜNİVER… İSTAN… Hukuk Fakült… Hukuk (İÖ)   TM_3     
##  9 105610634  İSTANBUL ÜNİVER… İSTAN… İstanbul Tıp… İstanbul Tıp MF_3     
## 10 105610016  İSTANBUL ÜNİVER… İSTAN… Cerrahpaşa T… Cerrahpaşa … MF_3     
## # ... with 8 more variables: general_quota <dbl>, general_placement <dbl>,
## #   min_score <dbl>, max_score <dbl>, val_quota <dbl>,
## #   val_placement <dbl>, val_min_score <dbl>, val_max_score <dbl>
library(ggplot2)

ggplot(data = top_10_iu) + 
  geom_bar(aes(x=reorder(program_name,-general_quota), y=general_quota), stat = "identity") +
  labs(x="Program Name", y="Quota") + theme(axis.text.x=element_text(angle=90))

  1. Calculate the fill rate (sum(general_placement)/sum(general_quota)) per city and return the top 10 cities.
osym_data_2017 %>%
  group_by(city) %>%
  summarise(fillrate = sum(general_placement)/sum(general_quota)) %>%
  arrange(desc(fillrate)) %>%
  slice(1:10)
## # A tibble: 10 x 2
##    city                  fillrate
##    <chr>                    <dbl>
##  1 GEBZE                    1.02 
##  2 TEKİRDAĞ                 1.01 
##  3 MANİSA                   1.01 
##  4 KOCAELİ                  1.00 
##  5 BURSA                    1.00 
##  6 SUMGAYIT - AZERBAYCAN    1.00 
##  7 DİYARBAKIR               0.999
##  8 DENİZLİ                  0.998
##  9 EDİRNE                   0.998
## 10 SAKARYA                  0.997
  1. Find full (general_placement == general_quota) Endüstri Mühendisliği programs (use grepl) and draw a scatterplot of min_score vs max_score. Set transparency parameter (alpha) to 0.7. Set program colors according to whether it is a foundation university or state university. (Tip: State university programs ids start with 1, foundation 2, KKTC 3 and other abroad 4. You can use substr function.).
endustri_muhendisligi <- filter(osym_data_2017, grepl("Endüstri Mühendisliği", program_name))
full_endustri_muhendisligi <- endustri_muhendisligi %>% filter(general_quota==general_placement)
full_endustri_muhendisligi
## # A tibble: 119 x 14
##    program_id university_name  city  faculty_name  program_name  exam_type
##    <chr>      <chr>            <chr> <chr>         <chr>         <chr>    
##  1 106510023  ABDULLAH GÜL ÜN… KAYS… Mühendislik … Endüstri Müh… MF_4     
##  2 202910136  ALTINBAŞ ÜNİVER… İSTA… Mühendislik … Endüstri Müh… MF_4     
##  3 202910366  ALTINBAŞ ÜNİVER… İSTA… Mühendislik … Endüstri Müh… MF_4     
##  4 101011032  ANADOLU ÜNİVERS… ESKİ… Mühendislik … Endüstri Müh… MF_4     
##  5 206410262  ANTALYA BİLİM Ü… ANTA… Mühendislik … Endüstri Müh… MF_4     
##  6 200210997  ATILIM ÜNİVERSİ… ANKA… Mühendislik … Endüstri Müh… MF_4     
##  7 200210413  ATILIM ÜNİVERSİ… ANKA… Mühendislik … Endüstri Müh… MF_4     
##  8 200210404  ATILIM ÜNİVERSİ… ANKA… Mühendislik … Endüstri Müh… MF_4     
##  9 200510701  BAHÇEŞEHİR ÜNİV… İSTA… Mühendislik … Endüstri Müh… MF_4     
## 10 200510507  BAHÇEŞEHİR ÜNİV… İSTA… Mühendislik … Endüstri Müh… MF_4     
## # ... with 109 more rows, and 8 more variables: general_quota <dbl>,
## #   general_placement <dbl>, min_score <dbl>, max_score <dbl>,
## #   val_quota <dbl>, val_placement <dbl>, val_min_score <dbl>,
## #   val_max_score <dbl>
ggplot(data = full_endustri_muhendisligi, aes(x= min_score, y= max_score)) + geom_point(col="blue", alpha=0.7) + labs(x="Minimum score", y="Maximum score")

  1. Find the top 10 faculties with the highest quotas and draw a bar chart. Ignore similar names and typos in faculty names.
top_10_faculty <- osym_data_2017 %>%
  arrange(desc(general_quota)) %>%
  slice(1:10)
ggplot(data = top_10_faculty) + geom_bar(aes(x=reorder(program_name,-general_quota), y=general_quota), stat = "identity") +
  labs(x="Program Name", y="Quota") + theme(axis.text.x=element_text(angle=90))

  1. Find all full medicine programs (Tıp but not Tıp Mühendisliği) of foundation universities group by university calculate total quotas per university and maximum max_score and minimum min_score as bounds, ordered and colored by total quota. (Tip: Use geom_crossbar)

  2. Freestyle: Do an analysis that compares the mechanical engineering (Makine Mühendisliği) and civil engineering (İnşaat Mühendisliği) programs.

Top 5 universities for these programs were compared according to minimum acceptance score. As a result, it seems that the most succesfull students prefer mechanical engineering over civil engineering.

top_5avg_mech <- filter(osym_data_2017, general_quota>20, grepl("Makine Mühendisliği", program_name)) %>%
  arrange(desc(min_score)) %>%
  slice(1:5) %>%
  summarise("Average min scores" = mean(min_score))
top_5avg_mech
## # A tibble: 1 x 1
##   `Average min scores`
##                  <dbl>
## 1                 483.
top_5avg_civil <- filter(osym_data_2017, general_quota>20, grepl("İnşaat Mühendisliği", program_name)) %>%
  arrange(desc(min_score)) %>%
  slice(1:5) %>%
  summarise("Average min scores" = mean(min_score))
top_5avg_civil
## # A tibble: 1 x 1
##   `Average min scores`
##                  <dbl>
## 1                 452.
  1. Freestyle: Compare Koç University with Bilkent University.

Koç University has higher minimum acceptance score in all exam types than the Bilkent University.

koc <- osym_data_2017 %>%
  filter(grepl("KOÇ ÜNİVERSİTESİ", university_name)) %>%
  group_by(exam_type) %>% summarise(min_score = mean(min_score))
koc
## # A tibble: 9 x 2
##   exam_type min_score
##   <chr>         <dbl>
## 1 DİL_1          462.
## 2 MF_1           451.
## 3 MF_2           487.
## 4 MF_3           457.
## 5 MF_4           462.
## 6 TM_1           440.
## 7 TM_3           447.
## 8 TS_1           461.
## 9 TS_2           443.
bilkent <- osym_data_2017 %>%
  filter(grepl("BİLKENT ÜNİVERSİTESİ", university_name)) %>%
  group_by(exam_type) %>% summarise(min_score = mean(min_score))
bilkent
## # A tibble: 11 x 2
##    exam_type min_score
##    <chr>         <dbl>
##  1 DİL_1          440.
##  2 MF_1           385.
##  3 MF_2           378.
##  4 MF_3           418.
##  5 MF_4           446.
##  6 TM_1           368.
##  7 TM_2           327.
##  8 TM_3           397.
##  9 TS_1           362.
## 10 YGS_1          361.
## 11 YGS_6          295.
  1. Freestyle: Do your best.