COURSE UNIT TITLE: STATISTICAL METHODS IN MACHINE LEARNING

Description of Individual Course Units

Course Unit Code | Course Unit Title | Type Of Course | D | U | L | ECTS
STA 5112 | STATISTICAL METHODS IN MACHINE LEARNING | ELECTIVE | 3 | 0 | 0 | 8

Offered By

Graduate School of Natural and Applied Sciences

Level of Course Unit

Second Cycle Programmes (Master's Degree)

Course Coordinator

ASSOCIATE PROFESSOR IDIL YAVUZ

Offered to

Statistics (English)

Course Objective

With the rise of big, high-dimensional and high-frequency (occasionally streaming) data, machine learning methods have become essential for prediction and classification in many fields. This course introduces popular machine learning methods with a focus on the statistical concepts underlying them. With this goal in mind, the course concentrates on supervised statistical learning, but clustering is also covered. It introduces predictive approaches (splines, additive models and GAM, regression trees, neural networks, k-nearest neighbor), classification tools (support vector machines and random forests) and clustering, while keeping a close eye on statistical concepts such as parameter estimation, inference, and model and variable selection.

Learning Outcomes of the Course Unit

1   Perform model diagnostics with the help of cross-validation and the bootstrap
2   Choose appropriate models and variables for the response at hand, and estimate and draw inference on the required parameters using statistical methods
3   Implement splines, additive models and GAM, regression trees, NN and KNN for prediction
4   Implement support vector machines and random forests for classification
5   Perform clustering
6   Use R to apply the methods covered in class

Mode of Delivery

Face-to-face

Prerequisites and Co-requisites

None

Recommended Optional Programme Components

None

Course Contents

Week Subject Description
1 Statistics versus ML: Large sample size, high dimension, multivariate responses
2 Data splitting, cross validation and bootstrap
3 Model and variable selection, bias-variance trade-off
4 Shrinkage methods
5 Prediction: K-Nearest Neighbor
6 Prediction: Splines
7 Prediction: Additive Models and GAM
8 Prediction: Regression Trees
9 Prediction: Neural Networks
10 Classification: Support Vector Machines
11 Classification: Random Forests
12 Clustering: Distance and similarity
13 Clustering: K-means
14 Clustering: Hierarchical methods

Recommended or Required Reading

Textbook(s): Data Analysis and Data Mining: An Introduction, Adelchi Azzalini and Bruno Scarpa, Oxford, 2012

An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, Springer, 2013

Supplementary Book(s): The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, 2001

Planned Learning Activities and Teaching Methods

Lecture, homework, presentation.

Assessment Methods

SORTING NUMBER | SHORT CODE | LONG CODE | FORMULA
1 | MTE | MIDTERM EXAM |
2 | ASG | ASSIGNMENT |
3 | FIN | FINAL EXAM |
4 | FCG | FINAL COURSE GRADE | MTE * 0.35 + ASG * 0.25 + FIN * 0.40
5 | RST | RESIT |
6 | FCGR | FINAL COURSE GRADE (RESIT) | MTE * 0.35 + ASG * 0.25 + RST * 0.40
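
The weighting in the table above can be checked with a short calculation. The following is a minimal Python sketch, assuming each component is scored on a 0-100 scale (the scale is not stated in this document); the function name `final_course_grade` and the sample scores are hypothetical.

```python
def final_course_grade(mte: float, asg: float, last_exam: float) -> float:
    """Weighted grade: MTE * 0.35 + ASG * 0.25 + (FIN or RST) * 0.40.

    `last_exam` is the final exam score for FCG, or the resit score for FCGR.
    """
    return mte * 0.35 + asg * 0.25 + last_exam * 0.40


# Hypothetical scores, assuming a 0-100 scale.
fcg = final_course_grade(mte=70, asg=80, last_exam=60)   # final exam used
fcgr = final_course_grade(mte=70, asg=80, last_exam=75)  # resit replaces the final
print(round(fcg, 2), round(fcgr, 2))
```

Because the resit replaces only the 40% final exam component, the midterm and assignment contributions carry over unchanged into FCGR.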


Further Notes About Assessment Methods

None

Assessment Criteria

Evaluation of homework assignments, presentation and exams.

Language of Instruction

English

Course Policies and Rules

Attendance at a minimum of 70% of the lectures is an essential requirement of this course and is the responsibility of the student. Students must attend lectures and submit homework on time. Any unethical behavior in presentations or exams will be dealt with as outlined in school policy. The graduate policy is available at http://www.fbe.deu.edu.tr.

Contact Details for the Lecturer(s)

e-mail: idil.yavuz@deu.edu.tr

Office Hours

To be announced.

Work Placement(s)

None

Workload Calculation

Activities | Number | Time (hours) | Total Work Load (hours)
Lectures | 14 | 3 | 42
Preparations before/after weekly lectures | 14 | 3 | 42
Preparation for midterm exam | 1 | 25 | 25
Preparation for final exam | 1 | 30 | 30
Preparing assignments | 2 | 25 | 50
Preparing presentations | 1 | 15 | 15
Final | 1 | 2 | 2
Midterm | 1 | 2 | 2
TOTAL WORKLOAD (hours) | | | 208
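
The total can be reproduced from the rows above. The following is a minimal Python sketch with variable names chosen here for illustration. Dividing the total by the 8 ECTS listed for the course gives the implied hours per credit, which is a derived figure and is not stated in this document.

```python
# (activity, number, hours each), copied from the workload table
workload = [
    ("Lectures", 14, 3),
    ("Preparations before/after weekly lectures", 14, 3),
    ("Preparation for midterm exam", 1, 25),
    ("Preparation for final exam", 1, 30),
    ("Preparing assignments", 2, 25),
    ("Preparing presentations", 1, 15),
    ("Final", 1, 2),
    ("Midterm", 1, 2),
]

# Each row contributes number * hours to the total workload.
total_hours = sum(number * hours for _, number, hours in workload)

ects = 8  # ECTS value listed in the course unit table
hours_per_ects = total_hours / ects  # derived, not stated in the document

print(total_hours, hours_per_ects)
```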

Contribution of Learning Outcomes to Programme Outcomes

PO/LO PO.1 PO.2 PO.3 PO.4 PO.5 PO.6 PO.7 PO.8 PO.9 PO.10
LO.1 4 5
LO.2 4 5 5 3 5 3 3
LO.3 5 5 5 5 3 3
LO.4 5 5 5 5 3 3
LO.5 5 5 4 3
LO.6 5 3 3