COURSE UNIT TITLE: INTRODUCTION TO TEXT AND WEB MINING

Description of Individual Course Units

Course Unit Code    Course Unit Title                      Type Of Course    D    U    L    ECTS
BIL 3102            INTRODUCTION TO TEXT AND WEB MINING    ELECTIVE          3    0    0    5

Offered By

Computer Science

Level of Course Unit

First Cycle Programmes (Bachelor's Degree)

Course Coordinator

Offered to

Computer Science

Course Objective

This course aims to introduce queries and documents, document pre-processing, word distribution, patch assessment, automatic indexing and tagging, character matching, query expansion, random graph models, social network analysis, graph-based methods, semi-supervised text correction, spamming and anti-spamming techniques, text summarization, natural language processing, web page classification, and knowledge extraction from the web.

Learning Outcomes of the Course Unit

1   Have general information about text mining techniques,
2   Have general information about web mining techniques,
3   Be capable of analysing text-based documents,
4   Have general information about natural language processing techniques,
5   Have information about web search and indexing.

Mode of Delivery

Face-to-Face

Prerequisites and Co-requisites

None

Recommended Optional Programme Components

None

Course Contents

Week Subject Description
1 Introduction to text mining, Boolean retrieval
2 Dictionaries
3 Index construction, compression
4 Scoring, term weighting
5 Computing scores
6 Information retrieval
7 XML retrieval
8 Midterm Exam
9 Language models
10 Text classification
11 Vector space classification
12 Support vector machines, Machine learning on documents
13 Flat and hierarchical clustering
14 Web search basics, web crawling and indexes, link analysis

Recommended or Required Reading

Textbook(s):
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, An Introduction to Information Retrieval, Cambridge University Press, 2009.
Supplementary Book(s):
Song, M., Wu, Y.-F. B., Handbook of Research on Text and Web Mining Technologies, Volume I-II, 2007.
Jurafsky, D., Martin, J. H., An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 3rd ed., Stanford University, 2022.

Planned Learning Activities and Teaching Methods

The course combines lectures, class presentations, and discussion. In addition to the lectures, assigned groups prepare presentations and deliver them in a discussion session. In some weeks, the results of previously assigned homework are discussed.

Assessment Methods

SORTING NUMBER    SHORT CODE    LONG CODE                     FORMULA
1                 MTE           MIDTERM EXAM
2                 ASG           ASSIGNMENT
3                 FIN           FINAL EXAM
4                 FCG           FINAL COURSE GRADE            MTE * 0.30 + ASG * 0.30 + FIN * 0.40
5                 RST           RESIT
6                 FCGR          FINAL COURSE GRADE (RESIT)    MTE * 0.30 + ASG * 0.30 + RST * 0.40


*** The resit exam is not administered in institutions where a resit is not applicable.

Further Notes About Assessment Methods

None

Assessment Criteria

Assignment: 30%
Midterm exam: 30%
Final exam: 40%

The assignment and the final exam are evaluated as programming projects.

Language of Instruction

Turkish

Course Policies and Rules

Students are expected to arrive at class on time. Attendance at a minimum of 70% of the classes is mandatory.

Contact Details for the Lecturer(s)

mete.eminagaoglu@deu.edu.tr

Office Hours

Will be announced.

Work Placement(s)

None

Workload Calculation

Activities                                   Number    Time (hours)    Total Work Load (hours)
Lectures                                     13        3               39
Preparations before/after weekly lectures    12        1               12
Preparing assignments                        1         15              15
Preparation for final exam                   1         30              30
Preparation for midterm exam                 1         15              15
Final                                        1         2               2
Midterm                                      1         2               2
TOTAL WORKLOAD (hours)                                                 115

Contribution of Learning Outcomes to Programme Outcomes

PO/LO    PO.1    PO.2    PO.3    PO.4    PO.5    PO.6    PO.7    PO.8    PO.9    PO.10    PO.11    PO.12    PO.13
LO.1: 3
LO.2: 3
LO.3: 3, 4, 4, 4
LO.4: 3, 4, 4, 4
LO.5: 4, 3, 5, 4, 4