COURSE UNIT TITLE: INTRODUCTION TO TEXT AND WEB MINING

Description of Individual Course Units

Course Unit Code | Course Unit Title | Type Of Course | D | U | L | ECTS
BIL 3102 | INTRODUCTION TO TEXT AND WEB MINING | ELECTIVE | 3 | 0 | 0 | 5

Offered By

Computer Science

Level of Course Unit

First Cycle Programmes (Bachelor's Degree)

Course Coordinator

PROFESSOR DOCTOR EFENDI NASIBOĞLU

Offered to

Computer Science

Course Objective

This course introduces text and web mining. It covers queries and documents, document preprocessing, word distributions, vectorization, web scraping, sentence matching, social network analysis, natural language processing, deep learning-based models, and large language models.

Learning Outcomes of the Course Unit

1   Have a general understanding of text mining techniques
2   Have a general understanding of web mining techniques
3   Be capable of analysing text-based documents
4   Have a general understanding of natural language processing techniques
5   Have knowledge of web scraping technologies

Mode of Delivery

Face-to-Face

Prerequisites and Co-requisites

None

Recommended Optional Programme Components

None

Course Contents

Week Subject Description
1 Introduction
2 Web scraping techniques with Python
3 Extracting information from social media
4 Text preprocessing techniques: stemming, stop words, n-grams
5 Scoring, term weighting
6 TF-IDF vector representation of texts
7 Inter-text distance measures: Levenshtein, Jaro-Winkler
8 Inter-text fuzzy similarity
9 Text classification
10 Deep learning techniques in vectorization of texts
11 Word2Vec, CBOW, Skip-gram, fastText
12 Pretrained Transformers, LSTM, BiLSTM, LLM
13 Large Language Models (LLMs), prompt engineering, one-shot and few-shot prompting
14 Project presentations.

Recommended or Required Reading

Textbook(s):
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, An Introduction to Information Retrieval, Cambridge University Press, 2009.
Supplementary Book(s):
Song, M., Wu, Y.-F. B. (eds.), Handbook of Research on Text and Web Mining Technologies, Volume I-II, 2007.
Jurafsky, D., Martin, J. H., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 3rd ed., Stanford University, 2022.

Planned Learning Activities and Teaching Methods

The course is taught through lectures, class presentations, and discussion. In addition to the lectures, assigned groups prepare presentations and deliver them in a discussion session. In some weeks, the results of previously assigned homework are discussed.

Assessment Methods

SORTING NUMBER | SHORT CODE | LONG CODE | FORMULA
1 | MTE | MIDTERM EXAM |
2 | ASG | ASSIGNMENT |
3 | FIN | FINAL EXAM |
4 | FCG | FINAL COURSE GRADE | MTE * 0.30 + ASG * 0.30 + FIN * 0.40
5 | RST | RESIT |
6 | FCGR | FINAL COURSE GRADE (RESIT) | MTE * 0.30 + ASG * 0.30 + RST * 0.40
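
As an illustration of the two formulas above, the following minimal Python sketch computes the final course grade and the resit grade. The function name `final_course_grade`, the sample scores, and the 0-100 scale are assumptions made for this example only; they are not taken from the course regulations.

```python
def final_course_grade(mte: float, asg: float, exam: float) -> float:
    """Weighted grade: 30% midterm, 30% assignment, 40% final (or resit) exam."""
    return mte * 0.30 + asg * 0.30 + exam * 0.40


# Hypothetical scores on an assumed 0-100 scale (illustrative only).
fcg = final_course_grade(mte=70, asg=80, exam=60)   # FIN = 60
fcgr = final_course_grade(mte=70, asg=80, exam=75)  # RST = 75 replaces FIN

print(round(fcg, 2))   # 69.0
print(round(fcgr, 2))  # 75.0
```

In the resit case only the exam term changes; the midterm and assignment scores carry over with the same weights.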


Further Notes About Assessment Methods

None

Assessment Criteria

Assignment: 30%
Midterm exam: 30%
Final exam: 40%

Language of Instruction

Turkish

Course Policies and Rules

Students are expected to arrive at class on time. Attendance at 70% of the classes is mandatory.

Contact Details for the Lecturer(s)

efendi.nasibov@deu.edu.tr

Office Hours

Will be announced.

Work Placement(s)

None

Workload Calculation

Activities | Number | Time (hours) | Total Work Load (hours)
Lectures | 14 | 3 | 42
Preparations before/after weekly lectures | 14 | 1 | 14
Preparing assignments | 1 | 15 | 15
Preparation for final exam | 1 | 30 | 30
Preparation for midterm exam | 1 | 15 | 15
Final | 1 | 2 | 2
Midterm | 1 | 2 | 2
TOTAL WORKLOAD (hours) | | | 120

Contribution of Learning Outcomes to Programme Outcomes

PO/LO PO.1 PO.2 PO.3 PO.4 PO.5 PO.6 PO.7 PO.8 PO.9 PO.10 PO.11 PO.12 PO.13
LO.1 3
LO.2 3
LO.3 3 4 4 4
LO.4 3 4 4 4
LO.5 4 3 5 4 4