
Information Retrieval and Text Mining (DAT640)

The course offers an introduction to techniques and methods for processing, mining, and searching in massive text collections. The course considers a broad variety of applications and provides an opportunity for hands-on experimentation with state-of-the-art algorithms using existing software tools and data collections.


Course description for study year 2020-2021. Please note that changes may occur.

See the course description and exam/assessment information for this semester (2024-2025).


Facts

Course code

DAT640

Credits (ECTS)

10

Semester tuition start

Autumn

Language of instruction

English

Number of semesters

1

Exam semester

Autumn

Content

  • Search engine architecture
  • Text preprocessing and indexing
  • Retrieval models (vector-space model, probabilistic models, learning to rank, neural models)
  • Search engine evaluation
  • Query modeling, relevance feedback
  • Web search (crawling, indexing, link analysis)
  • Semantic search (knowledge bases, entity retrieval, entity linking)
  • Text clustering
  • Text categorization
  • Topic analysis 
  • Opinion mining and sentiment analysis

Learning outcome

Knowledge:

  • Theory and practice of concepts, methods, and techniques for managing and analyzing large amounts of text data.

Skills:

  • Process and prepare large-scale textual data collections for retrieval and mining.
  • Apply retrieval, classification, and clustering methods to a range of information access problems.
  • Conduct performance evaluation and error analysis.

General competencies:

  • Understanding of the strengths and limitations of modern information retrieval and text mining techniques. Being able to identify promising business applications, and to participate in and lead such projects.

Required prerequisite knowledge

None

Exam

Form of assessment Weight Duration Marks Aid Exam system Withdrawal deadline Exam date
Home exam 3/5 4 Hours Letter grades Inspera assessment 17.11.2020
Project work 2/5 Letter grades 25.08.2020


The project is carried out individually or in groups of 2 or 3; the groups are set up by the course instructor. A student who fails the project must retake this part the next time the course is taught.

Permitted aids: all written and printed material, and a basic calculator.

Course teacher(s)

Head of Department:

Tom Ryen

Course coordinator:

Krisztian Balog

Course teacher:

Krisztian Balog

Course teacher:

Petra Galuscakova

Method of work

6 hours of lectures/lab exercises each week.

Overlapping courses

Course: Web Search and Data Mining (DAT630_1), Information Retrieval and Text Mining (DAT640_1)
Reduction (SP): 5

Open for

Admission to Single Courses at the Faculty of Science and Technology
Data Science - Master of Science Degree Programme
Computer Science - Master of Science Degree Programme
Industrial Automation and Signal Processing - Master's Degree Programme - 5-year
Exchange programme at Faculty of Science and Technology

Course assessment

Evaluation form and/or discussion.

Literature

1. Zhai C, Massung S. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. ACM Books, no. 12. Morgan & Claypool; 2016.
2. Balog K. Entity-Oriented Search. The Information Retrieval Series, vol. 39. 1st ed. Springer International Publishing; 2018. 1 online resource (XIX, 351 p., 86 illus., 13 illus. in color).
The course description is retrieved from FS (Felles studentsystem). Version 1