Course

Information Retrieval and Text Mining (DAT640)

The course offers an introduction to techniques and methods for processing, mining, and searching in massive text collections. The course considers a broad variety of applications and provides an opportunity for hands-on experimentation with state-of-the-art algorithms using existing software tools and data collections.


This is the course description for the academic year 2022-2023. Note that changes may occur.

See the course description and exam/assessment information for this semester (2024-2025).

Facts

Course code

DAT640

Credits (ECTS)

10

Semester tuition start

Autumn

Language of instruction

English

Number of semesters

1

Exam semester

Autumn

Content

  • Search engine architecture
  • Text preprocessing, indexing, representation learning
  • Retrieval models (vector-space model, probabilistic models, learning to rank, neural models)
  • Search engine evaluation
  • Query modeling, relevance feedback
  • Web search (crawling, indexing, link analysis)
  • Semantic search (knowledge bases, entity retrieval, entity linking)
  • Text clustering
  • Text categorization

Learning outcome

Knowledge:

  • Theory and practice of concepts, methods, and techniques for managing and analyzing large amounts of text data.

Skills:

  • Process and prepare large-scale textual data collections for retrieval and mining.
  • Apply retrieval, classification, and clustering methods to a range of information access problems.
  • Conduct performance evaluation and error analysis.

General competencies:

  • Understanding of the strengths and limitations of modern information retrieval and text mining techniques. Ability to identify promising business applications and to participate in and lead such projects.

Required prerequisite knowledge

None

Exam

Form of assessment: Project work
  • Weight: 2/5
  • Marks: Letter grades

Form of assessment: Written exam
  • Weight: 3/5
  • Duration: 4 hours
  • Marks: Letter grades
  • Aid: All written and printed aids are allowed. Specified basic calculator allowed. All aids are permitted; collaborating with or getting help from other people on the exam task is not permitted.
  • Exam system: Inspera assessment
  • Withdrawal deadline: 07.11.2022
  • Exam date: 21.11.2022


The project is a combination of individual and group assignments. The project groups are set up by the course instructor. 

There is no re-sit option for the project. A student who fails the project must retake this part the next time the course is taught.

All assessment parts must be passed in order to achieve an overall grade in the course.

Course teacher(s)

Head of Department:

Tom Ryen

Course coordinator:

Krisztian Balog

Course teacher:

Krisztian Balog

Course teacher:

Petra Galuscakova

Method of work

6 hours of lectures/lab exercises each week.

Overlapping

Course: Web Search and Data Mining (DAT630_1), Information Retrieval and Text Mining (DAT640_1)
Reduction (ECTS): 5

Open for

Admission to Single Courses at the Faculty of Science and Technology
Data Science - Master of Science Degree Programme
Computer Science - Master of Science Degree Programme
Industrial Automation and Signal Processing - Master's Degree Programme - 5 year
Exchange programme at Faculty of Science and Technology

Course evaluation

Form and/or discussion.

Literature

Book: Zhai, ChengXiang; Massung, Sean. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. San Rafael, Calif.: Morgan & Claypool, 2016. XX, 510 pp., no. 12. ISBN 9781970001167; 9781970001198.

E-book: Balog, Krisztian. Entity-Oriented Search [electronic resource]. Cham: Springer International Publishing, 2018. 1 online resource (XIX, 351 p., 86 illus., 13 illus. in color), vol. 39. ISBN 3-319-93935-1. Chapters 1-5.
https://bibsys-ur.userservices.exlibrisgroup.com/view/uresolver/47BIBSYS_UBIS/openurl?ctx_enc=info:ofi/enc:UTF-8&ctx_id=10_1&ctx_tim=2020-05-19T11%3A09%3A24IST&ctx_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&url_ver=Z39.88-2004&rfr_id=info:sid/primo.exlibrisgroup.com-BIBSYS_ILS&req_id=&rft_dat=ie=47BIBSYS_DIAKON:5141636900002247,ie=47BIBSYS_UBIS:5176639730002208,ie=47BIBSYS_UBB:51164968700002207,ie=47BIBSYS_UBA:5175885660002209,ie=47BIBSYS_UBTO:51132887690002205,ie=47BIBSYS_UBO:51219580670002204,ie=47BIBSYS_SSHF:5123146620002269,ie=47BIBSYS_NIH:5125837550002238,ie=47BIBSYS_UBIN:5192604960002211,ie=47BIBSYS_NTNU_UB:51245894430002203,ie=47BIBSYS_NMBU:5137555320002213,ie=47BIBSYS_MF:5142753320002227,ie=47BIBSYS_LOVISHS:5124112550002272,ie=47BIBSYS_HIB:5159567020002221,ie=47BIBSYS_HIO:5180303690002218,ie=47BIBSYS_HIT:5168700090002210,ie=47BIBSYS_HIOA:5180910740002212,ie=47BIBSYS_FFI_BIBL:5119268700002246,ie=47BIBSYS_SIRUS:5133591110002256,ie=47BIBSYS_DMMH:5125896880002262,ie=47BIBSYS_AHUS:5132459740002263,ie=47BIBSYS_NETWORK:71568821940002201,language=eng,view=UBIS&svc_dat=viewit&u.ignore_date_coverage=true&user_ip=10.16.56.57&req.skin=primo
The course description is retrieved from FS (Felles studentsystem). Version 1