Data-intensive Systems and Algorithms | University of Stavanger

Facts

Course code

DAT535

Credits (ECTS)

Semester tuition start

Autumn

Language of instruction

English

Number of semesters

Exam semester

Autumn

Content

The emergence of Big Data and Data-intensive Systems as specialized fields in computing has motivated the development of new techniques and technologies needed to extract knowledge from large datasets. Since Hadoop was conceived in 2005, popular interest in data-intensive systems has grown steadily. Over time, this has resulted in a collection of technologies, methodologies, and practices that cover the complete data lifecycle.

This course is a first step toward a variety of roles related to data-intensive systems. The core tasks in these roles that we will address are: roles in a data team; data acquisition and integration (using files, APIs, etc.); data cleaning and augmentation (often using direct implementation of MapReduce jobs); data analytics and ML (often using one of the data processing frameworks, e.g. Spark SQL or MLlib); advocating technology application in both technical and non-technical settings; and providing introductory training to coworkers.
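
As an illustration of what "direct implementation of MapReduce jobs" for data cleaning can look like, the following is a minimal sketch in plain Python rather than on a Hadoop cluster. The input records, the function names (`mapper`, `reducer`, `run_job`), and the cleaning rules are assumptions made for this example and are not taken from the course material.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical raw input: one "city,temperature" record per line, with noise.
RAW_LINES = [
    "Stavanger,12.5",
    " stavanger ,13.5",      # inconsistent casing and whitespace
    "Bergen,10.0",
    "Bergen,not-a-number",   # malformed value, dropped by the mapper
    "",                      # empty line, dropped by the mapper
    "Oslo,8.0",
]


def mapper(line: str) -> Iterator[tuple[str, float]]:
    """Map phase: clean one record and emit (key, value) pairs."""
    parts = line.split(",")
    if len(parts) != 2:
        return  # wrong number of fields
    city = parts[0].strip().title()
    try:
        temperature = float(parts[1])
    except ValueError:
        return  # value is not numeric
    if city:
        yield city, temperature


def reducer(key: str, values: list[float]) -> tuple[str, float]:
    """Reduce phase: aggregate all values that share a key (here: the mean)."""
    return key, sum(values) / len(values)


def run_job(
    lines: Iterable[str],
    map_fn: Callable[[str], Iterator[tuple[str, float]]],
    reduce_fn: Callable[[str, list[float]], tuple[str, float]],
) -> dict[str, float]:
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups: dict[str, list[float]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


RESULT = run_job(RAW_LINES, mapper, reducer)
print(RESULT)
```

In a real cluster the shuffle step is handled by the framework and the mapper and reducer run on different machines; the sketch only shows how cleaning logic is split between the two phases.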

Learning outcome

Knowledge

Understanding of Medallion Architecture: Students will gain a comprehensive understanding of the Medallion Architecture, including its layers (bronze, silver, and gold) and how it supports data processing and analytics.
Apache Spark Fundamentals: Students will learn the core concepts of Apache Spark, including its architecture, components, and how it handles big data processing.
Data Management and Governance: Knowledge of data management principles, data governance, and best practices for ensuring data quality and integrity.
Big Data Ecosystem: Familiarity with the broader big data ecosystem, including tools and technologies that complement Apache Spark, such as Hadoop, Kafka, Delta Lake, and NoSQL databases.
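
The bronze, silver, and gold layers mentioned above can be sketched roughly as follows. This is a simplified, single-process illustration in plain Python, not Spark or Delta Lake code; the sample records and the helper names `to_silver` and `to_gold` are hypothetical.

```python
from collections import defaultdict

# Bronze layer: raw records kept exactly as ingested (hypothetical sample data).
BRONZE = [
    {"order_id": "1", "customer": "alice", "amount": "20.0"},
    {"order_id": "1", "customer": "alice", "amount": "20.0"},  # duplicate record
    {"order_id": "2", "customer": "bob", "amount": "oops"},    # invalid amount
    {"order_id": "3", "customer": "alice", "amount": "5.5"},
    {"order_id": "4", "customer": "bob", "amount": "7.0"},
]


def to_silver(bronze: list[dict]) -> list[dict]:
    """Silver layer: validated, typed, and deduplicated records."""
    seen_ids = set()
    silver = []
    for row in bronze:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if row["order_id"] in seen_ids:
            continue  # drop repeated order ids
        seen_ids.add(row["order_id"])
        silver.append(
            {
                "order_id": int(row["order_id"]),
                "customer": row["customer"],
                "amount": amount,
            }
        )
    return silver


def to_gold(silver: list[dict]) -> dict[str, float]:
    """Gold layer: aggregates ready for analytics (total amount per customer)."""
    totals: dict[str, float] = defaultdict(float)
    for row in silver:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


SILVER = to_silver(BRONZE)
GOLD = to_gold(SILVER)
print(GOLD)
```

Keeping the bronze data untouched means the silver and gold layers can be rebuilt whenever the cleaning or aggregation rules change.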

Skills

Data Processing and Transformation: Proficiency in using Apache Spark for data processing tasks, including batch and stream processing, data cleaning, and transformation.
Performance Tuning: Skills in optimizing Apache Spark jobs for performance, including resource management, partitioning, and tuning Spark configurations.
Data Integration: Competence in integrating data from various sources and formats into a unified data platform using Medallion Architecture principles.
Problem-Solving: Ability to troubleshoot and resolve issues related to data pipelines, data quality, and performance bottlenecks.

General qualifications

Collaboration and Communication: Effective communication and collaboration skills to work with cross-functional teams implementing data-intensive solutions.
Ethical Considerations: Awareness of ethical considerations in data engineering, including data privacy, security, and responsible data usage.

Required prerequisite knowledge

Python programming

Recommended prerequisites

Database Systems (DAT220), Operating Systems and Systems Programming (DAT320), Cloud Computing Technologies (DAT515)

Bash programming

Administration of Cloud and container-based environments

Databases, SQL

Exam

Form of assessment	Weight	Duration	Marks	Aid
Project	1/1	6 Weeks	Letter grades	All

The project is completed in groups. It lasts 6 weeks, in addition to the obligatory labs that form the basis for the project.

No re-sit opportunities are offered for project assignments. Students who do not pass the project can retake it the next time the course is held.

Requirements for taking the exam/assessment

Mandatory Assignments, Oral presentation

Three assignments

Students start with 3 mandatory assignments that involve programming and system administration. Assignments are to be completed individually. All mandatory assignments must be passed by the deadline for the student to have the right to start the project. The obligatory assignments give access to the project only in the current semester.

Mandatory lab assignments must be completed at the times and in the groups that are assigned and published. Absence due to illness or other reasons must be communicated to the laboratory personnel as soon as possible. Students cannot expect arrangements for completing the lab assignments at other times unless this has been agreed with the laboratory personnel in advance.

All group members must participate in the project presentation.

Course teacher(s)

Head of Department:

Tom Ryen

Course coordinator:

Tomasz Wiktorski

Laboratory Engineer:

Jayachander Surbiryala

Method of work

The work will consist of 6 hours of lectures, scheduled laboratory work, and supervised group work per week in the second half of the semester. Students are expected to spend an additional 6-8 hours a week on self-study, group discussions, and development work (open laboratory).

Overlapping

Course	Reduction (ECTS credits)
Data-intensive Systems (DAT500_1), Data-intensive Systems and Algorithms (DAT535_1)	5

Open for

Admission to Single Courses at the Faculty of Science and Technology

Data Science - Master of Science Degree Programme

Computer Science - Master of Science Degree Programme

Computer Science - Master of Science Degree Programme, Part-Time

Exchange programme at Faculty of Science and Technology

Course evaluation

There must be an early dialogue between the course supervisor, the student union representative, and the students. The purpose is to obtain feedback from the students for changes and adjustments in the course for the current semester. In addition, a digital course evaluation must be carried out at least every three years. Its purpose is to gather the students' experiences with the course.

Offered by

Faculty of Science and Technology

Department of Electrical Engineering and Computer Science

The course description is retrieved from FS (Felles studentsystem). Version 1
