Data Engineer Course
with Apache Airflow

By completing this original course, taught by a Big Data expert, you will learn how to design data architectures and prepare for a job as a Data Engineer, one of the most in-demand IT professions.

Gain practical skills from the second lesson
This original course for training data engineers is built around Apache Airflow, the popular data orchestration platform.

The main advantage of the course is rapid, hands-on mastery of Airflow.

Hands-on sessions and a final independent project immerse you in real Airflow work from the second lesson onward. Instructors with extensive experience delivering Airflow-based projects share real project knowledge and invaluable expertise: best practices for using Airflow in data projects.
Apache Airflow Training
For whom?
This course is designed for:

  • Data engineers
  • Python developers
  • System administrators
who are moving to Airflow from other technologies or learning it from scratch.

The course is not intended for experienced Airflow developers, as it focuses on foundational knowledge and best practices.
Required Competencies
Course participants should have basic knowledge of:

  • Linux
  • Python development
  • Fundamentals of Docker and Docker Compose

Before taking the course, it is recommended that you review the official Airflow documentation to become familiar with the material.
At the end of the course
As a result of the course, participants will:

  • Gain theoretical knowledge of using Apache Airflow.
  • Learn about real-world use cases of Apache Airflow in projects of various complexity levels.
  • Acquire practical skills in deploying Airflow in different configurations.
  • Develop skills in writing DAGs, connecting and interacting with databases, using various types of operators, applying different types of variables, and utilizing additional libraries and executors.

By the end of the course, participants will be able to independently deploy Airflow in an optimal configuration, optimize performance based on business requirements, and develop basic data pipelines.
  • Course duration
    The course comprises 16 classroom hours and 32 hours of self-study: eight classroom sessions (seven lectures plus a final project presentation), each lasting 2 academic hours. This pacing helps participants absorb the material, and practical assignments after each lecture reinforce the concepts covered.
  • Course format
    Remote
  • Class Schedule
    All lectures are held in the evenings, three times a week, starting at 19:00 and lasting 2 academic hours each. This schedule lets participants study without being pulled away from their primary work tasks.
  • Course start date
    The course starts as soon as a group is formed. To enroll in the next group, contact us by email at ductechnologies@yandex.ru or submit a request using the form at the bottom of the page.

The course structure

1
a) Introductory Lecture 1: Introduction to Apache Airflow. Main components, purpose, versions, and architecture. Infrastructure requirements (2 hours, in-class).

b) Study of the official documentation for Apache Airflow, architecture, and key components (4 hours, self-study).
2
a) Lecture 2: Features and installation options of Apache Airflow. Prerequisites. Installing Airflow using docker-compose. Interface overview. Answering questions from Lecture 1. Homework.

b) Self-study: installing Airflow on a local machine using docker-compose, checking that everything works, and running demo scripts such as hello world (4 hours). A sketch of such a verification DAG follows below.
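
For reference, here is a minimal sketch of a hello-world verification DAG. It assumes Airflow 2.4 or newer (for EmptyOperator and the schedule argument); the DAG id is illustrative, and the file would go into the dags/ folder mounted by docker-compose.

    # Minimal verification DAG: it should appear in the web UI and can be
    # triggered manually. Assumes Airflow 2.4+; the dag_id is illustrative.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    with DAG(
        dag_id="hello_world_check",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # no schedule; trigger manually from the UI
        catchup=False,
    ) as dag:
        EmptyOperator(task_id="hello")
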
3
a) Lecture 3: Key configuration parameters of Airflow. Concept of DAG (Directed Acyclic Graph). Features of DAGs and basic syntax. Answering questions from Lecture 2.

b) Self-study: writing a simple DAG in Airflow using the Bash and Python operators (see the sketch below).
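
A DAG of this kind might look like the sketch below, assuming Airflow 2.x; the task names and commands are illustrative.

    # Two-task DAG chaining a BashOperator and a PythonOperator.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def _greet():
        print("Hello from PythonOperator")

    with DAG(
        dag_id="simple_bash_python",  # illustrative name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        print_date = BashOperator(task_id="print_date", bash_command="date")
        greet = PythonOperator(task_id="greet", python_callable=_greet)
        print_date >> greet  # greet runs only after print_date succeeds
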
4
a) Lecture 4: Logging and monitoring of DAGs. Features of working with different databases. Introduction to PostgresHook, PostgresOperator, and PythonOperator (2 hours). Answering questions from Lecture 3.

b) Practical session: working with databases from Airflow (a sketch follows below).
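
As one possible shape for this session, the sketch below creates a table with PostgresOperator and reads it back with PostgresHook. It assumes the apache-airflow-providers-postgres package is installed; the connection id "my_postgres" and the "events" table are hypothetical and must exist in your environment.

    # Create a table with PostgresOperator, then query it with PostgresHook.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.postgres.hooks.postgres import PostgresHook
    from airflow.providers.postgres.operators.postgres import PostgresOperator

    def _count_rows():
        # The hook resolves credentials from the Airflow connection store
        hook = PostgresHook(postgres_conn_id="my_postgres")  # hypothetical id
        print(hook.get_records("SELECT count(*) FROM events"))

    with DAG(
        dag_id="postgres_demo",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        create = PostgresOperator(
            task_id="create_table",
            postgres_conn_id="my_postgres",
            sql="CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY)",
        )
        count = PythonOperator(task_id="count_rows", python_callable=_count_rows)
        create >> count
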
5
a) Lecture 5: Different types of Executors and architectural aspects of task execution. Using XCom, Jinja templates, and variables (2 hours). Answering questions from Lecture 4.

b) Practical session: creating a DAG that uses XCom, Jinja templates, and variables (see the sketch below).
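
One way this exercise can look is sketched below: a Python task pushes a value to XCom, and a templated BashOperator pulls it with Jinja alongside an Airflow Variable. The variable name "env_name" is hypothetical and would need to be created under Admin > Variables.

    # XCom push from Python, pull via Jinja templating in a BashOperator.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def _push_label(ti):
        # "ti" (the TaskInstance) is injected from the task context
        ti.xcom_push(key="run_label", value="nightly")

    with DAG(
        dag_id="xcom_jinja_demo",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        push = PythonOperator(task_id="push_label", python_callable=_push_label)
        echo = BashOperator(
            task_id="echo_label",
            bash_command=(
                "echo label={{ ti.xcom_pull(task_ids='push_label', key='run_label') }}"
                " env={{ var.value.env_name }}"  # hypothetical Airflow Variable
            ),
        )
        push >> echo
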
6
a) Lecture 6: Best practices for using Airflow. Serialization of DAGs. Docker Operator, trigger rules, and trigger Operator. Answering questions from Lecture 5.

b) Practical session: creating a DAG with the Docker operator, trigger rules, and a trigger operator (a sketch follows below).
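
Reading "trigger operator" as Airflow's TriggerDagRunOperator (an assumption on our part), a sketch of this exercise could look as follows. It requires the apache-airflow-providers-docker package and a reachable Docker daemon; the image and DAG ids are illustrative, and "cleanup_dag" is a hypothetical downstream DAG.

    # Run a command in a container, then trigger another DAG regardless of
    # whether the container task succeeded (trigger_rule=ALL_DONE).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.trigger_dagrun import TriggerDagRunOperator
    from airflow.providers.docker.operators.docker import DockerOperator
    from airflow.utils.trigger_rule import TriggerRule

    with DAG(
        dag_id="docker_trigger_demo",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        in_container = DockerOperator(
            task_id="run_in_container",
            image="python:3.11-slim",  # illustrative image
            command='python -c "print(42)"',
        )
        trigger_next = TriggerDagRunOperator(
            task_id="trigger_cleanup",
            trigger_dag_id="cleanup_dag",  # hypothetical downstream DAG
            trigger_rule=TriggerRule.ALL_DONE,  # run even if upstream failed
        )
        in_container >> trigger_next
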
7
a) Lecture 7: Security and Role-Based Model in Airflow. Airflow API. Installing additional packages. Managing notifications. Mini-project assignment.

b) Independent work: implementing the mini-project. (A sketch of the notification setup from Lecture 7 follows below.)
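
As a pointer for the notification topic from Lecture 7, the sketch below enables email alerts on task failure through default_args. It assumes SMTP is configured for the Airflow deployment; the recipient address is illustrative.

    # Failure notifications via default_args; assumes SMTP is configured.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    default_args = {
        "email": ["oncall@example.com"],  # hypothetical recipient
        "email_on_failure": True,         # send mail when a task fails
        "retries": 1,
    }

    with DAG(
        dag_id="notify_demo",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
        default_args=default_args,
    ) as dag:
        EmptyOperator(task_id="noop")
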
8
Presentation of the project results and Q&A session. Discussion with course participants (2 hours, classroom).

Total workload for the course: 16 classroom hours and 32 self-study hours.

Still have questions about learning Apache Airflow?
Leave a training request now, and our specialists will promptly consult you and answer all your questions!