Building a data warehouse and reporting system from scratch
Integration time per source reduced from 3 days to 2-4 hours
30 data sources integrated
DEVELOPMENT OF CORPORATE DATA WAREHOUSES AND DATA LAKES
tourism
Customer
The largest tourism organization in Russia, with more than 3 million clients and users
CHALLENGES/FEATURES
The total project duration was more than 2 years.
The development team was expanded 1.5 times in 3 weeks, from 8 to 12 developers.
Task
Develop a regulatory reporting system that supports management decision-making in tourism across various data products
Technical solution
1. Analyzed the customer's business processes and its reporting and analytics needs
2. Developed a data warehouse architecture and defined the data storage and integration structure:
   - used open-source technologies, enabling flexible scaling and integration
   - deployed a data warehouse that became the basis for storing data from heterogeneous sources
3. Developed integration solutions between the customer's disparate information systems:
   - mechanisms for retrieving data via the HTTP and JDBC protocols
   - data quality check modules
   - modules for checking the input data schema and validating the integration contract
   - unified data extraction processes based on Airflow + Python
4. Developed a set of PostgreSQL functions that generate data marts for indicators across various dimensions

5. Developed a data provisioning module based on Python FastAPI that serves data to external systems on request: CRM (data marts for the communication map and recommendations) and CDP (segments)

6. Developed a data quality management subsystem consisting of Grafana-based visualization of quality metrics for routine processes, data errors, and similar issues; a PostgreSQL-based store for quality metrics; and a Python module that generates the metrics.

7. Built a set of BI dashboards that support fast, transparent management decisions based on data products.
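Items 3 and 6 mention validating the input schema against an integration contract and generating data quality metrics, but the case study does not describe how these modules work. The following is a minimal sketch under stated assumptions: a contract is modeled as a mapping from field name to an expected Python type plus a nullable flag, and the example quality metric is the share of null values in a field. `FieldSpec`, `validate_batch`, and `null_ratio` are hypothetical names, not the project's code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One field of a hypothetical integration contract."""
    type_: type
    nullable: bool = False


def validate_batch(rows: list[dict[str, Any]], contract: dict[str, FieldSpec]) -> list[str]:
    """Return human-readable contract violations for a batch of source rows."""
    errors: list[str] = []
    for i, row in enumerate(rows):
        # Fields sent by the source that the contract does not declare
        extra = set(row) - set(contract)
        if extra:
            errors.append(f"row {i}: unexpected fields {sorted(extra)}")
        for name, spec in contract.items():
            if name not in row:
                errors.append(f"row {i}: missing field '{name}'")
                continue
            value = row[name]
            if value is None:
                if not spec.nullable:
                    errors.append(f"row {i}: '{name}' must not be null")
            elif not isinstance(value, spec.type_):
                errors.append(
                    f"row {i}: '{name}' expected {spec.type_.__name__}, "
                    f"got {type(value).__name__}"
                )
    return errors


def null_ratio(rows: list[dict[str, Any]], field: str) -> float:
    """Share of rows where `field` is missing or null (a simple quality metric)."""
    if not rows:
        return 0.0
    nulls = sum(1 for row in rows if row.get(field) is None)
    return nulls / len(rows)


if __name__ == "__main__":
    contract = {"client_id": FieldSpec(int), "region": FieldSpec(str, nullable=True)}
    rows = [{"client_id": 1, "region": "Kazan"}, {"client_id": "2", "region": None}]
    print(validate_batch(rows, contract))  # ["row 1: 'client_id' expected int, got str"]
    print(null_ratio(rows, "region"))      # 0.5
```

In a pipeline, a non-empty result from `validate_batch` would typically stop the load before bad data reaches the warehouse, and metrics such as `null_ratio` could be written to a metrics table for a dashboard to chart. Both steps are assumptions about how such modules are commonly wired, not details reported for this project.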
Result
Business values
Extensive opportunities for scaling and developing data practices
The use of open-source technologies has enabled the creation of a flexible, scalable data platform.
The data warehouse architecture was designed from scratch.
A proprietary platform for extracting and processing source data, built as a set of standard Python functions
We built our own platform to automate the integration of data sources, which reduced the time to integrate a single source from 3 days to 2-4 hours.
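The platform's internals are not described here. One plausible way to achieve this kind of speed-up, shown as an illustrative sketch rather than the team's actual design, is to declare each source as configuration and dispatch it to a small set of reusable extractor functions, so onboarding a source mostly means adding a config entry. `SourceConfig`, `extract_http`, `extract_db`, `EXTRACTORS`, and `run_source` are hypothetical names. JDBC is a Java API, so the database extractor below uses Python's DB-API instead, with the standard-library `sqlite3` module as a stand-in.

```python
import json
import sqlite3
import urllib.request
from dataclasses import dataclass, field
from typing import Any, Callable, Iterator

Row = dict[str, Any]


@dataclass
class SourceConfig:
    """Declarative description of one source (hypothetical format)."""
    name: str
    kind: str  # key into EXTRACTORS, e.g. "http" or "db"
    params: dict[str, Any] = field(default_factory=dict)


def extract_http(params: dict[str, Any]) -> Iterator[Row]:
    """Fetch a JSON array of records from an HTTP endpoint."""
    with urllib.request.urlopen(params["url"], timeout=params.get("timeout", 30)) as resp:
        yield from json.load(resp)


def extract_db(params: dict[str, Any]) -> Iterator[Row]:
    """Run a query through a DB-API 2.0 connection factory and yield dict rows."""
    conn = params["connect"]()
    try:
        cur = conn.cursor()
        cur.execute(params["query"])
        columns = [col[0] for col in cur.description]
        for values in cur.fetchall():
            yield dict(zip(columns, values))
    finally:
        conn.close()


# Registry of standard extractors: supporting a new source type is one new entry.
EXTRACTORS: dict[str, Callable[[dict[str, Any]], Iterator[Row]]] = {
    "http": extract_http,
    "db": extract_db,
}


def run_source(cfg: SourceConfig) -> list[Row]:
    """Dispatch a source config to its extractor and materialize the rows."""
    try:
        extractor = EXTRACTORS[cfg.kind]
    except KeyError:
        raise ValueError(f"{cfg.name}: unknown source kind {cfg.kind!r}") from None
    return list(extractor(cfg.params))


if __name__ == "__main__":
    # An in-memory SQLite query stands in for a real database source.
    demo = SourceConfig(
        name="demo_db",
        kind="db",
        params={
            "connect": lambda: sqlite3.connect(":memory:"),
            "query": "SELECT 1 AS id, 'Sochi' AS city",
        },
    )
    print(run_source(demo))  # [{'id': 1, 'city': 'Sochi'}]
```

In an Airflow deployment, each `SourceConfig` could become one task that calls `run_source`; that wiring, along with loading and validation, is omitted to keep the sketch self-contained.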
Subsystems have been deployed for data storage and processing, integration, orchestration, provisioning, data quality, and visualization. The integration of a large number of sources, together with the data quality and provisioning modules, lays the foundation for best practices in data management.
30 data sources and more than 100 entities integrated
35 data marts developed
5 web services created to provide data from the data warehouse to external systems