The flexible modeling technique for data warehouses
Data Vault is a modeling technique for data warehouses that is particularly suitable for agile data warehouses. It offers maximum flexibility for extensions and enables strong parallelization of data loading processes as well as complete temporal data storage. In this way, Data Vault also meets the requirements for revision security of an information system.
Dan Linstedt developed Data Vault modeling as early as the 1990s with the aim of creating a scalable, flexible and internally consistent warehouse. A static data warehouse, by contrast, grows more and more complex over time, which makes extensions and changes expensive.
Compared to classical or dimensional modeling according to Kimball, Data Vault 2.0 focuses on flexibility and a simple, step-by-step integration of data through a consistent decomposition into clearly structured components with unambiguous responsibilities. This leads to manageable loading processes that can be automated.
The Data Vault model
In Data Vault modeling, all information belonging to an object (e.g. customer data, products, processes) is divided into three categories and separated from each other. This information is located in different tables, but is linked by a common key. This makes it easy to create new categories and thus expand the entire data warehouse.
The first category "Hub" contains information that uniquely describes an object and thus gives it an identity. In our example - a bicycle rental - this would be the station with a unique name and the terminal with an ID.
The second category "Link" contains all kinds of relationships between individual business concepts. In our graphic we describe which trips are made from/to which station with which bike. In addition to such operations, the "Link" category can also describe hierarchical relationships or identity relationships (e.g. bike A belongs to station X).
All attributes that describe a business concept or a relationship - in our illustration, for example, the length of a ride - belong to the third category "Satellite". A hub, or even a link, can have multiple satellites, split for example by data source or by frequency of change. Unitemporal historization also takes place in the satellites, which is how the data warehouse fulfills the requirements for revision security.
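The three categories can be sketched for the bicycle rental example roughly as follows. This is a minimal illustration using an in-memory SQLite database, not a reference implementation: all table and column names, the `hash_key` helper, and the choice to derive the common key by hashing the business keys are assumptions made for this example.

```python
import hashlib
import sqlite3

# All table and column names below are illustrative assumptions.
SCHEMA = """
CREATE TABLE hub_station (station_hk TEXT PRIMARY KEY, station_name TEXT,
                          load_ts TEXT, record_source TEXT);
CREATE TABLE hub_bike    (bike_hk TEXT PRIMARY KEY, bike_id TEXT,
                          load_ts TEXT, record_source TEXT);
CREATE TABLE link_trip   (trip_hk TEXT PRIMARY KEY, bike_hk TEXT,
                          start_station_hk TEXT, end_station_hk TEXT,
                          load_ts TEXT, record_source TEXT);
CREATE TABLE sat_trip    (trip_hk TEXT, load_ts TEXT, ride_length_min REAL,
                          record_source TEXT, PRIMARY KEY (trip_hk, load_ts));
"""


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from business keys (an assumed convention)."""
    normalized = "|".join(key.strip().upper() for key in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_trip(conn, bike_id, start_station, end_station, started_at,
              ride_length_min, load_ts, source="rental_app"):
    """Load one trip: hubs first, then the link, then the satellite."""
    for name in (start_station, end_station):
        conn.execute("INSERT OR IGNORE INTO hub_station VALUES (?, ?, ?, ?)",
                     (hash_key(name), name, load_ts, source))
    conn.execute("INSERT OR IGNORE INTO hub_bike VALUES (?, ?, ?, ?)",
                 (hash_key(bike_id), bike_id, load_ts, source))
    trip_hk = hash_key(bike_id, start_station, end_station, started_at)
    conn.execute("INSERT OR IGNORE INTO link_trip VALUES (?, ?, ?, ?, ?, ?)",
                 (trip_hk, hash_key(bike_id), hash_key(start_station),
                  hash_key(end_station), load_ts, source))
    # Satellites are insert-only: a changed value becomes a new row per load_ts.
    conn.execute("INSERT OR IGNORE INTO sat_trip VALUES (?, ?, ?, ?)",
                 (trip_hk, load_ts, ride_length_min, source))
    return trip_hk


def current_ride_length(conn, trip_hk):
    """Return the most recently loaded ride length for a trip."""
    row = conn.execute(
        "SELECT ride_length_min FROM sat_trip WHERE trip_hk = ? "
        "ORDER BY load_ts DESC LIMIT 1", (trip_hk,)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
trip = load_trip(conn, "bike-7", "Central", "Harbor", "2024-05-01T08:00",
                 12.5, load_ts="2024-05-01T09:00")
# A later correction from the source is appended, not overwritten.
load_trip(conn, "bike-7", "Central", "Harbor", "2024-05-01T08:00",
          13.0, load_ts="2024-05-02T09:00")
```

In this sketch the hubs and the link stay unchanged when the same trip is delivered again, while the satellite keeps both versions of the ride length, so the earlier state remains traceable.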
Why should companies rely on Data Vault?
The goal of Data Vault is to help the organization rapidly deliver integrated data for analysis and reporting.
The benefits of Data Vault 2.0:
- Enables rapid data understanding across the enterprise with traceable and transparent data
- Significantly reduces development time for business requirements
- Short waiting times for important analysis results, even with large data volumes
- Standardized architecture and automated data provisioning
- Seamless integration of a wide variety of data sources with traceability to the source system
- Unaltered and complete historization to meet compliance and audit requirements
- Display and analysis of data as of a given key date
- Agile, iterative development cycles with gradual expansion of the data warehouse
- Also enables the development of an upstream data warehouse in an existing silo architecture
From a technical perspective, Data Vault supports classic batch processing as well as near-real-time loads, and companies can also integrate unstructured data. Unlike classic DWH architectures, Data Vault applies the business logic in the Business Data Vault and in the information mart layer, that is, as close as possible to the end user. Mapping therefore takes place "late": the DWH is loaded with the data exactly as it exists in the source system.
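The idea of "late" mapping can be illustrated with a small sketch: the raw rows are kept exactly as delivered, and the business rules are applied only when the information mart is built. The `RawTrip` type, the field names and the "short ride" rule are hypothetical and serve only to show the separation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawTrip:
    """A row as delivered by the source system, stored without modification."""
    trip_id: str
    ride_length_sec: str  # kept as the source's text value, not cleaned on load
    record_source: str


def to_mart_row(raw: RawTrip, short_ride_limit_min: float = 10.0) -> dict:
    """Apply hypothetical business rules when building the information mart."""
    minutes = round(int(raw.ride_length_sec) / 60, 1)
    return {
        "trip_id": raw.trip_id,
        "ride_length_min": minutes,
        "ride_class": "short" if minutes < short_ride_limit_min else "long",
    }


# The raw layer holds the source data unchanged.
raw_vault = [
    RawTrip("T1", "540", "rental_app"),
    RawTrip("T2", "1500", "rental_app"),
]

# The mart is derived from it and can be rebuilt if the rules change.
mart = [to_mart_row(row) for row in raw_vault]
```

Because the rule lives outside the raw layer, changing the threshold only requires rebuilding the mart; the stored source data is not touched.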
Banian & Data Vault 2.0
When developing a BI environment, you need a holistic approach. At Banian, we look at the entire development process and architecture in addition to the methodology.
Why Banian:
- Successful collaboration and application of Datavault Builder, Wherescape, MID Innovator and other modeling tools.
- Customized templates for using Data Vault in Wherescape, MID, Matillion, Exasol and Snowflake
- Our own Data Vault plug-in for MID Innovator, which uses graphical modeling to automate DWH development
- Continuous education with certifications, international boards and conferences
- Co-organizer and board member of DDVUG (deutschsprachige Data Vault User Gruppe), the leading Data Vault user group in the German-speaking world
- Several successfully implemented Data Vault projects, many of them using our own Data Vault plug-in
Our approach:
In a first step, we work with you to model your business with its relevant objects and their relationships. In the second step, we identify and integrate the source systems and analyze the information they contain. On this basis, we connect the defined business objects with the data and build the Data Vault model. Our prefabricated templates support the data integration, the modeling and the construction of the DWH, and thus accelerate the overall process. Throughout, the entire process, the business objects and the business rules are documented for the long term.