Traditional data infrastructure is built around a single, monolithic source of all enterprise data, whether a data warehouse or, more recently, a data lake.
Organizations are beginning to recognize several problems with this design:
- The limitations of centralized teams: A centralized data team cannot realistically understand in depth the data needs of every department it serves.
- Lack of flexibility: One central platform cannot be flexible enough to accommodate the requirements of an organization’s different departments and projects.
- Slow data provisioning: Centralized platforms are inherently rigid, as they are set up to perform only standard operations across the entire organization. As a result, data provisioning is slow and is rarely real-time or on-demand.
Data mesh is a decentralized data architecture that attempts to solve these problems by replacing a single, centralized data source with multiple data domains, each managed by a different department within the organization.
To work as described above, data mesh needs a data delivery system that can address its distributed nature. Traditional replication-based data integration approaches, such as extract, transform, and load (ETL) processes, are poorly suited to this role, as they are designed to move data from multiple data sources into a single repository.
Logical data management, in contrast, is a perfect fit for data mesh. Unlike ETL processes, it provides real-time access to data without having to replicate it.
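The difference between the two approaches can be illustrated with a minimal Python sketch. It is not any vendor's API: the two in-memory SQLite databases merely stand in for departmental data sources, and the names `make_source`, `etl_copy`, and `virtual_orders` are hypothetical, chosen only for this illustration.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory database standing in for one department's source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

# Two hypothetical departmental sources.
sales = make_source([(1, 10.0)])
support = make_source([(2, 5.0)])
sources = [sales, support]

def etl_copy(sources):
    """Replication: extract every row now and load it into a central copy."""
    central = []
    for src in sources:
        central.extend(src.execute("SELECT id, amount FROM orders").fetchall())
    return central

def virtual_orders(sources):
    """Logical access: query each source at read time and store nothing."""
    for src in sources:
        yield from src.execute("SELECT id, amount FROM orders")

snapshot = etl_copy(sources)

# A source changes after the ETL run has finished.
sales.execute("INSERT INTO orders VALUES (3, 7.5)")

stale = sorted(snapshot)                 # the replicated copy misses the new row
live = sorted(virtual_orders(sources))   # the on-demand query sees it
```

In this sketch the replicated copy stays out of date until the next ETL run, while the on-demand query reflects the source as it is when the query executes. A real platform would also handle query pushdown, caching, and heterogeneous source types, which are out of scope here.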
Several properties of logical data management architecture make it well suited to enabling data mesh:
- The only data that logical data management platforms centralize is the critical metadata for accessing the different data sources.
- This architecture enables organizations to implement governance and security protocols across all of the different data domains from a single point of control.
- This architecture also enables organizations to implement highly tailored semantic models above the individual data sources. These models effectively serve as data domains without changing the underlying data.
- These semantic models can be easily changed, developed, or re-designed, again without changing the underlying data.
- Logical data management platforms enable full-featured data catalogs that not only list what data is available but can also provide ready, real-time access to it, in a self-service manner.
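The points above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real product's interface: `SourceEntry`, `LogicalCatalog`, and all of their methods are hypothetical names. The catalog holds only metadata (how to reach each source and who may read it), defines semantic views as column mappings, and enforces access from one place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]

@dataclass
class SourceEntry:
    """Metadata only: owning domain, how to read the source, who may read it."""
    domain: str
    fetch: Callable[[], Iterable[Row]]  # invoked at query time; no rows stored
    allowed_roles: frozenset

class LogicalCatalog:
    """Hypothetical single point of control over distributed sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, SourceEntry] = {}
        self._views: Dict[str, Tuple[str, Dict[str, str]]] = {}

    def register_source(self, name: str, entry: SourceEntry) -> None:
        self._sources[name] = entry

    def define_view(self, view: str, source: str, mapping: Dict[str, str]) -> None:
        """Define or redefine a semantic view: view column -> source column."""
        self._views[view] = (source, mapping)

    def list_views(self) -> List[str]:
        """Catalog listing: which views are available."""
        return sorted(self._views)

    def query(self, view: str, role: str) -> List[Row]:
        """Check access centrally, then read the source on demand."""
        source, mapping = self._views[view]
        entry = self._sources[source]
        if role not in entry.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {view!r}")
        return [{new: row[old] for new, old in mapping.items()}
                for row in entry.fetch()]

# Hypothetical source owned by a sales domain; the rows stay where they are.
crm_rows = [{"cust_id": 1, "nm": "Acme", "internal_note": "n/a"}]

catalog = LogicalCatalog()
catalog.register_source(
    "crm", SourceEntry("sales", lambda: crm_rows, frozenset({"analyst"})))
catalog.define_view("customers", "crm", {"customer_id": "cust_id", "name": "nm"})

result = catalog.query("customers", role="analyst")

# Redesigning the semantic model touches only the mapping, not the source.
catalog.define_view("customers", "crm", {"id": "cust_id"})
redesigned = catalog.query("customers", role="analyst")
```

The design choice worth noting is that a view is just a mapping stored in the catalog, so changing it never rewrites the source rows, and the role check lives in one method rather than in each source.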