How Data Meshes Can Help Businesses Better Manage Their Data
How can businesses manage the enormous volumes of data that power their operations today? The conventional answer has been to use technologies such as data warehouses and data lakes, which provide ways to store massive amounts of data in a centralised location that any part of the business can access.
But centralised data storage platforms don't always cut it. They don't enable efficient domain-specific data operations, they can hinder self-service, and they can complicate the enforcement of data governance requirements.
That's why businesses are increasingly turning to data meshes. Data meshes represent a radical new take on data management—one that aims to make data more flexible while simplifying data management processes, even as data consumption and governance requirements continue to grow in complexity.
Introduced in 2018 by Zhamak Dehghani, director of emerging technologies in North America at Thoughtworks, data mesh is founded on four key principles: Domain Ownership, Data as a Product, Self-Serve Data Platform, and Federated Computational Governance. It is a paradigm shift that requires the collective effort of many complementary roles and disciplines, but it can deliver substantial benefits to businesses: better data quality and easier access to data, which in turn can speed up decision-making and support agility, innovation and scalability.
Keep reading for a breakdown of how a data mesh architecture works, why data meshes benefit businesses and how to determine if a data mesh is the right approach to meet your company's data management needs.
What Is a Data Mesh?
A data mesh is a data architecture in which data is organised in a domain-oriented manner. In a data mesh, each business unit or domain manages its data in its own way.
Thus, under a data mesh architecture, business departments such as sales, marketing and customer service would each get their own sets of data storage technologies, analytics tools and processing technologies. They'd also be able to apply different data governance and security rules to their data.
Primarily, though, a data mesh is an architectural approach to solving the problem of data silos in large companies that also need to share and govern their data. Either you have big, monolithic central data stores that require huge data migration projects and lots of central maintenance to function, or you have pockets of data in a wide variety of silos, created by acquisitions or produced as a by-product of your applications. Most likely, of course, you have a combination of both extremes. The data mesh design looks to simplify what needs to be done to the data while still enabling democratisation and governance. A data mesh will not dictate a data migration to a central warehouse or lake, but it will still allow you some central control over how data is accessed in its home locations.
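To make the idea of data staying in its home location more concrete, here is a minimal Python sketch of a central catalogue that records only metadata about each domain's data products. It is an illustration rather than a reference design: the class names, domain names, owners and storage URIs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Metadata describing a dataset that stays in its home location."""
    name: str
    domain: str    # owning business domain, e.g. "sales" (hypothetical)
    location: str  # where the data actually lives (hypothetical URI)
    owner: str     # team accountable for the product (hypothetical)


@dataclass
class Catalogue:
    """Central registry: it holds metadata only, never the data itself."""
    products: dict[str, DataProduct] = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        key = f"{product.domain}.{product.name}"
        if key in self.products:
            raise ValueError(f"{key} is already registered")
        self.products[key] = product

    def find_by_domain(self, domain: str) -> list[str]:
        """Return the sorted keys of every product owned by one domain."""
        return sorted(k for k, p in self.products.items() if p.domain == domain)


catalogue = Catalogue()
catalogue.register(DataProduct("orders", "sales", "postgres://sales-db/orders", "sales-data-team"))
catalogue.register(DataProduct("campaigns", "marketing", "s3://marketing-lake/campaigns/", "marketing-analytics"))
catalogue.register(DataProduct("leads", "sales", "postgres://sales-db/leads", "sales-data-team"))

print(catalogue.find_by_domain("sales"))  # → ['sales.leads', 'sales.orders']
```

Nothing is copied or migrated here. Each domain keeps its data where it already lives, and the catalogue only tells consumers what exists and who owns it.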
Data Meshes vs Microservices
The data mesh concept draws explicitly on microservices software architectures. In software development, modern businesses often use microservices to break up complex applications into smaller functional pieces, which makes it easier to manage each microservice as appropriate.
A data mesh does something similar for data. It breaks down large, monolithic data repositories into smaller, discrete pipelines that can be optimised for multiple use cases. Put another way, data meshes decentralise monolithic data architectures into domains, just as microservices decompose monolithic applications into services.
At a high level, data meshes and microservices are similar in the respect that they both break complex, inflexible resources into smaller pieces that are easier to adapt to varying requirements.
What Makes Data Meshes Unique?
This domain-specific approach to data management is what makes data meshes different from traditional data management strategies.
Historically, many businesses that needed to store, process and analyse large volumes of data relied on a central data platform, such as a data lake or a data warehouse, both of which offer means of storing large amounts of data. They also usually applied centralised governance policies across all of their data, and they provided a centralised set of tools that all business units were expected to use when they wanted to process or analyse data.
As a result, data architectures have traditionally been monolithic and generic. Rather than catering to the unique needs of different users within the business, conventional data architectures expected everyone to store data in the same place, and to work with data using the same tools and approaches.
The data mesh architecture upends this concept by making it possible to decentralise data storage and management. In a data mesh, every unit in the business can manage and govern data in whichever way best suits it, rather than being forced to adapt to a one-size-fits-all data management strategy.
The Business Benefits of Data Mesh
From a business perspective, data meshes offer a variety of benefits that increase the value of data while also reducing the cost and complexity of data management.
Domain-oriented data management
Arguably the single most important benefit of a data mesh is that it provides each unit within the business control over how it manages its own data. Rather than imposing a central set of data tools and processes on all users, a data mesh makes it possible for each group to choose whichever data management tools and techniques make most sense for its needs.
To understand how that delivers value in practice, consider how two distinct business domains, say, a customer service department and an IT department, might work with data under a data mesh model. For the customer service department, being able to analyse data that is linked to individual customers is important, but that's likely less critical for the IT department, which will be more interested in analysing data related to non-human entities, such as servers and applications.
Because of these differences, each group will want to use different data processing tools. The customer service department is likely to care more about parsing its data using CRM analytics software, for example, whereas the IT department will want to use solutions like log analytics tools. In addition, each domain will have different data governance and compliance requirements because the customer service department is dealing with data linked to individuals, which is generally more sensitive than machine data.
Under a data mesh architecture, each department would have the flexibility to define its own approach to data management. It could set and enforce its own data governance requirements, analyse data using its own tools and even store data in a location of its choosing, if it wishes. None of this would be possible under a traditional, monolithic data architecture.
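The customer service and IT example above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the domain names, field names and masking rule are hypothetical, and a real implementation would rely on each domain's own tooling rather than a shared dictionary.

```python
from typing import Callable

Record = dict[str, str]
Rule = Callable[[Record], Record]


def mask_fields(*fields: str) -> Rule:
    """Build a rule that replaces the named fields with a placeholder."""
    def rule(record: Record) -> Record:
        return {k: ("***" if k in fields else v) for k, v in record.items()}
    return rule


def no_masking(record: Record) -> Record:
    """Rule for machine data that contains nothing personal."""
    return dict(record)


# Each domain chooses the rule that suits its own data (hypothetical domains).
domain_rules: dict[str, Rule] = {
    "customer_service": mask_fields("email", "phone"),
    "it_operations": no_masking,
}


def publish(domain: str, record: Record) -> Record:
    """Apply the owning domain's rule before the record is shared."""
    return domain_rules[domain](record)


ticket = {"ticket_id": "T-1", "email": "ana@example.com", "issue": "login"}
log_line = {"host": "web-01", "status": "500"}

print(publish("customer_service", ticket))
print(publish("it_operations", log_line))
```

The point of the sketch is that the masking rule applies only to the domain that needs it. The IT department's machine data passes through untouched.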
Self-service data management
Another benefit of data meshes is that they enable a self-service approach to data management.
This is a big deal because conventional data architectures require businesses to operate a centralised data management team. When a group within the business wants to do something like process data or implement a new data governance rule, it has to ask the central data management team to handle the request. That process often leads to delays because the data management team is overburdened with requests. Another problem is that, in some cases, the central data managers lack the domain-specific knowledge to implement requests from different business units effectively.
A data mesh addresses these challenges by allowing business units to take direct ownership of their data. It does not yet remove the need for a central team entirely, however. Although data integration and processing needs can be handled by the business domains themselves, there may still be a central data team responsible for managing the business's overall data architecture.
Now, not all data owners are equal. Some teams own a single, clean domain; others look after parts of several data domains because, for example, they are aligned to technologies rather than business areas. The same goes for their capability in managing access to their data and connecting to the data mesh: some teams will be able to absorb this, and some will simply be too small or have no capacity to join the mesh by themselves. A data mesh approach will make the domain teams’ lives harder, simply because thinking about data as a set of products takes more effort than just dumping data into a database and giving users SQL access to it. Whichever end of this spectrum they are on, they will need support during the initial rollout, and in some cases on an open-ended basis from then on. Your central data team will need to reflect the capabilities of your domain teams and complement them, providing this support where needed. It will administer things like query access, access control policies, and the common data tools you decide on, which could include ETL pipelines, data science tools, data cataloguing or a data marketplace.
Lower-cost data management
With traditional, centralised data architectures, it’s often difficult to optimise the cost of data management because it’s impossible to apply a policy, tool or integration to only some of the data. If a certain governance rule needs to be in place to satisfy the needs of one business unit, for example, that rule would have to be applied across the entire data repository, which is cost-inefficient in situations where only some of the data actually requires the governance protections.
With a data mesh, however, data tools and policies can be applied only to the specific data pipelines where they are valuable. As a result, businesses avoid the unnecessary cost of blunt, generic approaches to data management.
Data meshes also make it easier to scale. Traditional data management architectures are scalable in the sense that you can always add more data to your data lake or warehouse. However, scaling data processing pipelines is a separate matter, and is often more difficult because the entire pipeline may need to be overhauled to accommodate changes required by only one business domain.
Data meshes avoid this limitation because they make it possible to scale in a more granular, domain-specific way. When each business domain has its own pipeline, it becomes much easier to modify that pipeline, or scale up its capacity, without having to make changes that apply across the entire data architecture.
A data mesh is another step on the journey towards treating data with an agile mindset, and a move away from the traditional, multi-year data projects to try to build a monolithic platform that will satisfy all possible use cases.
Data Mesh Technologies
The fact that data meshes decentralise data management doesn't necessarily mean that data meshes are incompatible with traditional data storage technologies. In many cases, businesses can continue to use data warehouses or data lakes as central repositories for storing at least some of their data as part of a data mesh framework.
However, within a data mesh, each business domain sets up its own data processing pipeline and connects it to the data warehouse or data lake. That means that every unit within the business can extract, transform, analyse or otherwise process data in a manner best suited to its needs, using the tools of its choosing—even if some or all of the data it processes comes from a central repository.
As we have seen, a data mesh is a data architecture for decentralised control, not a specific technology, and no single platform offers an end-to-end data mesh solution. There are, though, some things to consider when enabling one. You need a central point for discovering and querying data, even though the data itself remains in its home domain and home database technologies. With centralised access comes the need for centralised access control, so that data can be seen only by the appropriate consumers. At Contino we have leant towards a combination of Starburst and Immuta to provide the backbone of data mesh implementations: Immuta for fine-grained access control, and Starburst as a data consumption layer that connects to data sources and allows access via SQL. Plenty of alternatives are available, though, with Snowflake a very credible one. The choice of technology should support your aims and fit your current technology profile, much like any other technical implementation.
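The pairing of a central query point with central access control can be illustrated with a small Python sketch. It does not use or imitate the Starburst, Immuta or Snowflake APIs. The product names, roles and in-memory "sources" are hypothetical stand-ins chosen only to show the shape of the idea.

```python
from typing import Callable

Row = dict[str, object]

# Each domain exposes its own way of reading data from its home store.
# These lambdas stand in for real databases (hypothetical data).
sources: dict[str, Callable[[], list[Row]]] = {
    "sales.orders": lambda: [{"order_id": 1, "total": 120.0}],
    "hr.salaries": lambda: [{"employee": "E-7", "salary": 50000}],
}

# Central policy: which roles may read which data product (hypothetical roles).
policies: dict[str, set[str]] = {
    "sales.orders": {"analyst", "finance"},
    "hr.salaries": {"hr_admin"},
}


def query(product: str, role: str) -> list[Row]:
    """Check the central policy, then read from the product's home source."""
    if product not in sources:
        raise KeyError(f"unknown data product: {product}")
    if role not in policies.get(product, set()):
        raise PermissionError(f"role {role!r} may not read {product}")
    return sources[product]()


print(query("sales.orders", "analyst"))

try:
    query("hr.salaries", "analyst")
except PermissionError as exc:
    print(f"denied: {exc}")
```

Consumers go through one entry point, and the policy is enforced once, centrally, while each data product is still read from wherever its domain keeps it.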
Data Mesh Implementation
It's worth noting that a data mesh can involve storing some data in databases or storage services that are dedicated to individual business domains, and other data in storage that is more centralised. You don't have to centralise all data in a common repository as part of a data mesh. Each business unit gets the flexibility to pick and choose where its data is stored, which is part of the core point of using a data mesh.
Still, because data meshes don't require fundamentally new types of data storage or processing technologies, implementing a data mesh doesn't mean completely overhauling your business's data stack or tooling. Nor is there a requirement to perform large-scale data migrations (although some reorganisation of data may be necessary).
Instead, moving to a data mesh really boils down to giving each domain within your business greater flexibility about where and how it manages the data it requires. Data mesh implementation is about rethinking the way different parts of your business use data at least as much as it's about updating data technology itself.
Giving each domain flexibility in its data management is primarily an operational challenge, and it can require significant operating model transformation to succeed. We believe an Agile approach is beneficial for any type of project, and a data mesh transformation is no exception. Don’t bite off too much in the early phases. Instead, identify a small number of initial domains and consumers, and then take them on the technology and process change journey. These are vital early steps in building the foundation for a successful rollout further down the line. Focus on the business problems to solve and the desired outcomes, and use a data mesh implementation as a means to address these, not as an end in itself. And don’t forget that the data mesh paradigm is still a young concept. It’s not yet a well-trodden path, so you’ll have to work out some of the answers yourself. Without careful management, it has the potential to increase complexity and cost, much like earlier popular data architectures.
Challenges of Data Meshes
While data meshes are a powerful solution, they have some potential challenges:
- Lack of standardisation: Even though each business domain manages its data separately, data meshes must be driven by consistent, standardised data management techniques to work well. Ensuring consistency across the entire data architecture can be challenging because in some cases, different business domains may follow different policies, or there may be no organisation-wide data standards in place.
- Migration: Even though moving to a data mesh doesn't require a total overhaul of data technologies, it does require some migration and adaptation to implement different pipelines for each domain. That can be time-consuming and costly.
- Unsupported domains: There is a risk in some cases that data meshes won't cater to every business domain because those domains may not have the resources or expertise to manage their own data. Domains that are not covered are left without a means of managing data that is important to them—a problem that doesn't exist with centralised data architectures, where every business domain can access the data repository and pipeline.
- Redundant tooling and processes: In situations where different business domains have substantially similar data management needs, a data mesh may result in the duplication of tools or processes because each domain ends up building similar data pipelines. From a purely cost perspective, this may be viewed as inefficient.
- Buy-in for the people and process changes: When data mesh is seen as just another technology project to deliver data to the organisation, it is easy to overlook the operating model changes needed for successful adoption. Getting buy-in from your stakeholders that these changes are necessary can be difficult, but it’s vital to a successful transformation. Otherwise you are left with some technology plus some domain teams who don’t know how or why they should use it.
Note that these challenges don't apply in every case. Whether your data mesh is subject to these drawbacks depends on your business's specific data management needs and strategies.
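One way to think about the standardisation challenge in the list above is as a small set of organisation-wide rules that every domain's data product description must satisfy, whatever else the domain decides for itself. The Python sketch below assumes a hypothetical standard (three required fields and three allowed classifications); the field names and values are illustrative only.

```python
# Assumed organisation-wide standard (hypothetical field names and values).
REQUIRED_FIELDS = {"owner", "description", "classification"}
ALLOWED_CLASSIFICATIONS = {"public", "internal", "restricted"}


def violations(descriptor: dict[str, str]) -> list[str]:
    """List the ways a data product descriptor breaks the shared standard."""
    missing = sorted(REQUIRED_FIELDS - set(descriptor))
    problems = [f"missing field: {name}" for name in missing]
    classification = descriptor.get("classification")
    if classification is not None and classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification: {classification}")
    return problems


sales_orders = {
    "owner": "sales-data-team",
    "description": "Confirmed customer orders",
    "classification": "internal",
}
server_logs = {"owner": "it-operations", "classification": "secret"}

print(violations(sales_orders))  # → []
print(violations(server_logs))   # → ['missing field: description', 'unknown classification: secret']
```

A check like this leaves each domain free to choose its own tools and storage, while still giving the organisation a consistent minimum that every data product has to meet.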
So, Should You Embrace Data Mesh?
In many cases, the answer hinges, in part, on how large your business is. For smaller businesses, a data mesh may be overkill. Data meshes may also be unnecessary for companies whose data management needs are relatively consistent across domains, and which are likely to end up with redundant pipelines if they transition to a data mesh.
But for the majority of large businesses with complex data management requirements today, a data mesh is the way to go. It enables a more efficient, granular, purposeful approach to working with data—and one that can be tailored to meet the varying needs of different business domains while simultaneously reducing costs and decreasing the time and effort necessary to derive value from data.