Over the last couple of years, big data fabric technology has emerged as a strategic way for companies to get the most value from their data investments. As data lakes proliferate, they also become more difficult to manage. A big data fabric might be the answer for many enterprises struggling to manage their vast stores of big data.
So what, exactly, is a data fabric? And how does it relate to the traditional data lake? Here’s a closer look at the function of a data fabric and what its advantages are, so you can decide if the technology could help your business.
As big data evolves, organizations are tending toward multiple data lakes instead of a single one. These additional lakes are built for a number of reasons: they may serve backup or disaster-recovery purposes for an existing production data lake, or they may replicate the contents of one lake to another geographic region.
Regardless of why data lake proliferation happens, it presents a challenge to any organization: How do you ensure consistent security, governance, and data management across all those data lakes?
You likely spent a lot of time building policies for governing and securing your first data lake. When you built another lake, you most likely wanted a way to consistently apply those original policies to it, too. But the more your big data environment grows, the more difficult it becomes to govern, secure, and manage all those data lakes. It may even become necessary to build brand-new policies that take the new size of your environment into account.
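The idea of defining governance policies once and applying them uniformly can be sketched in code. This is a minimal, purely illustrative example; the class names and policy model are assumptions, not any vendor's actual API:

```python
# Hypothetical sketch: governance policies defined once, then applied
# uniformly to every data lake in the environment.
from dataclasses import dataclass, field

@dataclass
class Policy:
    name: str
    rule: str  # human-readable description of what the policy enforces

@dataclass
class DataLake:
    name: str
    policies: list = field(default_factory=list)

    def apply(self, policy: Policy) -> None:
        # In a real system this would configure access controls, masking,
        # auditing, etc.; here we just record that the policy is in force.
        self.policies.append(policy.name)

def enforce_everywhere(lakes: list, policies: list) -> None:
    """Apply every policy to every lake, so a newly built lake
    inherits exactly the same rules as the first one."""
    for lake in lakes:
        for policy in policies:
            lake.apply(policy)

lakes = [DataLake("prod"), DataLake("dr-backup"), DataLake("eu-replica")]
policies = [Policy("mask-pii", "mask PII columns"),
            Policy("audit-access", "log every read")]
enforce_everywhere(lakes, policies)
```

The point of the sketch is the shape of the problem: as soon as policies live inside each individual lake rather than in one shared layer, keeping the loop above consistent by hand becomes the hard part.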
Another complicating factor is the emergence of the cloud. Many use cases relating to data science, artificial intelligence, machine learning, deep learning, and the like are well suited to operating in the cloud, and many companies are moving their data out of on-premises data centers to save on operating costs and improve availability. While the benefits of the cloud are hard to dispute, it further expands the big data environment and may introduce new management complications.
All of these scenarios require a management and abstraction layer that fits across all of your data sources and lakes, whether they are in the cloud or on premises. The abstraction layer's role is to ensure consistent security and data management across data lakes. A big data fabric can serve as that abstraction layer.
A data fabric weaves together and surrounds all of your data sources. It is aware of every source that exists now and automatically registers new ones as they are added.
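That registration behavior can be sketched as a small registry object. Again, this is a hypothetical illustration of the concept; the `DataFabric` class and its methods are invented for this example and do not correspond to a real product's API:

```python
# Hypothetical sketch of a fabric registry: every data source is
# registered with the fabric, and a newly added source immediately
# inherits all of the policies already in force.
class DataFabric:
    def __init__(self):
        self.sources = {}   # name -> location of each registered lake/source
        self.policies = []  # governance rules applied fabric-wide

    def add_policy(self, policy: str) -> None:
        """Define a policy once; it covers every current and future source."""
        self.policies.append(policy)

    def register(self, name: str, location: str) -> list:
        """Register a new data source with the fabric."""
        self.sources[name] = location
        # Return the policies now in effect for this source: because they
        # live in the fabric, the new source inherits them automatically.
        return list(self.policies)

fabric = DataFabric()
fabric.add_policy("encrypt-at-rest")
fabric.add_policy("audit-access")
fabric.register("on-prem-lake", "hdfs")
inherited = fabric.register("cloud-lake", "object-storage")
```

The design point is that policies attach to the fabric, not to individual lakes, so adding a source is a registration step rather than a fresh governance project.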
A good, enterprise-class data fabric should have several characteristics beyond this.
Ultimately, a data fabric helps you evolve your organization into a multiple data lake environment in an organized and secure way. Data fabrics help your organization achieve consistency, security, and high availability while providing a seamless management layer that is aware of all your data all the time.
For more on data fabrics, read The Forrester Wave: Big Data Fabric report.