Avoiding the endless staircase of a Data Lake
Avoid the endless staircase of a Data Lake – create value with Linked Data instead!
By Alex Dowdalls and Arie Hakemulder
Many traditional organisations are currently toiling on Master Data Management (MDM) or Data Lake projects with seemingly no end in sight. Managers believe that they first need a data lake before they can create value with data. The result is a project that aims to copy data into a central repository and then keep it accurate and complete. These projects often struggle to achieve data completeness and quality, let alone ensure lineage and privacy. The specialists and management teams dealing with this topic are like the figures endlessly walking upstairs in the highlighted Escher painting. That is soul-destroying.
Let’s acknowledge that managing data is a complex task, made even more complex if you inherit a legacy starting point, as is the case with most traditional businesses. Data was never really a topic on the management agenda; data management has not really been understood and has therefore been left to the IT department or chief privacy officer. The technology for managing data has traditionally been the database, with cloud and other variants encouraging large volumes of data to be copied and stored at low cost without creating any value from it.
The next complication is time. When we discuss becoming “data driven” with executives today, we often hear that this is the goal but that the company must first spend the next year or so getting its data organised, for example in a data lake. This is not how Big Tech, Unicorns and startups approach the issue.
How can you create value with data without building a data lake?
The answer is to use “Linked Data” combined with FAIR Principles as an alternative approach to a data lake. You are already using this when you use Microsoft, Facebook, Google and many related apps. Many other companies and governments have already scaled this concept to an enterprise data strategy.
How does this work?
The idea, put simply, is to keep data at the source and manage its use within applications in such a way that both man and machine can use the data. The key change is to attach metadata and globally unique identifiers to your data. Managing data then becomes automated, which a manual data management approach cannot achieve. The figures trudging endlessly up the staircase can stop and do meaningful work creating value with the data instead.
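As a rough illustration of keeping data at the source, the sketch below assumes two hypothetical systems (a CRM and a billing system) that both refer to the same customer through one shared identifier. The `example.org` identifiers, the function names and the sample values are invented for this example; the property names are borrowed from the public schema.org vocabulary. Nothing is copied into a central store: each source is asked in place and the answers are joined on the identifier.

```python
from typing import Callable

# A statement is (subject, property, value); subjects and properties are URIs.
Triple = tuple[str, str, str]

# Hypothetical globally unique identifier shared by both source systems.
CUSTOMER = "https://example.org/id/customer/42"
INVOICE = "https://example.org/id/invoice/7"


def crm_source() -> list[Triple]:
    """Stand-in for a CRM system that describes the customer."""
    return [
        (CUSTOMER, "https://schema.org/name", "Jane Doe"),
        (CUSTOMER, "https://schema.org/email", "jane@example.org"),
    ]


def billing_source() -> list[Triple]:
    """Stand-in for a billing system that points at the same customer."""
    return [
        (INVOICE, "https://schema.org/customer", CUSTOMER),
        (INVOICE, "https://schema.org/totalPaymentDue", "120.00"),
    ]


def describe(identifier: str, sources: list[Callable[[], list[Triple]]]) -> list[Triple]:
    """Ask every source in place and keep the statements that mention the identifier."""
    return [
        triple
        for source in sources
        for triple in source()
        if identifier in (triple[0], triple[2])
    ]


statements = describe(CUSTOMER, [crm_source, billing_source])
for subject, prop, value in statements:
    print(subject, prop, value)
```

Because both systems use the same identifier, the invoice in billing is linked to the customer in the CRM without either system holding a copy of the other's data.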
To achieve this, an organisation must reconsider the traditional approach to a data lake and look at the concepts of FAIR Data Principles and Linked Data Technology. The FAIR data principles describe how data can be made:
- F: Findable – by both man and machine, quickly finding what is available
- A: Accessible – applying rules to authorise and manage access to data
- I: Interoperable – described in shared formats and vocabularies, so that it can be read and combined across systems in a secure manner
- R: Re-usable – set up to enable re-use by multiple parties in an ecosystem
Linked Data technology manages the data using “metadata”, which is basically data about data. The metadata is set up in such a way that a machine can access the data through a predefined set of rules. In this way, data can be accessed and used by both man and machine to create value. This provides accurate, reliable and up-to-date source data for Business Intelligence, Artificial Intelligence and data-driven business models.
Data scientists spend roughly 80% of their time collecting, cleaning and organising data [source: Crowdflower 2016]. This approach frees up much of that time, as they can search, find, filter and prepare datasets more quickly. Finally, this technology offers privacy and provenance by design because it uses the source of the data instead of a copy.
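To make the role of metadata more concrete, here is a minimal sketch of a metadata catalogue that touches each of the four FAIR letters. It is an assumption-laden illustration, not a description of any particular Linked Data product: the `DatasetMetadata` fields, the `find` and `resolve` functions, the role names and all URLs are hypothetical. The catalogue stores only descriptions and pointers; the data itself stays in the source system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    identifier: str           # globally unique identifier (a URI)
    title: str
    keywords: frozenset       # Findable: terms a person or a machine can search on
    source_url: str           # the data stays here; the catalogue holds no copy
    media_type: str           # Interoperable: declared format of the source
    licence: str              # Re-usable: terms under which others may use it
    allowed_roles: frozenset  # Accessible: roles that may be pointed at the source


# Hypothetical catalogue entries, invented for this example.
CATALOGUE = [
    DatasetMetadata(
        identifier="https://example.org/id/dataset/sales-2023",
        title="Sales orders 2023",
        keywords=frozenset({"sales", "orders"}),
        source_url="https://erp.example.org/api/orders",
        media_type="application/json",
        licence="internal-use-only",
        allowed_roles=frozenset({"analyst", "finance"}),
    ),
    DatasetMetadata(
        identifier="https://example.org/id/dataset/hr-staff",
        title="Staff register",
        keywords=frozenset({"hr", "staff"}),
        source_url="https://hr.example.org/api/staff",
        media_type="text/csv",
        licence="restricted",
        allowed_roles=frozenset({"hr"}),
    ),
]


def find(keyword: str) -> list:
    """Findable: search the metadata, not the data itself."""
    needle = keyword.lower()
    return [meta for meta in CATALOGUE if needle in meta.keywords]


def resolve(meta: DatasetMetadata, role: str) -> str:
    """Accessible: apply the access rule, then hand back the location of the source."""
    if role not in meta.allowed_roles:
        raise PermissionError(f"role {role!r} may not access {meta.identifier}")
    return meta.source_url


for meta in find("sales"):
    print(meta.title, "->", resolve(meta, "analyst"), f"({meta.media_type})")
```

In this sketch an analyst searching for "sales" is directed to the ERP system itself, while the same request for the staff register would be refused by the access rule before any data is touched.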
What is the next step?
Will you be spending the coming year just getting your data organised? Or will you look at FAIR and Linked Data as an alternative? We have developed consultancy and training services to help you learn about this new approach, then develop and execute effective data management strategies. This will enable you to create value with data quickly.
You can start here.