Discover how to create a data lake with easier access to data, faster data preparation, enhanced agility, and more trustworthy data.
No one contests the exponential growth of data; the statistics are mind-boggling. By some estimates, more data has been created in the past few years than in the entire previous history of the human race. Managing and storing this data, and more importantly putting it to use, is essential for any organisation that wants to be data-driven.
For many organisations, data lakes are an easier and more flexible alternative to building a data warehouse. They are typically built on one of the many flavours of Hadoop or another big data platform, and they usually serve either as a space for ad hoc data discovery or as a store for the terabytes of data an organisation now generates. Data lakes have many advantages, including the ability to store many types of data and to load that data in real time.
Avoiding the cost and effort of defining data structures upfront looks like a significant advantage. Instead of modelling everything at load time, this approach (often called schema-on-read) pushes the task down to the point of consumption, usually the data scientists doing the analysis, which can save considerable time and money.
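To make the trade-off concrete, here is a minimal Python sketch of schema-on-read, assuming raw JSON records are landed in the lake exactly as produced. The record fields and the `Order` and `read_orders` names are hypothetical, chosen only for illustration; nothing about the structure is enforced until a consumer reads the data.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical raw records, stored as-is with no upfront schema.
RAW_LINES = [
    '{"order_id": "A1", "amount": "19.99", "country": "GB"}',
    '{"order_id": "A2", "amount": "5", "channel": "web"}',
    '{"order_id": "A3"}',
]

@dataclass
class Order:
    """The structure one consumer chooses to impose at read time."""
    order_id: str
    amount: float

def read_orders(lines: Iterable[str]) -> Iterator[Order]:
    """Schema-on-read: parse and type raw records only when they are consumed."""
    for line in lines:
        record = json.loads(line)
        # Records lacking the fields this consumer needs are skipped here,
        # rather than being rejected when they were loaded into the lake.
        if "order_id" not in record or "amount" not in record:
            continue
        # Extra fields (e.g. "country", "channel") are simply ignored.
        yield Order(order_id=str(record["order_id"]), amount=float(record["amount"]))

orders = list(read_orders(RAW_LINES))
print(orders)
```

Note that the modelling work does not disappear: every consumer must make their own decisions about missing or inconsistent fields, such as the third record above.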
Unfortunately, data lakes are not a magic solution. Poorly implemented and unmanaged lakes can quickly turn into data swamps, so organisations must pay attention to data quality. A successful data strategy starts with data stewardship: addressing the organisational structures, processes and culture that support basic information management principles and keep the lake clean. Even a minimal practice of classifying information upfront, capturing basics such as each dataset's source, owner and intended purpose, ensures that data loaded into the lake can be reused by others and supports the maintenance work that keeps the waters clean.
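As an illustration, the minimal classification described above could be enforced with a small metadata record checked at load time. The sketch below is hypothetical: the `DatasetClassification` fields, the in-memory `catalogue` dictionary standing in for a real data catalogue, and the helper names are assumptions for illustration, not any particular product's API.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical minimal classification record; field names are illustrative only.
@dataclass(frozen=True)
class DatasetClassification:
    name: str
    source: str            # originating system or supplier
    owner: str             # accountable data steward
    intended_purpose: str  # why the data was loaded
    loaded_on: date = field(default_factory=date.today)

REQUIRED_FIELDS = ("source", "owner", "intended_purpose")

# In-memory stand-in for a data catalogue, keyed by dataset name.
catalogue: dict[str, DatasetClassification] = {}

def register(entry: DatasetClassification) -> None:
    """Admit a dataset to the catalogue only if its basic classification is filled in."""
    missing = [f for f in REQUIRED_FIELDS if not getattr(entry, f).strip()]
    if missing:
        raise ValueError(f"{entry.name}: missing {', '.join(missing)}")
    catalogue[entry.name] = entry

def datasets_owned_by(owner: str) -> list[str]:
    """Support maintenance, e.g. asking an owner to review or retire their datasets."""
    return sorted(name for name, e in catalogue.items() if e.owner == owner)

register(DatasetClassification("crm_contacts", "CRM export", "Sales Ops", "Churn analysis"))
try:
    register(DatasetClassification("web_logs", "Web servers", "", ""))
except ValueError as err:
    print(err)  # web_logs: missing owner, intended_purpose
print(datasets_owned_by("Sales Ops"))  # ['crm_contacts']
```

Rejecting unclassified loads at the door is one design choice; a gentler alternative would be to accept the data but flag it for its steward to complete.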
Beyond the problem of data veracity, big data also poses security risks. Holding large volumes of data with little knowledge of what it contains can be catastrophic. Data is easily duplicated, and personal or sensitive data can end up exposed to business users who should not see it, creating regulatory and data-leakage risks as well. A data governance solution helps organisations protect not only their information assets but also their customers.