Driven by the explosion of data, access to affordable computing, and business imperatives, the accumulation of unstructured data is on the rise. Gartner reports suggest that 80% of transactional and organizational data is unstructured. Data Lakes are the latest in a series of innovations that enable organizations to master new big data challenges. Establishing a business Data Lake helps build a single data culture throughout the organization. It makes the organization data-centric, both in generating relevant insights and in redesigning processes to produce relevant data at each step of the business.
A Data Lake is a central repository to “store data of any format, schema, and type”. It scales readily and, at large scale, is relatively inexpensive compared with a traditional data warehouse.
Data Lakes also offer organizations numerous capabilities: handling a wide variety of data sources, low-cost raw data storage, processing of large-scale data, a broad range of analytical transformations, and storage and handling of unstructured data. Data Lakes enhance unstructured data analytics because they support key applications such as indexing the Web or enabling ad targeting. They have created new business value along the value chain. By transforming clickstream data, businesses have improved the speed and quality of web search, and this has in turn strengthened targeted marketing through better web advertising. In cross-channel analytics, Data Lakes play a role in providing a single view of the truth about the customer, because they allow retailers to use unstructured data to understand customer interactions and behavior at a much more detailed level.
The need for a Data Lake is often questioned today, when most organizations have spent a considerable amount of time and resources building a data warehouse. It is time organizations embraced Data Lakes as a complement to their enterprise data warehouses. Data Lakes enable the consolidation of data from multiple sources, most importantly unstructured data. The data warehouse and the Data Lake are both components of a logical enterprise data warehouse.
Data Lakes are central to unstructured data analytics, which enables:
- Customer Experience Management
Reduces the time and cost of identifying customer and employee issues, which in turn helps reduce attrition
- Brand Monitoring
Helps companies keep tabs on the health of their brand image by analyzing trends over time
- Understanding sentiment and current buzz
Identifies the positive or negative impact of content publishing and online marketing strategies
Data Lakes were once considered a heavy load on existing IT infrastructure, but Hadoop 2 has enabled the creation of flexible Data Lakes that offer multiple ways to access data, including interactive, online, and streaming access. Hadoop 2 has helped remove business silos and allowed people across various business functions to “refine, explore, and enrich data”.
Data Lakes have been a revelation to organizations looking to move away from a repetitive ETL cycle. They allow organizations to handle data across different levels of quality and governance standards, and they let business units take ownership of the data they generate and the actionable insights derived from it. It is time organizations realized the true potential of unstructured data; implementing a Data Lake is an ideal step toward a congruent, cost-effective, and scalable data management system.