Data Lake

  1. Introduction
  2. Comprehensive Understanding


The data lake is a hot topic in big data nowadays. Major vendors have their own offerings, such as AWS Lake Formation and Databricks Delta Lake. Delta Lake is a table format that sits between Spark and the storage platform (HDFS, S3, Azure, etc.). Lake Formation is positioned as a middle layer between S3 and EMR. However, many people are still unsure what a data lake actually is, because each company defines it in its own way.

Many Hadoop vendors even regard their Hadoop product as a data lake, because it already handles storage and computation for structured, semi-structured, and unstructured data. Cloud companies (AWS, for example) instead think of a data lake as a method or tool for managing data that is already stored on their cloud platform.

So the data lake still lacks a standard definition.

Comprehensive Understanding

When we talk about the data lake, it is natural to compare it with the traditional data warehouse. Some may wonder why we need a data lake when data warehouses are already so widely used.

In general, the two differ in the following ways.

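| item        | data lake                                                      | data warehouse                          |
|-------------|----------------------------------------------------------------|-----------------------------------------|
| data format | structured / semi-structured / unstructured                    | structured                              |
| computing   | strong, supports transforming and computing on all data formats | weak, computes on structured data only  |
| data model  | better, more diverse                                           | normal, simple                          |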
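
The "data format" and "computing" rows can be illustrated with a small sketch. It is only a toy model and not how any particular product works: the warehouse side checks each record against a fixed schema before storing it, while the lake side keeps the raw bytes and interprets them only when they are read. All names here (`WAREHOUSE_COLUMNS`, `load_into_warehouse`, `land_in_lake`, `read_from_lake`) are hypothetical and chosen for illustration.

```python
import csv
import io
import json

# Hypothetical fixed schema for the warehouse-style table.
WAREHOUSE_COLUMNS = ("user_id", "amount")


def load_into_warehouse(table, record):
    """Accept a record only if it matches the fixed schema exactly."""
    if not isinstance(record, dict) or set(record) != set(WAREHOUSE_COLUMNS):
        raise ValueError("record does not match the warehouse schema")
    table.append(tuple(record[c] for c in WAREHOUSE_COLUMNS))


def land_in_lake(lake, name, raw_bytes, fmt):
    """Store the raw bytes untouched; only remember which format they are in."""
    lake[name] = {"format": fmt, "data": raw_bytes}


def read_from_lake(lake, name):
    """Interpret the stored bytes only when somebody reads them."""
    entry = lake[name]
    text = entry["data"].decode("utf-8")
    if entry["format"] == "csv":
        return list(csv.DictReader(io.StringIO(text)))  # structured
    if entry["format"] == "json":
        return json.loads(text)  # semi-structured
    return text  # unstructured: hand back the text as-is


warehouse_table = []
load_into_warehouse(warehouse_table, {"user_id": 1, "amount": 9.5})

lake = {}
land_in_lake(lake, "orders.csv", b"user_id,amount\n1,9.5\n", "csv")
land_in_lake(lake, "click.json", b'{"user": 1, "path": ["/", "/cart"]}', "json")
land_in_lake(lake, "note.txt", b"customer called about a late delivery", "text")
```

In this sketch the click event and the free-text note could not be loaded into `warehouse_table` without first being reshaped, whereas the lake accepts them immediately and defers the parsing to `read_from_lake`.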

In fact, the data lake is a revolution of the data warehouse idea. A traditional data warehouse puts more emphasis on data formatting and normalization, whereas a data lake emphasizes the ability to take in data of any kind.
Hence a data lake is quicker and better than a data warehouse at landing data and at adapting to business changes. Judged by the data lake concept alone, Hadoop arguably meets all the requirements. However, for a general-purpose platform, storage is just one aspect; other capabilities, such as data governance and data discovery, are still needed to improve the user experience. With those in place, the data lake is more advantageous than the data warehouse.
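
To make "data governance and data discovery" a little more concrete, here is a minimal sketch of a metadata catalog placed on top of the raw storage. It is an assumption about what such a layer might look like, not a description of any real product: the `DatasetEntry` and `Catalog` classes, the owner rule, and the example bucket path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Metadata describing one dataset stored in the lake (hypothetical)."""
    name: str
    location: str
    fmt: str
    owner: str
    tags: set = field(default_factory=set)


class Catalog:
    """Toy catalog: governance on registration, discovery by tag."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Governance rule assumed for this sketch: no dataset without an owner.
        if not entry.owner:
            raise ValueError("every dataset needs an owner")
        self._entries[entry.name] = entry

    def find_by_tag(self, tag):
        # Discovery: return the names of all datasets carrying the tag.
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = Catalog()
catalog.register(DatasetEntry(
    "orders", "s3://example-bucket/raw/orders/", "csv", "sales-team", {"sales", "pii"}))
catalog.register(DatasetEntry(
    "clicks", "s3://example-bucket/raw/clicks/", "json", "web-team", {"web"}))
catalog.register(DatasetEntry(
    "refunds", "s3://example-bucket/raw/refunds/", "csv", "sales-team", {"sales"}))

sales_datasets = catalog.find_by_tag("sales")
```

Without a layer like this, the storage holds the files but nobody can tell what they contain or who is responsible for them, which is the gap the paragraph above points at.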


Article Title: Data Lake

Article Author: zendwind

Publish Time: 2019-07-25, 09:55:51

Last Updated: 2019-07-29, 09:32:16


Copyright: "Signature-Non commercial-Reservation 4.0". Any reprint must credit the original author, thank you!

