Optiv Cybersecurity Dictionary

What is a Data Lake?

Data lakes are centralized repositories for storing large amounts of raw data, including system data and data for reporting and advanced analytics. They may contain structured, semi-structured and unstructured data, including images, audio and video.


Data lakes differ from data warehouses in how they organize stored data: a warehouse uses a hierarchy of files and folders, while a data lake uses a flat architecture in which metadata tags help users search for and identify relevant information.
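
To make the idea of a flat, tag-based layout concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not how any particular data lake product works: the names LakeObject, FlatDataLake, put and find, and the tag names source and format, are all hypothetical. Objects sit in a single namespace with no folders, and lookups filter on metadata tags.

```python
from dataclasses import dataclass, field


@dataclass
class LakeObject:
    """One raw item in the lake: an opaque payload plus descriptive tags."""
    key: str                      # unique identifier in a single flat namespace
    payload: bytes                # raw data, stored as-is
    tags: dict[str, str] = field(default_factory=dict)


class FlatDataLake:
    """Toy in-memory lake (hypothetical): no folders, objects are found by tag."""

    def __init__(self) -> None:
        self._objects: dict[str, LakeObject] = {}

    def put(self, key: str, payload: bytes, **tags: str) -> None:
        """Store a payload under a key, attaching the given metadata tags."""
        self._objects[key] = LakeObject(key, payload, dict(tags))

    def find(self, **wanted: str) -> list[str]:
        """Return the sorted keys of objects whose tags match every wanted pair."""
        return sorted(
            obj.key
            for obj in self._objects.values()
            if all(obj.tags.get(name) == value for name, value in wanted.items())
        )


lake = FlatDataLake()
lake.put("obj-001", b"raw firewall log", source="firewall", format="json")
lake.put("obj-002", b"raw packet capture", source="firewall", format="pcap")
lake.put("obj-003", b"raw badge events", source="badge-reader", format="csv")

firewall_keys = lake.find(source="firewall")
json_firewall_keys = lake.find(source="firewall", format="json")
print(firewall_keys)        # ['obj-001', 'obj-002']
print(json_firewall_keys)   # ['obj-001']
```

Because nothing depends on where an object is stored, the same item can be found through any combination of its tags, which is the property the flat architecture relies on.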


Most organizations operate both a data lake and a data warehouse because the two serve different needs. This chart from AWS illustrates the ideal uses of the two storage strategies.


It’s critical to secure a data lake so it isn’t accessed without authorization or altered for malicious purposes.

