Data lakes are centralized repositories for storing large amounts of raw data, including system data and data for reporting and advanced analytics. They may contain structured, semi-structured and unstructured data as well as images, audio and video.
Data lakes differ from data warehouses in how the stored data is organized: a warehouse uses a file-and-folder hierarchy, while a data lake’s flat architecture relies on metadata tags to help users search for and identify relevant information.
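To make the flat, tag-based organization concrete, here is a minimal sketch in Python. It is an in-memory toy, not how any particular data lake product works; the names `LakeObject`, `FlatLake`, `put` and `find`, and the tag names `source` and `kind`, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LakeObject:
    """One raw item in the lake: an opaque payload plus descriptive tags."""
    key: str
    payload: bytes
    tags: dict[str, str] = field(default_factory=dict)


class FlatLake:
    """Hypothetical in-memory stand-in for a flat, tag-searchable store."""

    def __init__(self) -> None:
        # A single flat namespace: no folders, just key -> object.
        self._objects: dict[str, LakeObject] = {}

    def put(self, key: str, payload: bytes, **tags: str) -> None:
        """Store raw bytes under a key, with metadata tags attached."""
        self._objects[key] = LakeObject(key, payload, dict(tags))

    def find(self, **wanted: str) -> list[str]:
        """Return the sorted keys of objects whose tags match every given pair."""
        return sorted(
            key
            for key, obj in self._objects.items()
            if all(obj.tags.get(name) == value for name, value in wanted.items())
        )


lake = FlatLake()
lake.put("a1", b"...", source="crm", kind="json")
lake.put("b2", b"...", source="sensors", kind="csv")
lake.put("c3", b"...", source="crm", kind="audio")

print(lake.find(source="crm"))               # ['a1', 'c3']
print(lake.find(source="crm", kind="json"))  # ['a1']
```

Items of very different types sit side by side in one namespace, and they are located by what their tags say about them, not by where they are filed.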
Most organizations operate both a data lake and a data warehouse, as they serve different needs. This chart from AWS illustrates the ideal uses of the two storage strategies.
It’s critical to secure a data lake so that it isn’t accessed without authorization or altered for malicious purposes.