DataBytes   |   3423 Piedmont Rd NE Atlanta, GA 30305   |   (404) 400-7477   |   Email


Data Lakes – An Agile Approach to Information Management

What is a data lake?

A data lake is a storage repository that holds large amounts of disparate data in its native, or raw, format. Data lakes have grown in popularity as more companies enter the big data arena and more tools become available for querying data in these varied formats. Because storage is decoupled from compute, companies can keep any data set, large or small, for current and future use at minimal cost. Additionally, metadata descriptors are indexed and stored alongside the data, enabling search-engine-like discovery of data sets that align with a business or project need.
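To illustrate how indexed metadata descriptors can make raw data sets discoverable, here is a minimal Python sketch of a keyword search over a catalog. It assumes a simple in-memory inverted index; the `DataSet` and `MetadataCatalog` names, the paths, and the sample descriptions are all hypothetical. Real deployments generally use a dedicated catalog or search service rather than code like this.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """A raw data set in the lake plus its descriptive metadata (hypothetical model)."""
    path: str         # location of the raw file, e.g. an object-store key
    fmt: str          # native format the data was stored in (csv, json, ...)
    description: str  # free-text descriptor that gets indexed
    tags: set[str] = field(default_factory=set)


class MetadataCatalog:
    """Minimal inverted index over metadata descriptors (illustrative only)."""

    def __init__(self) -> None:
        self._datasets: dict[str, DataSet] = {}
        self._index: dict[str, set[str]] = {}  # term -> paths of matching data sets

    @staticmethod
    def _terms(ds: DataSet) -> set[str]:
        # Index lowercase description words (punctuation stripped) plus all tags.
        words = {w.strip(".,;:") for w in ds.description.lower().split()}
        return words | {t.lower() for t in ds.tags}

    def register(self, ds: DataSet) -> None:
        """Store the descriptor alongside the data and index its terms."""
        self._datasets[ds.path] = ds
        for term in self._terms(ds):
            self._index.setdefault(term, set()).add(ds.path)

    def search(self, query: str) -> list[str]:
        """Return sorted paths of data sets whose metadata contains every query term."""
        terms = [t.lower() for t in query.split()]
        if not terms:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return sorted(hits)


# Usage: register three hypothetical raw data sets, then search by keyword.
catalog = MetadataCatalog()
catalog.register(DataSet("raw/web/clickstream-2023.json", "json",
                         "Website clickstream events", {"marketing", "web"}))
catalog.register(DataSet("raw/crm/customers.csv", "csv",
                         "Customer master extract from CRM", {"customer", "sales"}))
catalog.register(DataSet("raw/support/tickets.parquet", "parquet",
                         "Customer support tickets", {"customer", "support"}))

print(catalog.search("customer"))          # ['raw/crm/customers.csv', 'raw/support/tickets.parquet']
print(catalog.search("customer support"))  # ['raw/support/tickets.parquet']
```

Requiring every query term to match keeps the sketch simple; a search engine would usually rank partial matches instead.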

How are data lakes different from a data warehouse?

Data residing in traditional data warehouses normally passes through multiple layers of cleansing, standardization, and validation against a master source. It is typically structured using star or snowflake schemas and constrained by the data type assigned to each column. Changes can require considerable effort and testing to ensure that downstream processes, such as reports, are not adversely affected.

The following highlights some of the key differentiators between data warehouses and data lakes:

Data Warehouse

  • Structured data only
  • Schema applied on write
  • Storage cost dictated by I/O requirements
  • Charged for storage provisioned
  • Less agile; changes can be cumbersome
  • Used for enterprise reporting, vital to company operations

Data Lake

  • Structured or unstructured data
  • Schema applied on read
  • Very low cost storage
  • Pay only for what is stored
  • Very agile; adapts readily to changing requirements
  • Designed for data exploration and experimentation
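The difference between schema on write and schema on read can be sketched in a few lines of Python. In this hypothetical example (the sample records, column names, and function names are illustrative assumptions), the warehouse-style loader validates every row against a fixed schema before storing it, so one bad value rejects the whole load, while the lake-style path stores the raw text untouched and lets each reader apply its own schema at query time.

```python
import csv
import io

# Hypothetical raw records as they might land in a lake: uncleansed, mixed quality.
RAW_CSV = """order_id,amount,region
1001,25.50,SE
1002,not_available,NE
1003,40.00,SE
"""

# A fixed, declared schema, as a warehouse table would enforce (illustrative).
SCHEMA = {"order_id": int, "amount": float, "region": str}


def load_schema_on_write(raw: str) -> list[dict]:
    """Warehouse-style: convert every row to SCHEMA before storing it.
    One nonconforming value raises ValueError and rejects the whole load."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw)):
        rows.append({col: typ(rec[col]) for col, typ in SCHEMA.items()})
    return rows


def store_raw(raw: str) -> str:
    """Lake-style ingestion: persist the text exactly as received, no schema applied."""
    return raw


def read_with_schema(stored: str, schema: dict) -> list[dict]:
    """Schema on read: each consumer projects its own columns and types at query
    time. This reader chooses to skip rows whose values do not parse."""
    rows = []
    for rec in csv.DictReader(io.StringIO(stored)):
        try:
            rows.append({col: typ(rec[col]) for col, typ in schema.items()})
        except ValueError:
            continue
    return rows


# Schema on write: the "not_available" amount makes the whole load fail.
write_error = None
try:
    load_schema_on_write(RAW_CSV)
except ValueError as err:
    write_error = err

# Schema on read: the raw text is kept as-is, and two consumers read it differently.
stored = store_raw(RAW_CSV)
revenue_view = read_with_schema(stored, {"order_id": int, "amount": float})  # skips order 1002
region_view = read_with_schema(stored, {"region": str})                      # keeps all 3 rows
print(sum(row["amount"] for row in revenue_view))  # 65.5
```

The read-time approach accepts the questionable row at ingestion and leaves each consumer to decide how to handle it, which is what makes lake data flexible but also less vetted than warehouse data.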

Data lakes should not be viewed as a replacement for data warehouses. Lake data is not held to the same stringent vetting and inspection processes required of warehouse data. Data lakes are designed with innovation in mind, giving data scientists and analysts a platform for data experimentation.

Contact us to discuss how data lakes can transform the way your company manages its data assets.