Why Big Data?
Big data solutions primarily address three core data-related challenges faced at the enterprise level, commonly called the “3 Vs problem”.
Volume: A big data solution stores and queries hundreds of terabytes of data, and the total volume of data often grows tenfold every five years. Storage must be able to keep pace with this growth.
Variety: New incoming data may not match any existing schema; it may be unstructured or semi-structured.
Velocity: Data is being collected at an ever-increasing rate from many new types of devices and event sources. The design and implementation of the storage layer must handle this rate of ingestion efficiently.
What did traditional databases do?
Before SQL and relational databases, data was typically stored in flat files, often in simple formats with fixed-width columns and a predetermined layout hard-coded into the application. The code read one or more files sequentially or jumped to specific offsets. As a result, traditional databases only supported analytics designed against a stable, unchanging environment.
But industries such as retail and wholesale rarely enjoy the comfort of a stable environment.
Every now and then, a market disruptor upends the assumptions of slower-moving competitors. This is true for both hard-line retailers (electronics, furniture, appliances, and sporting goods) and soft-line retailers (apparel and clothing).
How do relational databases differ from traditional database systems?
Relational databases put an end to all this and give us the power to retrieve information simply by writing queries in SQL. These systems hide all the complexity and logic of assembling the data that a query extracts. But the volume of data grows with every passing day, and we are moving beyond the capabilities of even enterprise-level relational database systems. The next logical solution is one built to harness big data.
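To make the contrast concrete, here is a minimal sketch of the relational approach using Python's built-in sqlite3 module. The table and data are invented for illustration; the point is that one declarative SQL statement replaces the hand-written file-scanning code a flat-file system would require:

```python
import sqlite3

# A throwaway in-memory database; the 'sales' table is invented for this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# One declarative query: the engine decides how to scan, group, and sum.
rows = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
print(sorted(rows))  # [('east', 150.0), ('west', 250.0)]
```

The query author never specifies file offsets or column widths; the database hides that complexity, which is exactly what flat-file systems could not do.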
Why does it matter to adopt a Big Data solution?
Because it fundamentally changes the way data is handled. A big data batch-processing solution first breaks the source files up into multiple blocks.
Image Source: #ibmido
It then replicates each block across a distributed cluster of commodity nodes, and data processing runs in parallel on every node. Finally, the parallel results are combined into an aggregated result set.
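The three steps above (split into blocks, process each block in parallel, combine the results) can be sketched in plain Python. This is a toy illustration using only the standard library, not a real distributed system; all function names and the sample input are invented for this sketch:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(lines, block_size):
    """Step 1: break the source 'file' into fixed-size blocks."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]

def process_block(block):
    """Step 2: run the same computation on each block independently.
    Here the computation is a word count over the block's lines."""
    counts = Counter()
    for line in block:
        counts.update(line.split())
    return counts

def aggregate(partials):
    """Step 3: combine the per-block results into one result set."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

lines = ["big data big", "data velocity", "big volume"]
blocks = split_into_blocks(lines, block_size=2)
with ThreadPoolExecutor() as pool:          # stands in for the cluster's nodes
    partial_counts = pool.map(process_block, blocks)
result = aggregate(partial_counts)
print(result["big"])  # 3
```

In a real deployment the blocks live on different machines and the replication provides fault tolerance, but the split/process/aggregate shape is the same.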
All software applications and services depend on a database technology, and this is where a Big Data solution, such as one using HDFS (Hadoop Distributed File System), gives an organization’s IT function real leverage. By spreading data blocks across many nodes, HDFS addresses the ‘3 Vs’ problem: handling the variety, volume, and velocity of incoming data and turning it into usable metrics.
What is Apache Hadoop?
At the core of many Big Data solutions is an open source technology named Apache Hadoop.
The latest versions of Hadoop commonly contain the following main assets:
- The Hadoop kernel, or core package, containing the Hadoop distributed file system (HDFS), the map/reduce framework, and common routines and utilities.
- A runtime resource manager that allocates tasks and executes queries (such as map/reduce jobs) and other applications. This is usually implemented through the YARN (Yet Another Resource Negotiator) framework, although other resource managers such as Mesos are also available.
- Other resources, tools, and utilities that run under the control of the resource manager to support tasks such as managing data and running queries or other jobs on the data.
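As a concrete taste of the map/reduce framework in the Hadoop kernel, here is a minimal word-count mapper and reducer written in the style of Hadoop Streaming, which pipes text records between phases. The shuffle/sort step that Hadoop performs between the two phases is simulated here with a plain `sorted()` call; the sample input is invented for this sketch:

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each word. Hadoop sorts mapper
    output by key before the reduce phase, so equal words arrive adjacent."""
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Simulate the shuffle/sort that the framework performs between the phases.
mapped = sorted(mapper(["big data", "big volume"]))
for record in reducer(mapped):
    print(record)
```

In a real Hadoop Streaming job the mapper and reducer would be separate scripts reading stdin and writing stdout, and the framework would run many copies of each across the cluster's blocks.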
With each passing day, the volume, velocity, and variety of data are increasing.
This presents both an opportunity and a challenge. The opportunity lies in the insights enterprise-level managers can obtain from this data. The challenge lies in how to tap that opportunity.
That depends on various factors, such as the industry sector, the size of the firm, its future objectives, and its organizational preparedness. We’ll delve deeper into these challenges in future posts.