Big Data is in strong demand across markets today, so almost everyone is interested in knowing: what is Big Data?
According to the IDC forecast, the Big Data market is predicted to be worth $46.34 billion by 2018, with strong growth expected across Big Data related infrastructure, software and services over the next five years. According to IDG Enterprise Big Data Research, organizations plan to invest over the next 12 to 18 months in the skill sets necessary for Big Data deployments, including Data Scientists, Data Architects, Data Analysts, Data Visualizers, Research Analysts, and Business Analysts.
As a result, everyone from technical professionals to the heads of large organizations is wondering what this buzzword is about. In short, professionals from all sectors and industry types are curious to understand the concept of Big Data.
What is Big Data?
Big Data refers to collections of data sets so large that they are measured in terabytes (TB), petabytes (PB), or even zettabytes (ZB). According to published figures, Facebook stores, accesses, and analyzes more than 30 petabytes of user-generated data, and about 100 terabytes of data are uploaded to Facebook daily. Wal-Mart handles more than 1 million customer transactions every hour. In 2008, Google was processing 20,000 terabytes (20 petabytes) of data a day. More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide. YouTube users upload 48 hours of new video every minute of the day. According to Twitter's own research in early 2012, it sees roughly 175 million tweets every day and has more than 465 million accounts.
These statistics show that a huge amount of data (volume) is being generated at a very high rate (velocity) and in many forms (variety): structured data (relational data), semi-structured data (XML), and, for the most part, real-time unstructured data such as documents, text, PDFs, media logs, and web logs. Traditional relational database management systems (RDBMS) were not designed to store, manage, and process such a variety of data at this speed.
This raised the question of how to manage such large data sets, and Hadoop came into the picture as an answer.
Big Data technology – Hadoop
Hadoop is an open source Big Data project of the Apache Software Foundation. It is used to store and process big data in a distributed environment across clusters of computers using a simple programming model called MapReduce. It can scale from a single server to thousands of machines, each offering local computation and storage. Hadoop runs applications using the MapReduce model, in which data is processed in parallel on different nodes of the cluster.
The Hadoop framework is written in Java. It provides solutions not only for processing data but also for capturing, transferring, storing, and distributing data across the thousands of commodity servers that make up a Hadoop cluster.
HDFS (Hadoop Distributed File System), a core part of Hadoop, stores very large data sets (PB, ZB) as data blocks distributed across the cluster. It acts as a data service that keeps data available: if a machine in the cluster fails, you can dynamically remove it from the cluster and add a new machine. Replication is the key mechanism of HDFS: each block is stored on more than one machine, so the failure of a single machine does not make the data unavailable.
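To make the idea of blocks and replication more concrete, here is a small Python sketch. It is only an illustration, not how HDFS is implemented: the function names are hypothetical, the 128 MB block size and replication factor of 3 are assumed values (they match common defaults in recent Hadoop releases but are configurable), and the round-robin placement is a simplification of the rack-aware placement that HDFS actually uses.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file would be split into.

    block_size_mb=128 is an assumed value for illustration.
    """
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes.

    Simplified round-robin placement (an assumption for this sketch);
    real HDFS placement also takes racks into account.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def all_blocks_available(placement, failed_node):
    """True if every block still has a replica on a healthy node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


blocks = split_into_blocks(300)  # a 300 MB file -> [128, 128, 44]
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
print(placement)
print(all_blocks_available(placement, failed_node="node2"))
```

Because every block lives on three different machines in this sketch, losing any one machine still leaves two copies of each block, which is the reason replication keeps the data available.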
The MapReduce framework is the other core part of Apache Hadoop. It processes data locally on each machine in the cluster. A MapReduce job has two kinds of tasks: map tasks and reduce tasks. Map tasks run in parallel on all the machines where the data is stored, and reduce tasks aggregate the output of the map tasks. Hence, the reduce phase always follows the map phase.
Today, due to the growing importance of Big Data across all sectors, many major IT companies such as Google, Apple, Facebook, Amazon, Oracle, IBM, Adobe, Cisco, and Accenture are using Hadoop and looking for Big Data and Hadoop professionals.
Typical job profiles include:
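The map and reduce steps can be sketched with the classic word count example. The following is plain Python running on a single machine, not Hadoop's Java API: the function names are hypothetical, and each string in the input list stands in for a block of data stored on a different machine.

```python
from collections import defaultdict


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group the values emitted by all map tasks by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(chunks):
    # In a real cluster the map tasks would run in parallel, one per block.
    mapped = [map_task(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    # The reduce step starts only after the map output has been grouped.
    return dict(reduce_task(key, values) for key, values in grouped.items())


print(word_count(["big data big", "data hadoop"]))
```

Each map task only looks at its own chunk, which is why the work can be spread across machines; the reduce step needs the grouped output of all map tasks, which is why it comes second.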
1. Hadoop Developer
2. Data Scientist
3. Big Data Engineer
4. Data Visualization Developer
5. Business Intelligence (BI) Engineer
6. BI Solutions Architect
7. Analytics Manager
To learn more, post your questions in the feedback section below.