Big Data and Its Challenges
Data are characteristics or information, usually numerical, that are collected through observation. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of data) is a single value of a single variable.
Regarding the storage of data: we can store data temporarily (while a program runs) in RAM and permanently on a hard disk. Notice that hard disks normally come in capacities of gigabytes or terabytes.

Why are hard disks not sized in petabytes, exabytes, zettabytes, yottabytes, and so on?
Let's take an example: if storing 1 GB of data on a hard disk takes 1 minute, think about 1,000 TB of data. That is over a million minutes, nearly two years, on a single disk. It is a huge amount of data (Big Data), so it takes far too much time.
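Here is that back-of-the-envelope arithmetic as a tiny Python sketch (the 1 GB per minute write rate is just the assumed figure from the example above):

```python
# How long would it take one disk to store 1,000 TB
# at an assumed write speed of 1 GB per minute?
GB_PER_MINUTE = 1                 # assumed single-disk write rate
DATA_GB = 1000 * 1024             # 1,000 TB expressed in GB

minutes = DATA_GB / GB_PER_MINUTE
days = minutes / (60 * 24)
print(f"{minutes:,.0f} minutes = {days:,.0f} days = {days / 365:.1f} years")
# -> 1,024,000 minutes = 711 days = 1.9 years on a single disk
```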

So there are two problems with big data: volume (the sheer size) and velocity (the input/output speed).
Big Data: a term for extracting meaningful information by analyzing the huge amounts of complex, variously formatted data generated at high speed, which cannot be handled or processed by traditional systems.

Big data can be described by the following characteristics:
Volume
The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.
Velocity
The speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real-time. Compared to small data, big data are produced more continually. Two kinds of velocity related to big data are the frequency of generation and the frequency of handling, recording, and publishing.
Data of multinational companies like Google and Facebook
Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide.
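As a quick sanity check on those figures (plain arithmetic, not official Google numbers):

```python
# Scale 40,000 queries/second up to a day and a year
per_second = 40_000
per_day = per_second * 60 * 60 * 24   # ~3.46 billion searches per day
per_year = per_day * 365              # ~1.26 trillion searches per year
print(f"{per_day:,} per day, {per_year:,} per year")
```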

Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters. The average MapReduce job ran across approximately 400 machines in September 2007, and the jobs crunched through approximately 11,000 machine-years in a single month.
The amount of data we produce every day is truly mind-boggling: roughly 2.5 quintillion bytes of data are created each day by social media, the internet, and other sources.
Then how do big multinational companies like Google and Facebook store their data?
We can be sure that these multinational companies face the problem of Big Data.
To solve the problem of big data, there is a technology called distributed storage.
What is distributed storage and how does it work?
A big data store relies on distributed storage. Instead of storing a large file sequentially on one disk, you split it into pieces, sometimes called blocks, and scatter those blocks across many disks. The big data platform Apache Hadoop stores files in exactly this way.
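A minimal sketch of that idea in Python, where plain directories stand in for separate disks (the 128 MB block size mirrors Hadoop's default, but everything else here is just illustrative):

```python
import os

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, Hadoop's default block size

def split_into_blocks(path, disks):
    """Split a file into fixed-size blocks and scatter them across
    several 'disks' (plain directories here) in round-robin order."""
    locations = []                        # remember where each block went
    with open(path, "rb") as src:
        block_no = 0
        while True:
            chunk = src.read(BLOCK_SIZE)
            if not chunk:                 # end of file
                break
            disk = disks[block_no % len(disks)]
            os.makedirs(disk, exist_ok=True)
            block_path = os.path.join(disk, f"block_{block_no:05d}")
            with open(block_path, "wb") as dst:
                dst.write(chunk)
            locations.append(block_path)
            block_no += 1
    return locations

# e.g. split_into_blocks("bigfile.bin", ["disk1", "disk2", "disk3"])
```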

So, with the help of distributed storage, we can split our work across many disks and machines, which increases velocity (many reads and writes happen in parallel) and eases the problem of volume (no single disk has to hold the whole file).
In such a setup, the PCs that contribute their RAM, CPU, and disk are called slaves, and the central node that coordinates them is known as the master (the receiver of requests).
The combination of a master and its slaves is called a cluster.
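A toy model of a cluster (class names like Master and Slave are purely illustrative here, not Hadoop's real components):

```python
from concurrent.futures import ThreadPoolExecutor

class Slave:
    """A slave node contributes its storage (and CPU) to the cluster."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}                       # block id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]

class Master:
    """The master holds only metadata: which slave stores which block."""
    def __init__(self, slaves):
        self.slaves = slaves
        self.metadata = {}                     # block id -> slave

    def write(self, block_id, data):
        slave = self.slaves[block_id % len(self.slaves)]  # round-robin
        slave.store(block_id, data)
        self.metadata[block_id] = slave

    def read_all(self):
        # Fetch every block from its slave in parallel; this parallel
        # I/O is where the velocity gain over a single disk comes from.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda kv: kv[1].read(kv[0]),
                                 sorted(self.metadata.items())))

# A cluster = one master + its slaves
cluster = Master([Slave("pc1"), Slave("pc2"), Slave("pc3")])
for i, chunk in enumerate([b"aaa", b"bbb", b"ccc", b"ddd"]):
    cluster.write(i, chunk)
print(b"".join(cluster.read_all()))            # b'aaabbbcccddd'
```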
What is Hadoop?
The Apache Hadoop project develops open-source software for reliable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Or, to put it simply, Hadoop is software for distributed storage and distributed processing, which is how we solve the big data problem.
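To make those "simple programming models" concrete, here is the classic MapReduce word count written for Hadoop Streaming, a standard way to run scripts such as Python on Hadoop (the mapper and reducer simply read stdin and write stdout):

```python
# mapper.py - emit "word<TAB>1" for every word in the input
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py - Hadoop sorts mapper output by key, so all copies
# of a word arrive together and we can sum them with one counter
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, _ = line.rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += 1
if current_word is not None:
    print(f"{current_word}\t{count}")
```

These two scripts would be submitted with Hadoop's streaming jar (the exact jar path depends on your installation), and Hadoop takes care of distributing the map and reduce work across the cluster.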
Thank you