⭐Big Data : An Umbrella Term⭐

Vrukshali Torawane
4 min read · Sep 16, 2020


We all use social media and streaming applications like Instagram, Facebook, Twitter, Netflix and many more.

Have you ever wondered where the images and videos we upload every day are stored? The volume is enormous: petabytes of data are uploaded every day. So how do these big companies manage such a huge amount of data? 🤔🤔

💠Let's have a look at the daily data usage of some companies:

1️⃣ FACEBOOK : Facebook generates 4 petabytes of data per day, which is about 4 million gigabytes, along with 2.5 billion pieces of content. The Facebook like button has been pressed 1.13 trillion times. 100 million hours of video are watched on Facebook every day. Every 60 seconds there are 317,000 status updates, 400 new users, 147,000 photos uploaded, and 54,000 shared links. Facebook gets over 8 billion average daily video views.

2️⃣ INSTAGRAM : The Instagram explore page is viewed by 200 million accounts daily. More than 50 billion photos have been uploaded to Instagram so far. 95 million photos and videos are shared on Instagram per day. 300 million users use the “stories” feature daily.

3️⃣ NETFLIX : Netflix has more than 190 million subscribers worldwide (as of mid-2020). Netflix users stream 97,222 hours of video every minute.
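To get a feel for the scale of the figures above, here is a small sketch of the arithmetic. It assumes decimal (SI) units, where 1 petabyte is 1,000,000 gigabytes; the function names are made up for this illustration.

```python
# Assumption: decimal (SI) units, so 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000
MINUTES_PER_DAY = 60 * 24


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using decimal units."""
    return petabytes * GB_PER_PB


def per_minute_to_per_day(value_per_minute):
    """Scale a per-minute rate up to a full day."""
    return value_per_minute * MINUTES_PER_DAY


# Figures quoted in the article.
facebook_gb_per_day = petabytes_to_gigabytes(4)
netflix_hours_per_day = per_minute_to_per_day(97_222)

print(f"Facebook: {facebook_gb_per_day:,} GB per day")
print(f"Netflix: {netflix_hours_per_day:,} hours streamed per day")
```

So 4 petabytes a day is roughly 4 million gigabytes, and 97,222 hours per minute adds up to about 140 million hours of video per day.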

And there are many more such examples. How do these companies manage such huge amounts of data? This is where the concept of 🔰BIG DATA🔰 comes into play.

💠First, let us understand what Big Data actually is.

First of all, Big Data is not a technology. Big Data is actually an umbrella term for a set of problems in the data world. Big Data challenges include capturing data, data storage, data analysis, search, sharing, transfer, information privacy and data source. Its three main sub-problems are volume, variety, and velocity.

  1. Volume : The quantity of generated and stored data.
  2. Variety : The type and nature of the data.
  3. Velocity : The speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.

💠If there are problems there are solutions too…👍🏻👍🏻

💠So to solve these issues, commonly known as the 3 V's, we use a concept called DISTRIBUTED STORAGE. 💠

🤔 What is DISTRIBUTED STORAGE??

➡️ Distributed Storage is an infrastructure that splits a huge amount of data into small blocks and spreads those blocks across independent physical servers, often in more than one data center.

➡️ This addresses the issues above. Because the data is split into small blocks, each server has to hold only a small part of the total volume. Because the blocks are stored in parallel, we save time and reduce the input/output bottleneck, which addresses velocity. The more independent servers we have, the less time it takes to store the data. The stored data is also persistent.
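The splitting idea can be sketched in a few lines of Python. This is only a toy model under simple assumptions: the "servers" are plain dictionary entries, blocks are handed out round-robin, and the names `split_into_blocks`, `distribute` and `reassemble` are invented for this example.

```python
def split_into_blocks(data, block_size):
    """Cut a bytes object into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def distribute(blocks, node_names):
    """Assign blocks to nodes round-robin; each node stores (index, block) pairs."""
    placement = {name: [] for name in node_names}
    for index, block in enumerate(blocks):
        node = node_names[index % len(node_names)]
        placement[node].append((index, block))
    return placement


def reassemble(placement):
    """Collect the blocks from all nodes and join them in their original order."""
    pairs = [pair for stored in placement.values() for pair in stored]
    pairs.sort(key=lambda pair: pair[0])
    return b"".join(block for _, block in pairs)


data = b"abcdefghij"
blocks = split_into_blocks(data, block_size=4)
placement = distribute(blocks, ["node-1", "node-2"])

for node, stored in placement.items():
    print(node, stored)
print(reassemble(placement) == data)
```

Each node ends up holding only part of the data, and the original can still be rebuilt as long as we remember the order of the blocks.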

➡️ The topology that we use is the MASTER/SLAVE MODEL. Master/slave is a model of asymmetric communication or control where one device or process (the “master”) controls one or more other devices or processes (the “slaves”) and serves as their communication hub.

➡️ The whole team described above is known as a cluster, so it is called a DISTRIBUTED STORAGE CLUSTER. To implement any concept we need software. The software that we use for Distributed Storage is HADOOP.
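Here is a minimal sketch of that idea, assuming a master that only decides where each block goes and remembers the locations, while the slaves hold the actual bytes. The classes `MasterNode` and `SlaveNode` are hypothetical and everything runs in one process; a real cluster would talk over the network and may route data differently.

```python
class SlaveNode:
    """Holds blocks of data; does whatever the master tells it to."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def fetch(self, block_id):
        return self.blocks[block_id]


class MasterNode:
    """Controls the slaves and keeps the map of which slave holds which block."""

    def __init__(self, slaves, block_size):
        self.slaves = slaves
        self.block_size = block_size
        self.block_map = {}  # file name -> ordered list of (block_id, slave)

    def write(self, file_name, data):
        locations = []
        starts = range(0, len(data), self.block_size)
        for number, start in enumerate(starts):
            block_id = f"{file_name}#{number}"
            slave = self.slaves[number % len(self.slaves)]  # round-robin choice
            slave.store(block_id, data[start:start + self.block_size])
            locations.append((block_id, slave))
        self.block_map[file_name] = locations

    def read(self, file_name):
        locations = self.block_map[file_name]
        return b"".join(slave.fetch(block_id) for block_id, slave in locations)


slaves = [SlaveNode("s1"), SlaveNode("s2"), SlaveNode("s3")]
master = MasterNode(slaves, block_size=4)
master.write("photo.jpg", b"0123456789")
print(master.read("photo.jpg"))
```

The client only ever talks to the master, which matches the "communication hub" role described above.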

What is Hadoop ??

Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.
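As a rough illustration of what this means for storage, the sketch below estimates how many blocks a file would occupy in Hadoop's file system (HDFS). It assumes the commonly cited defaults of a 128 MB block size and a replication factor of 3; these values are not from this article and can be changed per cluster, and the helper names are invented for the example.

```python
import math

# Assumptions: commonly cited HDFS defaults, configurable on a real cluster.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


def stored_block_copies(file_size_mb, replication=REPLICATION):
    """Total block copies kept in the cluster once each block is replicated."""
    return block_count(file_size_mb) * replication


one_gb_blocks = block_count(1024)
one_gb_copies = stored_block_copies(1024)
print(f"A 1 GB file needs {one_gb_blocks} blocks, stored as {one_gb_copies} copies")
```

Under these assumptions a 1 GB file is split into 8 blocks, and the cluster keeps 24 block copies in total so that losing one server does not lose the data.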

So now we have an idea of what Big Data is and how MNCs like Google, Facebook, etc. solve the challenges of Big Data.

!! Thank you all for reading my article !! 😊😊

🔰Keep Sharing!! Keep Learning !!🔰
