
Big Data Challenges & Solutions

Sep 16, 2020

In this article I’d like to share the challenges organizations face because of Big Data, and what is being done about them.

Big data projects are now a normal part of business, and they come with multiple challenges. According to the NewVantage Partners Big Data Executive Survey 2017, 95 percent of the Fortune 1000 business leaders surveyed said their firms had undertaken a big data project in the last five years. However, less than half (48.4 percent) said their big data initiatives had achieved measurable results, which shows that organizations face major challenges when implementing big data strategies. Likewise, the IDG Enterprise 2016 Data & Analytics Research found that 90 percent of respondents had run into challenges related to their big data projects.

So What is Big Data?

Big Data refers to extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.

Big Data is commonly described by three V’s:

Volume: Big data is a set of data so large that the organization that owns it faces challenges in storing and processing it.

Velocity: New data is generated at a rapid pace, and it often has to be handled in real time because you can’t keep clients waiting.

Variety: Data can come in any format: images, audio, or just plain text.

These three V’s are the main source of the challenges organizations face.

Major companies run their businesses on data; without a data-oriented approach, these businesses wouldn’t exist. So the question arises: how do companies like Facebook and Google store all of their data?

Back in 2012, Facebook revealed that its systems processed 2.5 billion pieces of content and more than 500 terabytes of data daily. It pulled in 2.7 billion Like actions and 300 million photos per day, and it scanned roughly 105 terabytes of data every half hour. Facebook also revealed that over 100 petabytes of data were stored in a single Hadoop disk cluster.
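To get a feel for the Velocity problem, the 2012 figures can be turned into average rates. This is only a rough sketch: it assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly over time, which real traffic is not, and the helper name is made up for illustration.

```python
# Average throughput implied by the 2012 Facebook figures.
# ASSUMPTIONS: decimal units and perfectly even traffic over time.
# `average_rate_gb_per_s` is a hypothetical helper, not a real API.

TB = 10**12  # bytes in a terabyte (decimal)
GB = 10**9   # bytes in a gigabyte (decimal)

def average_rate_gb_per_s(terabytes: float, seconds: float) -> float:
    """Return the average rate in GB/s for `terabytes` handled over `seconds`."""
    return terabytes * TB / seconds / GB

ingest = average_rate_gb_per_s(500, 24 * 60 * 60)  # 500+ TB per day
scan = average_rate_gb_per_s(105, 30 * 60)         # 105 TB per half hour

print(f"ingest: ~{ingest:.1f} GB/s, scan: ~{scan:.1f} GB/s")
# prints "ingest: ~5.8 GB/s, scan: ~58.3 GB/s"
```

Even as a smoothed average, tens of gigabytes per second is far more than a single machine's storage can sustain.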

Big data really is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.

~ Facebook VP of Engineering Jay Parikh

Now what can be done about this?

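Even a single expensive machine with massive storage will not fix the Velocity or I/O problem, because one machine can only read and write data so fast. That’s where Distributed Storage comes into the picture: the data is spread across many machines that all work at the same time.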
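A back-of-the-envelope calculation shows why spreading data across machines helps. The sketch below compares reading the 105 TB that Facebook scans every half hour with one disk versus many disks reading their share in parallel. The 100 MB/s per-disk speed is a hypothetical round number, and the model ignores coordination and network overhead.

```python
# Back-of-the-envelope comparison: how long does it take to read a data
# set with one disk versus many disks each reading its own share at once?
# ASSUMPTIONS (hypothetical, for illustration only): every disk streams
# 100 MB/s, the data is split evenly, and coordination overhead is ignored.

DISK_MB_PER_S = 100  # assumed sequential read speed of one disk

def scan_hours(dataset_tb: float, disks: int, mb_per_s: float = DISK_MB_PER_S) -> float:
    """Return the hours needed to read `dataset_tb` terabytes with `disks` disks in parallel."""
    total_mb = dataset_tb * 1_000_000        # decimal units: 1 TB = 10**6 MB
    seconds = total_mb / (mb_per_s * disks)  # each disk reads total_mb / disks
    return seconds / 3600

for disks in (1, 100, 4000):
    print(f"{disks:>5} disk(s): {scan_hours(105, disks):.2f} hours")
```

Under these assumptions, a single disk would need about 292 hours to read 105 TB, far longer than the half-hour window. With 4,000 disks working in parallel, the same read would finish in under five minutes.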

Figure: Master-Worker Architecture

This architecture solves most of our problems. Multiple worker nodes contribute their compute and storage, and the master node coordinates them, so together they can handle workloads that no single machine could.
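One way to picture the master's job is as bookkeeping. The master splits a file into blocks and records which workers hold a copy of each block. The toy sketch below is loosely modelled on the idea behind HDFS, but it is not Hadoop's API. The `Master` class and its method are hypothetical. Round-robin placement is a simplification: real HDFS uses rack-aware placement, with defaults of 128 MB blocks and 3 replicas.

```python
from dataclasses import dataclass, field

# Toy master-worker storage layout (hypothetical, NOT Hadoop's API).
# The master never stores file data itself; it only keeps metadata
# saying which workers hold each block.

@dataclass
class Master:
    workers: list[str]
    block_size: int
    replication: int = 3
    # metadata kept by the master: block index -> workers holding a copy
    block_map: dict[int, list[str]] = field(default_factory=dict)

    def place_file(self, file_size: int) -> dict[int, list[str]]:
        """Split a file into blocks and assign each block to `replication` distinct workers."""
        if self.replication > len(self.workers):
            raise ValueError("not enough workers for the replication factor")
        n_blocks = -(-file_size // self.block_size)  # ceiling division
        for b in range(n_blocks):
            # round-robin placement: `replication` consecutive workers starting at b
            self.block_map[b] = [
                self.workers[(b + r) % len(self.workers)]
                for r in range(self.replication)
            ]
        return self.block_map

master = Master(workers=["w1", "w2", "w3", "w4"], block_size=128)
layout = master.place_file(file_size=500)  # 500 units -> 4 blocks of up to 128
for block, holders in layout.items():
    print(block, holders)
```

Here block 0 lands on w1, w2, and w3, and block 3 wraps around to w4, w1, and w2. Because each block lives on several workers, losing one worker does not lose any data.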

Advantages of using Distributed Storage

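  • We don’t need to invest in expensive high-end servers; ordinary commodity machines can act as workers.
  • Data is stored persistently.
  • Storage is pooled across many workers, so even a huge data set can be stored in one system: the challenge of Volume is addressed.
  • Data is written and read in parallel across workers: the challenge of Velocity is addressed.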
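The parallel storage idea in the list above can be illustrated in a few lines. In this sketch, a payload is cut into chunks, each chunk is written to a different simulated worker at the same time, and then all chunks are read back in parallel and reassembled. The workers are hypothetical in-memory dicts, and threads stand in for separate machines. The point is the structure, not real speed.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model of parallel storage. Each "worker" is just an in-memory dict and
# threads stand in for separate machines; in a real cluster the chunks would
# travel over the network to different servers. Names are hypothetical.

N_WORKERS = 4
workers: list[dict[int, bytes]] = [{} for _ in range(N_WORKERS)]

def split(data: bytes, n_chunks: int) -> list[bytes]:
    """Cut `data` into at most `n_chunks` roughly equal pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, never 0
    return [data[i:i + size] for i in range(0, len(data), size)]

def write_chunk(item: tuple[int, bytes]) -> int:
    """Store chunk number `idx` on a worker chosen round-robin."""
    idx, chunk = item
    workers[idx % N_WORKERS][idx] = chunk
    return idx

def read_chunk(idx: int) -> bytes:
    """Fetch chunk number `idx` back from the worker that holds it."""
    return workers[idx % N_WORKERS][idx]

payload = b"big data is stored and read in parallel across workers"
chunks = split(payload, N_WORKERS)

with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    # all chunks are written concurrently, one per worker
    written = list(pool.map(write_chunk, enumerate(chunks)))
    # all chunks are read back concurrently; map() keeps the original order
    restored = b"".join(pool.map(read_chunk, written))

print(restored == payload)  # True
```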

Facebook runs one of the biggest Hadoop clusters in the world, with more than 4,000 machines (the workers shown above) storing hundreds of millions of gigabytes.

Hadoop gives Facebook an efficient, reliable common infrastructure. It powers everything from search, log processing, recommendation systems, and data warehousing to video and image analysis. Facebook also built its first user-facing Hadoop application, Facebook Messenger, on Apache HBase, the Hadoop database. HBase’s layered architecture lets it handle a huge volume of messages every day.
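Log processing is a classic Hadoop workload, usually expressed in the MapReduce model. Below is a minimal, single-process sketch of that model that counts words in a few made-up log lines. The function names and sample data are hypothetical, and this is not Hadoop's Java API. In a real cluster, each map task would run on the worker that stores its block.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# A single-process imitation of the MapReduce model that Hadoop runs at
# scale. Here all three phases run in one Python process so they are
# easy to follow; a cluster would spread them across worker nodes.

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)

# Each inner list stands in for a block stored on a different worker.
blocks = [
    ["GET /home", "GET /photos"],
    ["POST /like", "GET /home"],
]

mapped = (pair for block in blocks for line in block for pair in map_phase(line))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
# prints {'get': 3, '/home': 2, '/photos': 1, 'post': 1, '/like': 1}
```

The map and reduce functions only ever see one line or one key at a time. That independence is what lets a framework like Hadoop run them on thousands of workers at once.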