
Big Data Challenges & Solutions

In this article I’d like to share the challenges organizations face because of Big Data and what is being done about them.

Big Data projects are now a normal part of business, but they come with multiple challenges. According to the NewVantage Partners Big Data Executive Survey 2017, 95 percent of the Fortune 1000 business leaders surveyed said that their firms had undertaken a big data project in the last five years. However, less than half (48.4 percent) said that their big data initiatives had achieved measurable results. This suggests that organizations face major challenges when implementing Big Data strategies. The IDG Enterprise 2016 Data & Analytics Research likewise found that 90 percent of those surveyed reported running into challenges related to their big data projects.

So what is Big Data?

Big Data refers to extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.

Big Data can be explained by 3 V’s:

Volume: Big data is a set of data so large that the organization that owns it faces challenges in storing and processing it.

Velocity: New data is generated at a rapid pace, and it has to be handled in real time because you can’t keep clients waiting.

Variety: Data can come in any format. It can be an image, audio, or just plain text.

These 3 V’s are the main source of the challenges organizations face.

Major companies run their businesses on data. Without a data-oriented approach, these businesses wouldn’t exist. So the question arises: how do companies like Facebook and Google store their data?

Back in 2012, Facebook revealed that its systems processed 2.5 billion pieces of content and more than 500 terabytes of data every day. It pulled in 2.7 billion Like actions and 300 million photos per day, and it scanned roughly 105 terabytes of data each half hour. Facebook also revealed that over 100 petabytes of data were stored in a single Hadoop disk cluster.

Big data really is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.

~ Facebook VP of Engineering Jay Parikh

Now what can be done about this?

Even expensive hardware with massive storage capacity will not fix the Velocity (I/O) problem, because a single machine can only read and write so fast. That’s where Distributed Storage comes into the picture.
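To put the I/O problem in numbers, here is a rough back-of-the-envelope sketch. It takes the 500 terabytes per day quoted above and asks how long that would take to write on one disk versus many disks working in parallel. The 200 MB/s disk throughput and the 4,000-worker count are assumptions chosen for illustration, and the calculation ignores network and coordination overhead.

```python
# Back-of-the-envelope: time to write one day's data on one disk vs. many.
# ASSUMPTION: 200 MB/s sustained write throughput per disk (illustrative only).
DAILY_DATA_BYTES = 500 * 10**12      # 500 TB per day, the 2012 figure quoted above
DISK_THROUGHPUT_BPS = 200 * 10**6    # assumed bytes per second per disk


def write_time_seconds(total_bytes: int, workers: int,
                       throughput_bps: int = DISK_THROUGHPUT_BPS) -> float:
    """Ideal time to write total_bytes split evenly across `workers` disks."""
    return total_bytes / (workers * throughput_bps)


single = write_time_seconds(DAILY_DATA_BYTES, workers=1)
cluster = write_time_seconds(DAILY_DATA_BYTES, workers=4000)  # hypothetical worker count

print(f"1 disk:       {single / 86400:.1f} days")
print(f"4000 workers: {cluster / 60:.1f} minutes")
```

Under these assumptions a single disk would need about 29 days to absorb one day of data, while 4,000 disks writing in parallel would need roughly ten minutes. A bigger disk adds capacity but not speed, which is why the work has to be spread out.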

Master — Worker Architecture

This architecture addresses most of our problems. Multiple Worker nodes contribute their compute and storage to the Master node, which coordinates them to handle the load that Big Data creates.

Advantages of using Distributed Storage

  • We don’t need to invest in large, expensive servers.
  • Data is stored permanently.
  • Data can be stored in one go by splitting it across the workers, which addresses the challenge of Volume.
  • Data is written to the workers in parallel, which addresses the challenge of Velocity.
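The idea behind these points can be sketched in a few lines of Python. This is a toy illustration, not how Hadoop itself places data: the function names, the worker names, the tiny block size, and the simple round-robin placement with two copies of each block are all assumptions made for the example.

```python
from __future__ import annotations


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, workers: list[str],
                 replicas: int = 2) -> dict[int, list[str]]:
    """Assign each block to `replicas` distinct workers, round-robin.

    ASSUMPTION: round-robin placement is a simplification for illustration.
    """
    if replicas > len(workers):
        raise ValueError("need at least as many workers as replicas")
    return {
        block: [workers[(block + r) % len(workers)] for r in range(replicas)]
        for block in range(num_blocks)
    }


data = b"x" * 1000                                # stand-in for a large file
workers = ["worker-1", "worker-2", "worker-3"]    # hypothetical worker names

blocks = split_into_blocks(data, block_size=256)
placement = place_blocks(len(blocks), workers)

for block_id, owners in placement.items():
    print(f"block {block_id} ({len(blocks[block_id])} bytes) -> {owners}")
```

Because no single worker holds the whole file, the total size is limited only by the number of workers (Volume), and because each block can be written to its workers independently, the writes can proceed in parallel (Velocity). Keeping more than one copy of each block is one way to keep the data available when a worker fails.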

Facebook runs the biggest Hadoop cluster, which spans more than 4,000 machines (the Workers shown above) and stores hundreds of millions of gigabytes.

Hadoop gives Facebook a common infrastructure that is efficient and reliable. From search, log processing, recommendation systems, and data warehousing to video and image analysis, Hadoop supports this social networking platform in many ways. Facebook built its first user-facing Hadoop application, Facebook Messenger, on Apache HBase, the Hadoop database, whose layered architecture supports a very large number of messages in a single day.
