What is the big deal about Big Data?

Imagine a typical day at the office for any business executive. All of us would be checking our emails, sitting in meetings, attending to phone calls and planning that weekend getaway to unwind from a stressful week. Sounds familiar, doesn’t it? Each of these activities generates unstructured data, which until a few years ago was difficult to store, much less analyse.

What is Big Data?

The industry has termed all these disparate sets of data Big Data. Big Data is a collection of data sets that is either unstructured, and so poses a challenge in storage and retrieval, or so large that it cannot be analysed by existing systems. Dan Kusnetzky of ZDNet put it succinctly when he wrote about Big Data:

“In simplest terms, the phrase refers to the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets and storage facilities.”

Big Data could be a series of images used in preparing a report, a video recording of an event, or behavioural data about the users of your web product.

Is Big Data just Hype?

Do we, at the end of this article, simply write off this jargon, or are there real-life applications of Big Data in organizations? To see the applications of unstructured data, one needs to take a step back and look at the bigger picture:

  1. A customer searches for a particular product on an e-commerce site.
  2. The site has the product, but it is currently out of stock.
  3. He tweets about this to the site’s support on Twitter.
  4. They follow up on Twitter and finally email him about the availability of stock.
  5. The individual goes ahead and purchases the product. He also buys some other gifts for his family.
  6. The site sends him an SMS and email confirmation.
  7. The order is dispatched and the site sends him a separate SMS about this. The order is sent in two separate dispatches with two separate dispatch IDs.
  8. Both dispatches reach the happy customer. He goes on to thank the support team on Twitter.

If you look at this entire cycle, the interactions and data points are spread across several different systems. Some of these systems are not even owned by the e-commerce site; they belong to vendors. Yet there is a clear need for a better understanding of these data points.
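As a rough illustration of what bringing these data points together might look like, the Python sketch below merges events from several systems into one timeline per customer. The `Event` record, the source names, the customer ID and the timestamps are all hypothetical, and the sketch assumes every system can be mapped to a shared customer identifier, which is often the hardest part in practice.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One interaction recorded by one system (hypothetical schema)."""
    customer_id: str
    source: str        # which system recorded the event
    timestamp: datetime
    detail: str


def build_timelines(*sources):
    """Group events from any number of systems by customer, oldest first."""
    timelines = defaultdict(list)
    for events in sources:
        for event in events:
            timelines[event.customer_id].append(event)
    for events in timelines.values():
        events.sort(key=lambda e: e.timestamp)
    return dict(timelines)


# Made-up sample data that mirrors the eight steps above.
web_events = [
    Event("c42", "web_search", datetime(2020, 1, 5, 10, 0), "searched for product, out of stock"),
]
twitter_events = [
    Event("c42", "twitter", datetime(2020, 1, 5, 10, 15), "asked support about stock"),
    Event("c42", "twitter", datetime(2020, 1, 9, 18, 0), "thanked support"),
]
order_events = [
    Event("c42", "orders", datetime(2020, 1, 6, 9, 30), "purchased product and gifts"),
]
dispatch_events = [
    Event("c42", "dispatch", datetime(2020, 1, 7, 8, 0), "dispatch 1 of 2"),
    Event("c42", "dispatch", datetime(2020, 1, 8, 8, 0), "dispatch 2 of 2"),
]

timelines = build_timelines(web_events, twitter_events, order_events, dispatch_events)

for event in timelines["c42"]:
    print(event.timestamp.isoformat(), event.source, event.detail)
```

A single in-memory dictionary like this only works for small volumes. At Big Data scale the same grouping and sorting would have to be spread across many machines, but the idea of one ordered view per customer stays the same.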

In order to be more responsive to the potential customer, to reduce lead-to-cash time, to invest in the right inventory and to justify social media expenditure, the organization (our e-commerce site) will have to analyse this data.

Apart from the obvious business applications, big data is also being used to make society safer. For example, in 2009 Google applied big data analysis to search terms to help identify how the H1N1 virus was spreading through communities. Their tracking was close to real time, ahead of the official surveillance reports from the Centers for Disease Control and Prevention (CDC), which typically lag by a week or more.

The Next Big Challenge

Any business analyst will tell you the importance of having clean, structured and codified data in order to draw out business insights. Without proper codification of a data set, a business analyst cannot run the statistical tests needed to check their hypotheses.

With Big Data, the challenge is that statistical packages such as SPSS, R or SAS, the giant in this field, struggle to handle really high volumes of data (we are talking about data on the scale of exabytes here!). Multiple problems will have to be addressed and solved to capture the full potential of big data. Policies on data security, on privacy and ownership of data, and on intellectual property rights will also need to be formulated.

Understanding the nature of the problem is the first step to solving it. The next step is to start developing methodologies and systems. In the race to develop the next set of software applications, software giants such as Oracle, IBM and even Infosys are looking at developing their own proprietary systems.

Summary

For any new technology to be adopted in mainstream implementations, it has to go through a chain of events. It will be widely discussed, it will be touted as the next elixir of life, it will see some failed implementations and then it will be written off as just another snake oil. In the course of these events, some enlightened organizations will end up doing the right implementations, success stories will be shared, and finally it will be adopted by all organizations as a mainstream technology.

Big Data has a long way to go, and it is here to stay.