The Current Big Thing – Big Data

By now you are probably sick of hearing about Big Data.  I know I am.   It’s like a pop song you can’t get out of your head because you hear it everywhere you go.

According to Wikipedia, “big data is a loosely-defined term used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.”

The fact is that we can generate so much information so fast from websites, social media, automated sensors, communications networks, and other computing-related devices that it is becoming increasingly difficult to capture and store the data, let alone analyze it.

The problem with the term “big data” is that the word “big” is ambiguous and relative to your particular situation. It reminds me of the debate over what a recession is. Most people know one when they see it. They can certainly point to plenty of evidence of a recession: slow sales, slow economic growth, high unemployment (although, to be fair, “slow” and “high” are ambiguous too). Economists, however, often use a quantitative definition of a recession: two consecutive quarters of negative economic growth, as measured by a country’s gross domestic product.

Most IT practitioners could probably describe some of the evidence of a big data problem, such as frequent meetings about archiving data to free up disk space, complaints about insufficient historical data for analysis and modeling, or simply data arriving with no place to store it. Could we have a quantitative measure to define big data, something like an increase in data inflows and storage needs of more than 10% in each of two consecutive quarters?
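To make that tongue-in-cheek rule concrete, here is a minimal Python sketch. It assumes storage volume is sampled once per quarter and treats "growth" as the quarter-over-quarter percentage change; the function names and the sample volumes are hypothetical, and the 10% / two-quarter thresholds are simply the ones floated in the question.

```python
from collections.abc import Sequence


def quarterly_growth(volumes: Sequence[float]) -> list[float]:
    """Quarter-over-quarter growth rates; 0.12 means +12%."""
    return [(cur - prev) / prev for prev, cur in zip(volumes, volumes[1:])]


def meets_big_data_threshold(
    volumes: Sequence[float], threshold: float = 0.10, consecutive: int = 2
) -> bool:
    """True if growth exceeded `threshold` in `consecutive` quarters in a row.

    Hypothetical rule, by analogy with the two-negative-quarters recession test.
    """
    streak = 0
    for rate in quarterly_growth(volumes):
        # Extend the streak on a qualifying quarter, otherwise reset it.
        streak = streak + 1 if rate > threshold else 0
        if streak >= consecutive:
            return True
    return False


# Hypothetical quarterly storage volumes (TB): growth of ~5%, ~11.4%, ~11.1%.
print(meets_big_data_threshold([100, 105, 117, 130]))  # True
```

By contrast, volumes of 100, 112, 120 and 133 TB grow by 12%, about 7% and about 11%, so no two qualifying quarters are back to back and the rule is not triggered. Like the recession test, the rule is crude by design: it says nothing about whether the extra data is worth anything to the business.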

OK, maybe not, but I would propose that when someone starts talking about “big data,” we get them to be more explicit about what they mean for the business at hand. How about we quantify the problem or, better yet, spend more time identifying exactly which “Big Opportunities” justify all the activity around solving a perceived “Big Data” problem? Here’s the thing: many organizations haven’t been able to capitalize on their data warehouse and business intelligence investments. Chasing the next big thing, like big data, won’t benefit them until they have the plans, resources, and commitment to capitalize on a big data solution.

Finally, for companies that do have a big data opportunity, there will be a host of new considerations around how they manage metadata (descriptions of what the data represents), data governance (rules about how the data is used), data quality, data retention, and so on. These will have a profound effect on the type of analysis that can be performed and on the reliability of the results. I intend to cover some of these in future posts.