Hadoop, The Big File System

I just saw this quote by John Santaferraro, VP of Solutions and Product Marketing at ParAccel Inc. I couldn’t have said it better:

“It’s always interesting to hear discussion where Hadoop is positioned as the panacea for big data. I much prefer to adopt an approach that acknowledges a file system approach for what it is and what it does well. File systems like Hadoop are good for capturing data, archiving, filtering, transforming, and doing some batch analytics. Where Hadoop falls down is when companies try to write programs to use a file system to do complex analytics, or to do analytics where the data sources and algorithms are constantly changing. In like manner, there are analytic platforms, built on next generation database technology, that have been built from the ground up to execute high performance analytics on massive amounts of data. While new companies will spring up around Hadoop to visualize what is there, it will be extremely difficult, costly, and time consuming (in years) for companies to figure out how to use a file system to do analytics.”

I would spin that a little and say that Hadoop may not be a “panacea” for big data, but it does have a featured role as a “relatively” inexpensive store for large volumes of data that you need “relatively” rapid access to. It is not a data warehouse or data mart for doing predictive modeling. All the really smart people working in the Hadoop ecosystem will come up with some wonderful new ways to manipulate data with Hadoop, but you will still need conventional and next-generation database technologies to do the actual data manipulation that advanced analytics requires. It’s not an either-or proposition.

