
The Next Big Thing – SQL and Hadoop and NoSQL and NewSQL?

In the last post we talked a little about the different approaches taken by traditional database vendors, primarily the relational database vendors, and by the proponents of the new Hadoop and NoSQL data stores. What does this mean for you?

On the one hand, your IT infrastructure has a mature, robust capability for the care and feeding of table-driven (primarily relational) databases. Relational databases are NOT going away. (Oracle alone reported selling $4.49 billion in database and middleware licenses, updates, and support last quarter.) BUT relational databases may not be a good fit for "big data" environments where storage capacity, retrieval speed, and low cost are the primary requirements. It is hard to compete with the perception of "free": Hadoop and many of the NoSQL alternatives are based on open source software that costs nothing to download. The vendors of these products sell enhanced versions, supported versions, and services for installing and supporting those data stores. Because the core software is free, the Hadoop and NoSQL markets are still relatively small in revenue terms (estimated by IDC at $77 million in 2011).

The relational market is evolving to handle larger volumes of data. Teradata pioneered the concept of replacing the file structure underlying relational tables with a distributed file structure within a server chassis, using specialized hardware. Companies like EMC, with its Greenplum database, have figured out how to distribute relational data across many commodity servers. Most of the big relational database companies are finding ways to incorporate Hadoop into their offerings so that an analyst can use existing tools to combine data from the table environment with unstructured data. They want you to keep buying database licenses.

The most likely scenario for most enterprises is that Hadoop and NoSQL will be used to store and stage data for analysis, much as today's data warehouse is used to store and combine data for the data marts where department-level analysis takes place. In the table below, this is the Operational Data Store tier.

Operational Data Store
  Role: Data quality and transformation; business intelligence and analytics where detailed historical data is required
  Technology: Hadoop and NoSQL

Enterprise Data Warehouse
  Role: Business intelligence and analytics for enterprise needs
  Technology: Relational/Cubes/Specialized Data Structures

Data Marts
  Role: Business intelligence and analytics for departmental and line-of-business needs
  Technology: Relational/Cubes/Specialized Data Structures

In effect, we are building a data refinery to handle new volumes and varieties of input. It is an incremental cost to your organization and will have to be justified by the value it delivers to the business.

We will also see hybrids that combine the best of the relational and NoSQL models, sometimes referred to as NewSQL; NuoDB is one example. NuoDB's approach is getting endorsements from the likes of Gary Morgenthaler (a co-founder of Ingres and Illustra) and Mitchell Kertzman (the former CEO of Sybase).

In the end, business needs should drive the analysis to be performed, and the needs of that analysis should drive the information technology required to store and process the data. Don't get caught up in the hype around "The Next Big Thing."