Posts Tagged ‘Oracle’

R Wins?

Robert A. Muenchen at the University of Tennessee wrote a blog entry entitled “Will 2015 be the Beginning of the End for SAS and SPSS?” In it, he projects that R will overtake SAS and SPSS as the tool of choice for analysts in 2015, based on its popularity with professors and college students. It is a rational argument.

Based on my experience working around Silicon Valley, I would posit that R is the tool of choice for startups, which are often staffed by recent college students and their professors.

However, I also have one foot in the enterprise IT world where SAS and SPSS are well entrenched.  There is so much corporate investment in these products that it will be costly to replace them.  That isn’t to say that the IT buyer won’t be considering alternatives when their annual renewals hit the budget.

But there will be other considerations besides cost. SAS and SPSS are both surrounded by software environments for the entire data lifecycle. SAS, for example, is attempting to expand its footprint in the enterprise with new visualization software, in-memory and grid computing that act as data marts, and new software for the extraction, transformation, and loading of data. Like IBM, SAS wants to offer an entire analysis ecosystem, making it more attractive to the enterprise IT shop. The open source ecosystem for R is not that mature. If it were easy for open source to penetrate the enterprise, wouldn’t MySQL (the open source relational database) be the corporate standard?

However, a large company like Oracle could be the point of integration for all things R, with price points that are attractive, similar to what they are doing with MySQL. That would give R both the tailwind from the academic community and corporate respectability.

In the meantime, it will likely be a case of using the best tool for the job/budget.  To quote a commenter (sorry I don’t remember where I saw your comment):

“I learned R to graduate, SPSS to get a job, and SAS to make a living”


The Next Big Thing – SQL And Hadoop And NoSQL and NewSQL?

In the last post we talked a little about the different approaches taken by traditional database vendors, primarily the relational database vendors, and the proponents of new Hadoop and NoSQL data stores.   What does this mean for you?

On the one hand, your IT infrastructure has a mature, robust capability for the care and feeding of table-driven (primarily relational) databases. Relational databases are NOT going away. (Oracle alone reported selling $4.49 billion in database and middleware licenses, updates, and support last quarter.) BUT, relational databases may not be a fit for “big data” environments where storage capacity, retrieval speed, and low cost are the primary requirements. It is hard to compete with the perception of “free”: Hadoop and many of the NoSQL alternatives are based on “free” open source software. The vendors of these products sell enhanced versions, supported versions, and services related to the installation and support of those data stores. Because of this, the Hadoop and NoSQL markets are relatively small (estimated at $77 million in 2011 by IDC).

The relational market is evolving to handle larger volumes of data. Teradata pioneered the concept of replacing the file structure underlying the relational tables with a distributed file structure within a server chassis using specialized hardware. Companies like EMC, with their Greenplum database, have figured out how to distribute relational data across lots of commodity servers. Most of the big relational database companies are finding ways to incorporate Hadoop into their offerings so that an analyst can use their existing tools to combine data from the table environment with unstructured data. They want you to keep buying database licenses.

The most likely scenario for most enterprises is that Hadoop and NoSQL will be used to store and stage data for analysis, similar to the way that today’s data warehouse is used to store and combine data for data marts where analysis takes place at a department level.  It’s the Operational Data Store in the table below.

Operational Data Store | Enterprise Data Warehouse | Data Marts
Data Quality and Transformation; Business Intelligence and Analytics Where Detailed Historical Data Is Required | Business Intelligence and Analytics for Enterprise Needs | Business Intelligence and Analytics for Departmental and Line of Business Needs
Hadoop and NoSQL | Relational/Cubes/Specialized Data Structures | Relational/Cubes/Specialized Data Structures

Basically, we are building a data refinery to handle all the new volumes and varieties of input. It is an incremental cost to your organization, and it will have to be justified to the business.
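For readers who think in code, here is a rough sketch of the refinery idea, not a real deployment. A list of raw strings stands in for data staged in Hadoop or NoSQL, an in-memory SQLite table stands in for the warehouse, and a grouped query stands in for a departmental data mart. The record layout, the table name sales_fact, and the function names are all made up for illustration.

```python
import json
import sqlite3

# Hypothetical raw events, standing in for semi-structured data
# staged in a Hadoop or NoSQL store.
RAW_EVENTS = [
    '{"dept": "sales", "amount": "120.50"}',
    '{"dept": "sales", "amount": "79.50"}',
    '{"dept": "support", "amount": "30"}',
    'not valid json',            # malformed record
    '{"dept": "support"}',       # record with a missing field
]

def refine(raw_lines):
    """Data quality and transformation: keep only well-formed records."""
    for line in raw_lines:
        try:
            record = json.loads(line)
            yield record["dept"], float(record["amount"])
        except (ValueError, KeyError, TypeError):
            continue  # skip records that fail the quality check

def load_warehouse(rows):
    """Load refined rows into a relational table (the 'warehouse')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_fact (dept TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    return conn

def department_mart(conn):
    """Summarize the warehouse for departmental analysis (the 'data mart')."""
    cur = conn.execute(
        "SELECT dept, SUM(amount) FROM sales_fact GROUP BY dept ORDER BY dept"
    )
    return dict(cur.fetchall())

conn = load_warehouse(refine(RAW_EVENTS))
totals = department_mart(conn)
print(totals)
```

The point of the sketch is the division of labor: the cheap, forgiving store accepts everything, and only cleaned data moves on to the relational side, where existing SQL tools can take over.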

We will also see hybrid combinations of the best of both the relational and NoSQL models, sometimes referred to as NewSQL; NuoDB is one example. NuoDB’s approach of combining the best of relational and NoSQL is getting endorsements from the likes of Gary Morgenthaler (a co-founder of Ingres and Illustra) and Mitchell Kertzman (the former CEO of Sybase).

In the end, business needs should drive the analysis to be performed, and the needs of that analysis should drive the information technology required to store and process the data. Don’t get caught up in the hype around “The Next Big Thing.”