Metamarkets, A Practical Application For In-Memory Data Management

February 21, 2013

I had the opportunity to talk with representatives of Metamarkets this week about their use of an in-memory data store, and I found their argument for using in-memory data storage for real-time (or near real-time) analysis compelling.

Metamarkets is a San Francisco company that provides real-time presentation and analysis of event data. Their first customers are interactive digital marketing companies like the Financial Times, who are looking for real-time feedback on the performance of advertising placements. (They hinted that next week they will announce a very large customer using their technology to analyze on-demand video activity.) Their solution is offered both as a service running on Amazon Web Services and on premises. Their website posts the following stats:

  • 300+ billion events ingested and processed per month
  • 100,000+ ad-hoc, multi-dimensional queries executed per day
  • 10+ TB of compressed, memory-mapped derived data
  • 500ms average query response time
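
To put those monthly and daily figures in per-second terms, here is a quick back-of-the-envelope calculation. It assumes a 30-day month and evenly spread load, neither of which is stated on their site, so treat the results as rough averages rather than measured rates.

```python
# Back-of-the-envelope conversion of the published stats to per-second rates.
# Assumptions (not from Metamarkets): a 30-day month and perfectly even load.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumed

events_per_month = 300e9   # "300+ billion events" (lower bound)
queries_per_day = 100_000  # "100,000+ queries" (lower bound)

events_per_second = events_per_month / (DAYS_PER_MONTH * SECONDS_PER_DAY)
queries_per_second = queries_per_day / SECONDS_PER_DAY

print(f"average ingest rate: {events_per_second:,.0f} events/second")
print(f"average query rate:  {queries_per_second:.2f} queries/second")
```

Under those assumptions the ingest side averages on the order of a hundred thousand events per second, while queries average only about one per second, which suggests a workload dominated by ingestion rather than by query volume.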

The Metamarkets data stack has interesting parallels to what SAS is doing with its Visual Analytics offering, with essentially four layers of functionality. Here is how the two compare side by side:

|                      | SAS Visual Analytics          | Metamarkets                      |
|----------------------|-------------------------------|----------------------------------|
| Target Audience      | Business Users (self service) | Business Users (self service)    |
| Visual Presentation  | Flash                         | JavaScript (proprietary scripts) |
| Analytics            | SAS                           | R* (proprietary algorithms)      |
| In-Memory Data Store | SAS LASR Server               | Druid* Columnar Store            |
| Staging / ETL        | GreenPlum, Teradata, HDFS*    | Hadoop*                          |

*Open Source

One of the things I find most interesting is how much Hadoop (or HDFS) has become the “store and forward” method for capturing event data for subsequent processing for these vendors and possibly others pitching the equivalent of the “analytic data warehouse.” 

I also think there is some debate about how “real time” the analysis is for Metamarkets, given the latency of a Hadoop ETL layer.

Metamarkets developed Druid internally and has released it as an open source project. (They have a respectable following on GitHub, with about half as many followers as Impala from Cloudera and twice as many as VoltDB as of the date of this post.) Time will tell if they gave away proprietary technology, or if they were smart to outsource development of what will become commodity-like technology in order to focus on the real intellectual property: the R language algorithms used to make sense of the data for non-data scientists. I think the latter is most likely.

Their business model and data architecture are oriented towards time-series data, but I didn’t see anything in their architecture that would limit them to time-series data in the future.
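
As a rough illustration of what a time-oriented, in-memory columnar store does, here is a minimal Python sketch. It is not Druid's code or API: the `ColumnStore` class, its `ingest` and `query` methods, and the sample events are all hypothetical, and a real store would add compression, indexing, and distribution across nodes. It only shows the basic idea of keeping each field in its own column and answering an ad-hoc query by filtering on time and grouping by a dimension.

```python
from collections import defaultdict


class ColumnStore:
    """Toy in-memory columnar store (hypothetical, for illustration only)."""

    def __init__(self, dimensions, metrics):
        # One list per column; row i is spread across index i of every list.
        self.timestamps = []
        self.dims = {name: [] for name in dimensions}
        self.metrics = {name: [] for name in metrics}

    def ingest(self, timestamp, dims, metrics):
        """Append one event by appending to each column."""
        self.timestamps.append(timestamp)
        for name, column in self.dims.items():
            column.append(dims[name])
        for name, column in self.metrics.items():
            column.append(metrics[name])

    def query(self, start, end, group_by, metric):
        """Sum `metric` per value of `group_by` for start <= timestamp < end."""
        totals = defaultdict(int)
        dim_column = self.dims[group_by]
        metric_column = self.metrics[metric]
        # Only the three columns involved are touched; others are never read.
        for i, ts in enumerate(self.timestamps):
            if start <= ts < end:
                totals[dim_column[i]] += metric_column[i]
        return dict(totals)


# Sample ad events (made-up values; timestamps are seconds since some epoch).
store = ColumnStore(dimensions=["publisher", "country"], metrics=["impressions"])
store.ingest(100, {"publisher": "site-a", "country": "UK"}, {"impressions": 3})
store.ingest(150, {"publisher": "site-b", "country": "US"}, {"impressions": 5})
store.ingest(160, {"publisher": "site-a", "country": "US"}, {"impressions": 2})
store.ingest(900, {"publisher": "site-a", "country": "UK"}, {"impressions": 7})

# Impressions per publisher in the window [0, 200); the event at t=900 is excluded.
result = store.query(0, 200, group_by="publisher", metric="impressions")
print(result)
```

Nothing in this layout depends on the data being advertising events, which is consistent with the point above: the time column drives the filtering, but the dimensions and metrics could describe any kind of event.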

I think it is amazing what is being accomplished today with open source software.  This is a rich time to be in the analytics business and I look forward to some of the amazing insights to come from the availability of data and modeling capabilities previously available only to well-funded data scientists.

See more about Metamarkets here at DBMS2, and Druid here at Metamarkets.com.