
Data Is The Program

April 18, 2013

When I was in college I studied data structures with Dr. Mary Loomis. We were learning how to program data structures of all kinds in memory. At the time, most programs were written as a basic read-a-record / process-a-record / write-a-record sequence, repeated thousands of times until an end-of-file occurred. The emphasis for us as programmers was to make the programs as efficient as possible in terms of CPU time and memory.
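For readers who never wrote one, here is a minimal sketch of that read / process / write loop in Python. The record layout (a name and an amount separated by a comma) and the `process` and `run` helpers are made up for illustration; they are not from any particular system of the time.

```python
import io


def process(record: str) -> str:
    # Hypothetical per-record logic: tidy the name and format the amount.
    name, amount = record.split(",")
    return f"{name.strip().upper()},{float(amount):.2f}"


def run(source, sink) -> int:
    """Read a record, process it, write it; stop at end-of-file."""
    count = 0
    for line in source:                    # iteration ends at end-of-file
        line = line.rstrip("\n")
        if not line:                       # skip blank lines
            continue
        sink.write(process(line) + "\n")   # process, then write
        count += 1
    return count


# In-memory "files" stand in for the input and output files.
source = io.StringIO("alice, 10\nbob, 2.5\n")
sink = io.StringIO()
processed = run(source, sink)
```

Notice that everything the program knows about the data (field order, separators, types) is buried inside the program itself.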

Dr. Loomis taught us the value of separating the structure and the management of data from the actual program logic as a way of creating even greater efficiency.   This approach launched wildly successful companies like Ingres, Sybase, and Oracle.
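As a rough illustration of that separation, the sketch below uses Python's built-in `sqlite3` module. The `orders` table and its columns are assumptions chosen for the example. The program states what it wants, and the database decides how the data is stored and scanned.

```python
import sqlite3

# An in-memory database; the engine owns the structure and management of the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 2.5), ("alice", 5.0)],
)

# The program logic is a declarative question, not a hand-written record loop.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
)
totals = dict(rows)  # each row is a (customer, total) pair
conn.close()
```

The storage layout could change completely without touching the query, which is the efficiency gain described above.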

Did we lose something in the process of making data a subsystem that serves program logic?

One place to look for an answer to that question might be in the “analytics” space, where data scientists are building models around “big” data. They often spend significant time acquiring and formatting data in a way that fits their specific needs for the models they are testing. For them, the program “logic” for their operating models is actually in the data itself. They are typically doing one of three things:

  • Exploring the data to find out what is happening in the real world (what the program is)
  • Creating models in their heads that are then tested and refined against the “program” the data contains
  • Adapting models that have worked on other data sets to see if there is a fit

In this case, shouldn’t the “structure” in the data be telling us what the “program” should be? Are we cloaking the “intelligence” and “programs” in the data by forcing it into database models that are prescribed for the efficiencies we needed when CPUs were relatively slow and had very limited memory?

Why It Matters

The new world of in-memory databases can significantly alter the way we think about data as the source for program logic, as opposed to seeing it as only feedstock for pre-defined program logic.

One of the problems with analytics models is that they can be difficult to deploy into a production environment. They are often defined in statistical programs that are isolated from production, with no formal process for implementing the results short of rewriting production systems in some other programming environment, with the associated big budgets and long lead times. New in-memory data structures that can support both operational and analytical needs mean that in the future it will be possible to build analytic models using the same data and infrastructure as operations. This means that logic written in robust modeling languages, which can both define and execute program logic, can be placed in “production” by flipping a switch from test to live. Your data scientists and business managers could be working in real time to monitor, model, and change your operations to adapt to economic and social changes as they are occurring.

What’s even more interesting is what happens if we follow the line of thinking that the program is in the data.  In other words, what if our models learn and adapt based on the state of all the data and changes in the data?

Today, it is possible for a data scientist to create a decision tree (all the if-then-else logic) that explains something we see in the real world. For example, this can be very useful for explaining and predicting customer behavior based on historical data. This week a major airline lost its entire traffic management system for two hours. There was no historical basis for modeling the impact this had on other airlines, car rental companies, and lodging companies. What if their models were instead learning models that could have adapted within hours or even minutes to the flow of new customer behaviors, creating new capacity and pricing models to leverage the “intelligence” in the data?
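To make the contrast concrete, here is a toy sketch, not a real pricing model. The thresholds, the `static_rule` function, the `AdaptiveRule` class, and the booking numbers are all invented for illustration. The first rule has its if-then-else logic frozen from history; the second keeps re-estimating its baseline from the data as it arrives (a simple exponential moving average stands in for "learning").

```python
def static_rule(bookings: float) -> str:
    # If-then-else logic with thresholds frozen from historical data.
    if bookings > 125:
        return "raise_price"
    elif bookings < 75:
        return "lower_price"
    return "hold"


class AdaptiveRule:
    """Same decision shape, but the baseline is re-learned from each observation."""

    def __init__(self, baseline: float, alpha: float = 0.5, band: float = 0.25):
        self.baseline = baseline  # current estimate of "normal" demand
        self.alpha = alpha        # how quickly new data overrides history
        self.band = band          # tolerance around the baseline

    def decide(self, bookings: float) -> str:
        if bookings > self.baseline * (1 + self.band):
            decision = "raise_price"
        elif bookings < self.baseline * (1 - self.band):
            decision = "lower_price"
        else:
            decision = "hold"
        # Move the baseline toward what was just observed.
        self.baseline = (1 - self.alpha) * self.baseline + self.alpha * bookings
        return decision


# Hypothetical hourly bookings: a normal hour, then a sustained surge.
observed = [100, 300, 300, 300, 300]
adaptive = AdaptiveRule(baseline=100)
static_decisions = [static_rule(x) for x in observed]
adaptive_decisions = [adaptive.decide(x) for x in observed]
```

In this made-up run the static rule keeps reacting to every surge hour as an exception, while the adaptive rule absorbs the surge into its baseline after a couple of hours and starts treating it as the new normal.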

Looking for the program logic in the data, rather than imposing that logic on the data, has the potential to change the role of technology across industries, and it will be exciting to see how it unfolds.
