
Practical Master Data Management

December 14, 2012

We spend a lot of time talking with customers about master data management (MDM). Here are a few points that convey the meat of those conversations.

1. Why MDM?

You didn’t need MDM when you had one application and one database. The nature of things like customer, product, and supplier was understood by everyone who used the application and the data. As soon as you built or acquired another application with its own database that referred to the same things, you introduced the potential for a mismatch in the attributes describing those things.

For example, the billing system stores the customer address as a Post Office Box, while the shipping system stores the physical street address. Which address should a new marketing campaign use? If I want to do market segmentation by zip code, and the two zip codes differ, which one should I use?

This example is overly simplified. The problem is usually more along the lines of 10 source systems, each describing the same things with 50 attributes apiece, and no automated way of applying the rules that decide the “right” attributes to use across 100 million occurrences of those things.
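
To make the billing-versus-shipping example concrete, here is a minimal sketch of an attribute survivorship rule, in Python. The record contents, system names, and the rule itself are invented for illustration, not taken from any particular MDM product:

```python
# Hypothetical source records for the same customer in two systems.
billing = {"customer_id": "C100", "address": "PO Box 42, Austin, TX 78701"}
shipping = {"customer_id": "C100", "address": "500 Elm St, Austin, TX 78702"}

def survive_address(sources, purpose):
    """Pick the "right" address based on an agreed business rule:
    marketing mail goes to the billing address, deliveries to the
    physical shipping address."""
    priority = {"marketing": "billing", "delivery": "shipping"}
    return sources[priority[purpose]]["address"]

sources = {"billing": billing, "shipping": shipping}
master = {
    "customer_id": "C100",
    "marketing_address": survive_address(sources, "marketing"),
    "delivery_address": survive_address(sources, "delivery"),
}
```

The real work in MDM is deciding and encoding rules like `priority` for every attribute, then applying them automatically at scale.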

2. Master Data And Reference Data Are Different

Some customers get confused about the difference between master data and reference data. Reference data is something like the table of two-character state postal codes we use in the USA: it rarely changes and is used by all your applications for consistency. As we will see, master data is the result of combining two or more sources to get the “best” combined representation.
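
A rough sketch of the contrast, with invented records and a deliberately simple merge rule: reference data is a small, static, shared lookup; master data is assembled from multiple sources:

```python
# Reference data: a small, stable lookup table shared by every application.
STATE_CODES = {"TX": "Texas", "CA": "California", "NY": "New York"}

# Master data: the "best" record combined from two or more sources.
# These records and the merge rule are hypothetical.
crm = {"name": "Acme Corporation", "state": None, "phone": "512-555-0100"}
billing = {"name": "ACME CORP.", "state": "TX", "phone": None}

def merge_master(primary, secondary):
    """Take the primary system's value unless it is missing."""
    return {k: primary[k] if primary[k] is not None else secondary[k]
            for k in primary}

master = merge_master(crm, billing)
# The master record can now be validated against the reference table.
state_name = STATE_CODES[master["state"]]
```

Note that the reference table is simply consumed as-is, while the master record only exists because two sources were reconciled.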

[Figure: MDM example]

3. Master Data Is Operational

In the day-to-day operation of the business, master data is about reconciling the differences between two or more systems and maintaining a single version that represents what someone in the company has defined as the “truth” (at least according to the way they use the data). You wouldn’t need master data management if you changed your operations so that everyone used the same source data, entered all new data about the same entity the same way, and every outside data source conformed to your data quality and business rules. In real life, few organizations are prepared to go back and change their operations to meet these criteria, so you are left with MDM to provide a consistent view of the items you master.

4. Data Quality Precedes MDM

The process of matching disparate sources of data to identify master entities is enhanced significantly when the source data is cleansed and standardized before the matching rules are applied. Strike that: cleaning up and standardizing the data is required before attempting MDM.
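
As a small illustration of why, here is a toy standardization step; the normalization rules are invented and real data quality tools apply far richer ones. Two company names that fail an exact match before standardization match after it:

```python
import re

def standardize(name):
    """Normalize case, punctuation, and a common abbreviation so records
    from different systems can be compared."""
    name = name.upper()
    name = re.sub(r"[.,]", "", name)               # drop punctuation
    name = re.sub(r"\bINCORPORATED\b", "INC", name)  # unify abbreviation
    return re.sub(r"\s+", " ", name).strip()       # collapse whitespace

# The raw values differ, so a naive match rule would miss them...
assert "Acme Incorporated" != "ACME, Inc."
# ...but the standardized values match.
assert standardize("Acme Incorporated") == standardize("ACME, Inc.") == "ACME INC"
```

Without this step, the matching rules would have to anticipate every formatting variation themselves, which is exactly why cleansing comes first.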

[Figure: MDM example]

5. MDM Can Be A Step In A Process

Sometimes the output of an MDM process is used to create a static master file for input to other systems, or as input to a data warehouse or data mart. In the warehouse case, we sometimes do some enhancement as well. For example, combining attributes from multiple systems into a “wide” master record that carries all the attributes from the source systems is common for later analysis. Just to be clear, this augmentation is not MDM; it is enabled by MDM. MDM is about identifying an entity and clustering the source records that pertain to that entity.
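
A sketch of that distinction, with invented source records: step 1 (clustering records by entity) is the MDM part; step 2 (building the “wide” record) is the enhancement MDM enables:

```python
# Hypothetical source records; "match_key" stands in for the output of
# the entity-identification rules.
source_records = [
    {"system": "billing", "match_key": "ACME INC|78701", "credit_limit": 50000},
    {"system": "crm", "match_key": "ACME INC|78701", "segment": "Enterprise"},
    {"system": "shipping", "match_key": "BETA LLC|10001", "dock": "B"},
]

# Step 1 (MDM): cluster the source records that pertain to each entity.
clusters = {}
for rec in source_records:
    clusters.setdefault(rec["match_key"], []).append(rec)

# Step 2 (enabled by MDM, not MDM itself): build a wide record per entity
# carrying every attribute from every contributing system.
wide = {}
for key, recs in clusters.items():
    merged = {}
    for rec in recs:
        for k, v in rec.items():
            if k not in ("system", "match_key"):
                merged[f"{rec['system']}_{k}"] = v
    wide[key] = merged
```

The wide record is only possible because the clustering step established which source records belong to the same entity.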

[Figure: MDM example]

6. Operational MDM Can Be Performed In Real Time In A Hub

An MDM Hub is a server that provides MDM services on request.   The same rules developed for data quality, entity identification, and the creation of a “master” record can be performed upon request by a server so that the resulting master record reflects the actual current state of the source systems.  The hub can be used to:

  • verify that an item already exists in one or more systems (potentially eliminating the entry of duplicates)
  • standardize the application of data quality processes
  • share operational “rules” that span systems and processes (if billing and shipping addresses are different, then verify shipping address before committing the order)
  • provide an administrative interface for human intervention and workflow for exception handling
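
The first two bullets might look like this in a hub, sketched with a hypothetical in-memory index; a real hub would run the full matching rules against the live source systems rather than a simple lookup:

```python
class MdmHub:
    """Toy MDM hub: a duplicate-check service that applies the same
    standardization rule used for batch matching. Illustrative only."""

    def __init__(self):
        self.index = {}  # standardized match key -> master record id

    def _key(self, name, zip_code):
        # The shared data quality rule: one standardization for everyone.
        return f"{name.upper().strip()}|{zip_code}"

    def exists(self, name, zip_code):
        """Called before data entry to head off duplicate records."""
        return self._key(name, zip_code) in self.index

    def register(self, name, zip_code, master_id):
        self.index[self._key(name, zip_code)] = master_id

hub = MdmHub()
hub.register("Acme Inc", "78701", "M-001")
duplicate = hub.exists("  ACME INC ", "78701")  # True: caught before entry
```

Because every caller goes through `_key`, the data quality rules are applied one way, in one place, instead of being re-implemented per application.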

[Figure: MDM example]

MDM processes and hubs need to be customized for every client situation.  It is an effort that has to involve the entire enterprise.

7. MDM As A Service

In the ideal future state, data quality enforcement and MDM could be standardized and provided as a service to all applications in the enterprise. Centralizing these functions, instead of performing them differently in each application, could significantly reduce the work of reconciling differences in the data. Delivery via an enterprise cloud, or eventually as shared services in a public cloud, are both possibilities.


The Current Big Thing – Big Data

By now you are probably sick of hearing about Big Data.  I know I am.   It’s like a pop song you can’t get out of your head because you hear it everywhere you go.

According to Wikipedia, “big data is a loosely-defined term used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.”

The fact is we can generate so much information so fast from web sites, social media, automated sensors, communications networks, and other computing-related devices that it is becoming increasingly difficult to capture and store the data, let alone analyze it.

The problem with the term “big data” is that the word “big” is ambiguous, and certainly relative to your unique situation. It reminds me of the argument over what a recession is. Most people know one when they see it. They can certainly find lots of evidence of a recession: slow sales, slow economic growth, high unemployment (although, to be fair, “slow” and “high” are ambiguous too). Economists, however, have a quantitative definition: two consecutive quarters of negative economic growth as measured by a country’s gross domestic product.

Most IT practitioners could probably describe some of the evidence of a big data problem: frequent meetings about how to archive data to free up disk space, complaints about insufficient historical data for analysis and modeling, or the simple fact that data is coming in with no place to store it. Would it be possible to have a quantitative measure to define big data, something like an increase in data inflows and storage needs of more than 10% in each of two consecutive quarters?
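
Tongue in cheek or not, the proposed measure is easy to state in code. The 10% threshold and the quarterly figures below are of course illustrative:

```python
def is_big_data(quarterly_storage):
    """True if storage needs grew more than 10% in each of two
    consecutive quarters. Input: storage used per quarter, oldest first."""
    for prev, mid, last in zip(quarterly_storage,
                               quarterly_storage[1:],
                               quarterly_storage[2:]):
        if mid > prev * 1.10 and last > mid * 1.10:
            return True
    return False

steady_growth = is_big_data([100, 105, 109])  # under 10% per quarter
runaway_growth = is_big_data([100, 112, 125])  # two quarters over 10%
```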

OK, maybe not. But I would propose that when someone starts talking “big data,” we get them to be explicit about what they mean for the business at hand. How about we quantify the problem? Better yet, how about we spend more time on exactly what “big opportunities” would justify all the activity around solving a perceived “big data” problem? Here’s the thing: many organizations haven’t been able to capitalize on their data warehouse and business intelligence investments. Chasing the next big thing, like big data, won’t benefit them until they have the plans, resources, and commitment to capitalize on a big data solution.

Finally, for companies that do have a big data opportunity, there will be a host of new considerations around metadata (descriptions of what the data represents), data governance (rules about how the data is used), data quality, data retention, and so on that will have a profound effect on the type of analysis that can be performed and the reliability of the results. My intent is to cover some of these in future posts.