Manage Reference Data Changes with Projects
Author: Calin Groza

January 2, 2010

Having worked on Reference Data Management for a long time, I found the need for a categorization of data from the perspective of change. For example, Telco companies are using Business Support Systems (BSS) and Operational Support Systems (OSS) applications to manage terabytes of data stored in hundreds of tables, XML files and properties files. This data is changing all the time: some often, some rarely; some changes have large impact other have limited impact; the impact could be in the number of customers affected, products sold, financials. There are many aspects to consider about data and its change and each aspect is handled in a specific way.

Depending of how often data changes, we have:

  • Very rarely changing: Customer Type, Order Statuses – these are at the core of the application and while they are present in the data storage layer, it cannot change without a large impact to the application functionality
  • Rarely changing data: Service Type – entities are changing when the organization adds a new line of business, does an acquisition
  • Medium Changing: Product Type information – new products added, discounts, campaigns
  • Often Changing: Customers and Customer-Product information – new customers added to the system; customer changes the products/services provided
  • Very often changing: Customer call records (CDRs) – every time the user makes a phone call a record is added. New data is created every day, every second.

Depending on the impact of the change financially:

  • Large Impact: Changes to the price plans, discounts, campaigns – changes affecting many customers and the overall organization financial situation
  • Small Impact: Customer orders or cancels a service – changes affect only one customer
  • Changes with no financial impact: customer changes the number of rings for a voice mail to be activated – no financial impact on the customer or organization.

Depending on the impact to the service quality:

  • High risk: changing the service delivery platform for a service; changing the voice mail provider – impact to all customers
  • Low risk: modifying a service for a customer

In time, the trend has been to make the data more ready for change and thus providing more flexibility to the organizations and customers. For example, before, customers had to call the call center to make customer information changes, now the customers can change their profile on-line; before, a vendor had to make a change in the application to support a new Service Type, now the organization can make these changes using a configuration tool.

To reduce the risks associated with the data changes, organizations are using specific methods to deal with change.

  1. Changes with low service impact risk and no financial impact – the records in the database contain audit information: who made and when was the last change made. For example, in case of self-managed email configuration the records are updated directly in the RDBS
  2. Changes that have small financial impact to the company and low service level impact are handled through (Customer/Product) Orders. Rather than making directly the change in the database, the user (can be the CSR or self-serve) creates an order that contains the change. The fulfillment of the order results in updates to many tables. The “order” concept is very common for changes that take a longer period of time (from minutes to weeks) and affect many applications
  3. For changes that have a large impact both financially and/or level of service … there is no standard way to handle them. This category includes: Product information, some Service Type changes. This class of change covers “rare” changes but not “very rare” changes. I will provide more details below
  4. Very rare changes – they are handled through the release process of the organization along with the application code.

In the previous list the third category consists of data that is changing rarely, has large financial impact and high risk to the level of service. Examples are Product information, Billing information, messages on the Customer invoice and many other entities. I will call this Reference Data. As more applications become configuration driven, the domain of Reference Data is growing and the need to manage its change it is higher. Another characteristic of Reference Data is that while the data volume is small compared with other entities, the structure is very complex containing many entities with many associations. As an example, an Ordering and Billing system may have 30 tables for data that is changing often but 500 tables for reference data.

One way to manage Reference Data is using the concept of “project.” A Project is similar to an Order in that it captures the change separately from the data and during the “execution” of the project the application updates the actual data elements. A Project is similar to a Software Change in that it impacts many customers and requires testing before execution. But unlike a Software Change it does not require an application release, vendor participation and is triggered by a different team (business vs. development).

The projects have their own life-cycle: get created, modified, tested, deployed in multiple environments and closed. They need to be persisted because they can last a long time: days, weeks. Tools are required to do all this. I will call it generically Reference Data Management Tool (RDMT). What is the scope of this tool and what characteristics of Projects need to be considered?

RDMT needs to store Project information. I mentioned before that the schema of Reference Data is very complex (500 tables with many associations in a large Ordering/Billing System). That means that is not practical to store the data in a normalized RDBMS. The tool and its persistence layer need to work with dynamic data in which the meta-data is handled along with the data as input to the tool rather than known a priory. The common implementations are: BLOBS in RDMBS and XML files stored in some form of a repository. One variant that I used is to use store the Reference Data in XML files. The advantage is that I could store both XML and XSD data, and also have support for versioning (will provide more details below).

RDMT must support the ability to diff/merge reference data. RDMT manages an off-line copy of the reference data and must create the diff between the modified data and the original. Also, if there are multiple instances of the RDMT tool there is a need to import/export projects. This will also require ability merge between the changes of a project into the data. Finally, when multiple projects have been created, it is useful to merge them into a final change before deployment. This helps identifying conflicts between changes.

RDMT needs to have the ability to support multiple users modifying same data. Some tools, simplistically, limit the access to an object to a single user. This creates major operational headaches because there is always a chance of collisions between users/projects/releases. A more practical approach is to allow multiple users to modify the same data and support the “merge” between changes made by different users. The later model is very common in source code management and I consider it the better approach for reference data as well.

RDMT must handle meta-data changes. The Reference Data schema is changing often by adding new entities, attributes and associations. The vast majority of the changes are backward compatible. The tool must be able to support these changes without a DBA team supporting it.

I built an RDMT application based on the project model described above. Here are some highlights of the implementation:

  • Reference Data is stored off-line in XML format. The repository for the files can be Shared Folder or SVN
  • the tool manages the projects on the client machine and has the typical capabilities of a version control system: checkout, modify, diff, checkin reference data entities. It supports an optimistic model for the access to shared entities: all users can modify any data but when the change is checked in, if somebody else modified an object the tool does automatic merge based on precedence rules
  • when a project is checked in it is also closed. At that time the change is ready to be deployed in one or multiple environments. At the same time all other active projects can synchronize with the changes made by the closed project.
  • the tool compares generic business entities based on the meta-data definitions (XSD files). See http://www.integrationspace.com/articles/entity-key-in-xml for ways to identify business entities in XML. I will write an article about business identifiers and diff algorithms
  • · … many other capabilities to support a large integrated environment.

No Comments »

No comments yet.

Leave a comment

Categories