Aparna Chamerthi & Vijaya K Nadendla
Introduction
Into the second decade of the arrival of the data warehouses, we are realizing that we face the same problems as any other IT initiative mainly related to the end of the product lifecycle. We have to gracefully retire old data warehouses and make the case all over again for new ones. We may have to migrate some of the old data into the new data warehouse. We may also have to migrate the reporting, OLAP or any other BI solutions on top of the data warehouse to new platforms.
In this article we will discuss when to migrate a data warehouse, what are the main problems faced in migrating a data warehouse, how to calculate the ROI for the migration and main technical and scope issues. We will also briefly touch the staffing issues and success criteria.
What is a Data warehouse?
Although formally around for just over a decade, similar initiatives existed for a couple of more decades. As a new and evolving field, our understanding and expectations of it has changed over time. The lack of a strong theoretical underpinning to data warehouse as opposed to the relational databases resulted in adhoc approached based on experience. Hence let us define the data warehouse based on what we expect it to do – it could be one or more of the following.
A repository of data
A source of reports on general performance of business.
A source of data for statistical analysis
A platform for standardization and integration of data
A source of data for CRM, 1-1 marketing, and target marketing.
A data store (the only one incase of web and some scientific fields)
Data structures designed for quick, understandable and easy access to data.
Defining it on what its components are will give us the following list
General understanding, processes and documentation of
a) Various sources of data and their data access mechanisms.
b) Various uses of data
c) Logical and physical platform and procedures for the standardization of data
d) Hardware and software for the data storage, archival, backup and recovery.
e) Hardware and software for data access, including networking & reporting
f) Data quality audit processes
g) Operational processes
h) Meta data cataloging and delivary
i) Interfaces with CRM, Data mining and statistical analysis processes.
j) Interfaces with other data warehouses
k) Security processes and procedures
Thje above contains the elements of many data warehouses. A single data warehouse can contain only some of the above. Some of the above may not be precisely defined. The most common components in the data warehouse are c, d and e.
What is data warehouse migration?
We take data warehouse migration to mean any substantial change in c, d or e. This will cause change in some of all of the other components of the data warehouse listed above. A change of hardware for data storage in by itself will not be called a data warehouse migration.
What to migrate
Migrate the meaning of the data. Not just the data. Migrate the contenet and the context.
Given the volatility of the dta and the businesses how long would the data be useful
How do the applications look at the data.
When to Migrate
• Old system breakdown
• Substantial Data Source changes
• DW Simplification and Reorganization
• Cost reduction
• Technology changes
• Business Process Change
• Business Reorganization
• Performance
Problems
• Employee Reluctance
– Different skill sets
– Job Security
• Transition Period Costs
• Parallel development
• Organizational Inertia
• Downstream Systems
Migrate vs. Modify
• Modification is first preference
• Doesn’t work always
– Increases complexity
– May be impossible
Scope – increase or decrease
Technical Architecture & software reuse
Staffing
Success Criteria
I am often asked usually by programmers - What is Data Warehousing & how do I learn it? I explain to them we use all the same tools that you do but differently. That’s when I coined the term Data Sense. It describes the essence of Data Warehousing and separates Data Warehousing from rest of Programming. Every aspect of IT from Hardware / Software infrastructure to Design, Development and QA is done with massive data flows and need for data precession accuracy and meaning.