I am often asked, usually by programmers: what is Data Warehousing, and how do I learn it? I explain that we use all the same tools that you do, but differently. That's when I coined the term Data Sense. It describes the essence of Data Warehousing and separates it from the rest of programming. Every aspect of IT, from hardware / software infrastructure to design, development and QA, is done with massive data flows and a need for data precision, accuracy and meaning.

Saturday, August 30, 2008

How and why are hundred-dimension data warehouses designed? Is this the birth of Object Dimensional Modeling?

As a follow-up to my previous blogs on the need for, and the process of designing, a simple dimensional model that can satisfy the analysis requirements, I have investigated how and why 100-dimension monsters are built. I acknowledge conversations with colleagues on this subject.

Why:
1) Sponsors of the data warehouse want every single attribute from all data sources to be stored in the data warehouse. This is the big bang approach: build once for all future needs. Users also don't want to be told that a particular attribute is not present in the data warehouse after it is built, so the safest approach is to store everything.

2) Users are used to the data elements as they worked with them during the business processes. Abstraction into new variables would take away that familiarity, so there is reluctance.

3) Designers were never exposed to variable reduction using abstraction and hence don't consider it an option (a sketch of what such abstraction looks like follows this list). Designers also have an improper understanding of what a dimension is: they slap "dim" in front of any source table name and assume it is a dimension of the data warehouse. Further, they perform horizontal partitioning of data (very OO) for performance reasons.
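To make the abstraction point concrete, here is a minimal Python sketch, with hypothetical attribute names and a made-up mapping, of what variable reduction looks like: several raw source attributes collapse into one analysis-friendly dimension attribute instead of being stored verbatim.

    # A sketch of variable reduction using abstraction; the attribute
    # names and the mapping are hypothetical.
    RAW_TO_SEGMENT = {
        ("GOLD", True): "Premium",     # loyalty tier + credit-line flag ...
        ("GOLD", False): "Standard",   # ... collapse into one segment
        ("SILVER", True): "Standard",
        ("SILVER", False): "Basic",
    }

    def abstract_segment(loyalty_tier: str, has_credit_line: bool) -> str:
        """Reduce two raw source attributes to one abstracted variable."""
        return RAW_TO_SEGMENT.get((loyalty_tier, has_credit_line), "Unclassified")

    print(abstract_segment("GOLD", True))  # -> Premium

Two attributes become one here; repeated across dozens of source flags, the same move is what keeps a model's dimension count down.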

How:

1) Source databases' data models are copied into one structure. Each source (or a small set of sources) becomes a subject area.

2) Tables that completely (or very nearly) overlap between data sources are merged (see the first sketch after this list).

3) The purpose of each table is investigated. If it holds transactions, it becomes a fact table (two or more tables can be merged). If it is a reference / lookup / business process or other non-transaction table, it becomes a dimension. Conformed dimensions are set up between subject areas.

4) Other tables / data are merged into the identified fact / dimension tables. SCD properties are identified and metadata added (the second sketch after this list shows typical Type 2 SCD bookkeeping). Load-process metadata tables / attributes are added.
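To make steps 2 and 3 concrete, here is a minimal Python sketch, with hypothetical table names and a made-up overlap threshold, of how near-duplicate source tables get merged and the survivors classified as fact or dimension:

    # Hypothetical source tables: column sets plus a flag saying whether
    # the rows record business transactions or reference data.
    SOURCE_TABLES = [
        {"name": "crm_customer", "columns": {"cust_id", "name", "region"}, "transactional": False},
        {"name": "erp_customer", "columns": {"cust_id", "name", "credit"}, "transactional": False},
        {"name": "pos_sale", "columns": {"sale_id", "cust_id", "amount"}, "transactional": True},
    ]

    def merge_overlapping(tables, threshold=0.6):
        """Step 2: merge tables whose column sets largely overlap."""
        merged = []
        for t in tables:
            for m in merged:
                shared = t["columns"] & m["columns"]
                if len(shared) / min(len(t["columns"]), len(m["columns"])) >= threshold:
                    m["columns"] |= t["columns"]   # union the attributes
                    m["name"] += "+" + t["name"]
                    break
            else:
                merged.append({**t, "columns": set(t["columns"])})
        return merged

    def classify(tables):
        """Step 3: transaction tables become facts, the rest dimensions."""
        return {("fact_" if t["transactional"] else "dim_") + t["name"]: sorted(t["columns"])
                for t in tables}

    print(classify(merge_overlapping(SOURCE_TABLES)))
    # crm_customer and erp_customer collapse into one dimension;
    # pos_sale becomes the lone fact table.

Run mechanically over dozens of sources, this is exactly how the dimension count balloons: every surviving reference table becomes its own dimension.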
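For the SCD metadata in step 4, here is a minimal sketch of the Type 2 bookkeeping (effective dates plus a current-row flag); the column names are illustrative:

    from datetime import date

    def scd2_apply(dim_rows, incoming, key, tracked, today=None):
        """Type 2 SCD: expire the current row and append a new version
        whenever a tracked attribute changes."""
        today = today or date.today()
        for new in incoming:
            current = next((r for r in dim_rows
                            if r[key] == new[key] and r["is_current"]), None)
            if current and any(current[c] != new[c] for c in tracked):
                current["is_current"] = False   # expire the old version
                current["valid_to"] = today
            if current is None or not current["is_current"]:
                dim_rows.append({**new, "valid_from": today,
                                 "valid_to": None, "is_current": True})
        return dim_rows

    dim = [{"cust_id": 1, "region": "East", "valid_from": date(2008, 1, 1),
            "valid_to": None, "is_current": True}]
    scd2_apply(dim, [{"cust_id": 1, "region": "West"}], "cust_id", ["region"])
    print(dim)  # expired East row plus a current West row

Multiply this by a hundred dimensions and the load-process metadata in step 4 stops being an afterthought.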

This process reminds me of the creation of a collage, and the end effect is very much the same: a very fragmented but tightly knit data structure.

As my colleagues mentioned, this is the bottom-up approach. My interpretation of the bottom-up approach is different, and I'll make it the subject of a future blog.

My Comments:

1) I look at this as a data / database consolidation project, not a data warehousing project.

2) I am not sure there is any advantage in moving to dimensional modeling in such cases. Why not use 3NF where you need to build a complete data repository for the enterprise? I am not sure a 100-dimension data web is more understandable, accurate or faster than a 3NF model. In fact, it is the opposite.

3) Some techniques smell similar to ORM. And given that most sources are already in object models, are we looking at the birth of ODM: Object Dimensional Mapping?

4) Data access software and tools should take the above structures into account. A significant amount of code needs to be written to analyze data in the above scenario. This looks very much like a stovepipe analysis scenario; doing stovepipe analysis and maintaining the code in MDX, for example, is non-trivial (the sketch below shows the kind of stitching this forces).
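To illustrate comment 4, here is a minimal Python sketch, with hypothetical fact extracts, of the stitching that stovepipe analysis pushes into client code: two facts that share no conformed structure must be aligned by hand before any cross-subject-area question can be answered.

    # Two stovepipe fact extracts keyed on customer; names and numbers
    # are made up for illustration.
    sales_by_cust = {1: 500.0, 2: 120.0}     # from a sales subject area
    returns_by_cust = {1: 40.0, 3: 15.0}     # from a returns subject area

    def net_revenue(sales, returns):
        """Align keys by hand and default missing sides to zero;
        logic a conformed model would have owned."""
        customers = sales.keys() | returns.keys()
        return {c: sales.get(c, 0.0) - returns.get(c, 0.0) for c in customers}

    print(net_revenue(sales_by_cust, returns_by_cust))
    # e.g. {1: 460.0, 2: 120.0, 3: -15.0} (key order may vary)

Every new cross-stovepipe question means another function like this, which is the maintenance burden the comment is about.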

Vijay