I am often asked usually by programmers - What is Data Warehousing & how do I learn it? I explain to them we use all the same tools that you do but differently. That’s when I coined the term Data Sense. It describes the essence of Data Warehousing and separates Data Warehousing from rest of Programming. Every aspect of IT from Hardware / Software infrastructure to Design, Development and QA is done with massive data flows and need for data precession accuracy and meaning.

Monday, February 19, 2007

How many dimensions should I have in my Data Warehouse?

I can tell how many you should not have - 25, 50 ,100 or more. Because I refuse to believe there are 100 significant entities by which you are analyzing your business. To understand this lets start off with the definition of a dimension.

A dimension is an ENTITY by which you analyze your business. Even 6 months back I used to believe a dimension is a Variable by which you analyze your business. Then I happened to work on projects with more than 100 dimensions tables. I realized the scope for misunderstanding in the definition.

What is an Entity and what is a variable or an attribute? A customer is an entity and a customer name is an attribute. Customer state, Customer Zip, Customer Type are some more attributes of same entity. The Dimension is Customer. You'll have 100 dimensions in your data warehouse only if you create a dimension for each attribute - e.g. Dim_Customer_Type. Or Dim_Tick_Type. Take your 100 + dimensions and make entities out of them. You'll end up with 10 -15 entities or less. The name of the game is to get to right level of abstraction.

When you have 100 + dimensions it becomes impossible to develop build and manage the system. There is no ROI. Your resources don't increase just because you need them. YOur resources are what you can afford. Insted of beating your resources to the ground whether people or machines, proper design and a simple data warehouse that satisfies the requirements should be built.