I am often asked usually by programmers - What is Data Warehousing & how do I learn it? I explain to them we use all the same tools that you do but differently. That’s when I coined the term Data Sense. It describes the essence of Data Warehousing and separates Data Warehousing from rest of Programming. Every aspect of IT from Hardware / Software infrastructure to Design, Development and QA is done with massive data flows and need for data precession accuracy and meaning.

Wednesday, August 20, 2008

Microsoft BI and CubeFilesystem

This is my real experience analyzing several Microsoft BI implementations – especially in Financial Institutions.

Programmers have been used to using file system as a place where they store and retrieve data in raw form, and they do significant logic in the code in (c++, Java etc) on top of it to get the required functionality. When given a new toy, MDX they used it exactly the same way – as a filesystem where they can build the logic on; instead of business-dimensional space where they should build the logic in! It is the structure of the dimensional space, and in its various measurements, and in definition of scales, in logic that is embedded in moving measures from scale to scale and in defining measures when all you have is data points that they should be concentrating on!

Not that its wrong. But why do you need to write a program in MDX. You may as well use Java, and aggregate data structures landed to disk if that is how you want to proceed.


Aug 21: Update

The last few blogs have led to some emails from friends asking me if what they are doing isn't apt use of the tool what then is proper use of the tool. I refered them to by previous blogs but promissed them to summarize them here
0) Define the space you are going to work in ie next 123
1) Identify dimensions as entities holding business variables. Is the problem I am looking to measure / model contain this variable as an independent parameter - not is this artifact of data representation. So lookups, data processing artifacts of what ever kind are not dimensions. Next question is are all these variables independent or can they be grouped as abstract entities thereby decreasing my dimensionality - keep in mind world is not 100 dimensional
2) Ensure your dimensions are orthogonal - ie there are no relationships between dimensions. Only relationships allowed are hierarchies within dimensions, and relationships between attributes (e.g. code, desc)
3) Further abstract your dimensions at entity level to reduce dimensionality. It is OK if some attributes are empty. You are not trying to reduce space here.
4) Now you have defined an n dimensional space with multiple scales for the problem. Place the data points in the space.
5) Define measures that represent the observed data at the dimensional hierarchy level of data points.
6) Define the relationship between measures at higher levels of hierarchy and lower / observed levels of hierarchy. These could be simple addition of values in each of the dimensional cells or much more complex formulae.
Now you have defined and placed the data properly in MS BI Cubes. Remember you still need to verify results against data points since error caluculation and propagation is not part of above scheme. You may incorporate it into above but it may be non trivial.
Enjoy