I am often asked usually by programmers - What is Data Warehousing & how do I learn it? I explain to them we use all the same tools that you do but differently. That’s when I coined the term Data Sense. It describes the essence of Data Warehousing and separates Data Warehousing from rest of Programming. Every aspect of IT from Hardware / Software infrastructure to Design, Development and QA is done with massive data flows and need for data precession accuracy and meaning.

Friday, March 23, 2007

What to look for when you hire a data warehouse engineer

  • Full lifecycle experience in Business analysis, design, development and operations of terabyte size data warehouses in star schema. Must have worked on atleast 3 seperate data warehouses.
  • Experience in ETL architecture design and development using industry standard ETL tools (any of ab initio, Informatica, Data Stage or SSIS)Experience in Relational & Dimensional Data Modeling.
  • Ability to reverse engineer and understand existing data models.Familiarity with Database, Software and hardware issues / architecture needed to design and develop a terabyte size data warehouse (On UNIX AND Windows Platforms, ORACLE and SQL SERVER)
  • Understanding of data accuracy and quality and the ability to backtrack and solve data issues
  • Nice to have:Experience in Cube design and development using SSAS, Hyperion or any other industry standard tools.

Sunday, March 18, 2007

Why do Most data warehouses Fail?

  1. Datawarehouse is treated like any other application in the company and same people put it through the same process that worked for other applications.
  2. Lack of understanding of the mess that the application sources to data warehouses are in and an optimistic expectation of data quality.
  3. Lack of understanding of the columes of data involved and the false expectations that the vendors set. Yes SSAS will work for this volume we were told. But in reality a vastly toned down version with 10 % of requirements will work on a 64 cpu box we found after months of work.
  4. .....

Why are programmers creative in creating data subtilities

Every time we touch a dataset we are told "in this case x means y else x means z only if x = 999 or else z means -z if.....". Or in plain english a pink cat is a dog unless its not alive (as indicated by its age) then its a toy and its age is -999.

I was told by one progammer the reason for such convolutions in data and he does have his point. "I write the smallest, easiest to maintain most efficient to run code..... I am not writing extra code to make sure the data is clean and meaningful" (I paraphrased).

Thursday, March 1, 2007

Basics of Dimensional Modeling

A data warehouse answers a particuler set of business questions. The questions will have variables like sales, profit, location. If you think deep you will recognise that there are two types of variables - those that represent or measure the business and those that are parameters that influence thew business. The former are facts and the latter are dimensions. Or to be more clear the former are attributes of Fact Entity (s) and the latter are Dimension Entities. - NOTE the difference in the defnition. This is crucial to arrive at a simple dimensional data warehouse data model. The data model is the heart of the data warehouse - NOT the ETL.