I am often asked usually by programmers - What is Data Warehousing & how do I learn it? I explain to them we use all the same tools that you do but differently. That’s when I coined the term Data Sense. It describes the essence of Data Warehousing and separates Data Warehousing from rest of Programming. Every aspect of IT from Hardware / Software infrastructure to Design, Development and QA is done with massive data flows and need for data precession accuracy and meaning.

Sunday, February 18, 2007

Data Warehouse Error Taxonomy

Aparna Chamerthi & Vijay K Nadendla
The code performing any changes to the source data is tagged with the error type that it fixes. This is part of ETL meta data.
· Incomplete errors
o Missing records
o Missing fields
o Records or fields that, by design, are not being recorded
· Incorrect errors
o Wrong (but sometimes right) codes
o Wrong calculations, aggregations
o Duplicate records
o Wrong information entered into source system
o Incorrect pairing of codes
· Incomprehensibility errors
o Multiple fields within one field
o Weird formatting to conserve disk space
o Unknown codes
o Spreadsheets and word processing files
o Many-to-many relationships and hierarchical files that allow multiple parents
· Inconsistency errors
o Inconsistent use of different codes
o Inconsistent meaning of a code
o Overlapping codes
o Different codes with the same meaning
o Inconsistent names and addresses.
o Inconsistent business rules
o Inconsistent aggregating
o Inconsistent grain of the most atomic information
o Inconsistent timing
o Inconsistent use of an attribute
o Inconsistent date cut-offs
o Inconsistent use of nulls, spaces, empty values, etc.
o Lack of referential integrity
o Out of synch fact data