Aparna Chamerthi & Vijay K Nadendla
The code performing any changes to the source data is tagged with the error type that it fixes. This is part of ETL meta data.
· Incomplete errors
o Missing records
o Missing fields
o Records or fields that, by design, are not being recorded
· Incorrect errors
o Wrong (but sometimes right) codes
o Wrong calculations, aggregations
o Duplicate records
o Wrong information entered into source system
o Incorrect pairing of codes
· Incomprehensibility errors
o Multiple fields within one field
o Weird formatting to conserve disk space
o Unknown codes
o Spreadsheets and word processing files
o Many-to-many relationships and hierarchical files that allow multiple parents
· Inconsistency errors
o Inconsistent use of different codes
o Inconsistent meaning of a code
o Overlapping codes
o Different codes with the same meaning
o Inconsistent names and addresses.
o Inconsistent business rules
o Inconsistent aggregating
o Inconsistent grain of the most atomic information
o Inconsistent timing
o Inconsistent use of an attribute
o Inconsistent date cut-offs
o Inconsistent use of nulls, spaces, empty values, etc.
o Lack of referential integrity
o Out of synch fact data
I am often asked usually by programmers - What is Data Warehousing & how do I learn it? I explain to them we use all the same tools that you do but differently. That’s when I coined the term Data Sense. It describes the essence of Data Warehousing and separates Data Warehousing from rest of Programming. Every aspect of IT from Hardware / Software infrastructure to Design, Development and QA is done with massive data flows and need for data precession accuracy and meaning.