I am often asked usually by programmers - What is Data Warehousing & how do I learn it? I explain to them we use all the same tools that you do but differently. That’s when I coined the term Data Sense. It describes the essence of Data Warehousing and separates Data Warehousing from rest of Programming. Every aspect of IT from Hardware / Software infrastructure to Design, Development and QA is done with massive data flows and need for data precession accuracy and meaning.

Sunday, February 18, 2007

ETL Template

Aparna Chamerthi & Vijay K Nadendla
ExtractData Flow:

Data Sources
Data Server
Stage
Data
Warehouse
File Sources
Data Mart
Extract
Load
Transform
Aggregate
Aggregate
Extract
Aggregate
Aggregate











· BusinessName_ETL 0
o Truncate Stage Fact table
· BusinessName_ETL 1
o Copy Dimensions & Lookups from Data Warehouse to stage
[Loop start for Real time data warehouses. Steps outside loop done once a day]
o Load input data from data sources and file sources to Stage one day (or period) at a time
· BusinessName_ETL 2
o Update the Dimension tables on stage with lookups / dimension data from data / file sources
o Find Dimensions keys and update the stage fact table (fat table). If a certain key is not present in the dimension table but a corresponding transaction present in fact table insert a skeleton record into dimension table.
o Calculate and Populate aggregate tables.
o Make and verify data audit reports
· BusinessName_ETL 3
o Copy back dimensions to Data Warehouse
o Copy back facts and aggregates to Data Warehouse
[Loop end for Real time data warehouses. Steps outside loop done once a day]
o Archive Stage
o Make and verify Data warehouse Audit Reports
· BusinessName_ETL 4
o Make aggregates on data warehouse
o Extract and aggregate to dependent Data marts
o Run further aggregates on dependent Data marts
o Populate Summary / Report tables on Data Warehouses and Data Marts
o Run the Data Extracts from the data warehouse.
· Data Warehouse Live for querying
· Back up all Data Warehouses and Data Marts (once a week)
· Clean up old data in Data Warehouses and Data Marts (periodically)
· Clean up Data Server (Periodically)