Data Warehouse: organizational data is collected, integrated, understood, and structured for analysis. Data access is simple and results are repeatable.
Data Gym: organizational data is collected and exposed to users only through heavy-duty power lifting of the sort you find in a gym. Most of the logic is built into a programmatic layer at the data access level. SQL / MDX, instead of being used for data access, is used for complex logic that puts serious stress on the machines and leads to performance issues, which are then addressed through typical programmatic techniques.
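To make the contrast concrete, here is a minimal sketch (hypothetical tables and values, using SQLite only for illustration). The warehouse style does the work once at load time into a structured fact table; the gym style re-derives the same logic in every access query, pushing the computation onto the database at read time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES ('alice', 10.0), ('alice', 15.0), ('bob', 7.5);

-- Warehouse style: logic is applied once at load time into a fact table,
-- so access queries stay trivial and repeatable.
CREATE TABLE fact_customer_revenue AS
  SELECT customer, SUM(amount) AS revenue FROM orders GROUP BY customer;
""")

# Simple, repeatable access against the pre-integrated structure:
warehouse = cur.execute(
    "SELECT revenue FROM fact_customer_revenue WHERE customer = 'alice'"
).fetchone()[0]

# "Data gym" style: the same logic recomputed in every access query.
gym = cur.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = 'alice'"
).fetchone()[0]

print(warehouse, gym)  # same answer either way; what differs is where the work happens
```

Both queries return the same number; the difference is whether the stress lands on a one-time load process or on every read.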
I am often asked, usually by programmers: what is data warehousing, and how do I learn it? I explain that we use all the same tools they do, but differently. That's when I coined the term Data Sense. It describes the essence of data warehousing and separates it from the rest of programming. Every aspect of IT, from hardware / software infrastructure to design, development, and QA, is done with massive data flows and the need for data precision, accuracy, and meaning.
Friday, July 18, 2008
Friday, May 16, 2008
Data Warehousing the Cloud.
Cloud computing is the in thing today. It is what its name says: the ability to build functional capability without an overall structure, or with the overall structure handled by someone else (black box computing may be a more apt name). This is different from outsourcing hardware to a data center. It is an approach to providing business functionality that aims to be faster and cheaper.
Properly structured enterprise application development needs use case analysis of the business, design of a relational data model (the solution to the legacy data cloud problem proposed by Codd), creation of an object layer on top of the relational model, and structured screens that lead the user through the data entry / selection process. This structured approach ensures data integrity and functional integrity, and hence business integrity.
However, it is also expensive, and not just financially. To understand this, look at what happened when the structured GUI became the screen cloud (the Web). Innovation and usage skyrocketed while costs dipped sharply, as screens flowing into one another became pages with random links to many other pages. However, the cloud concept was mostly limited to content; the applications were still structured, and so was the software and hardware infrastructure.
The revolution in inception (if it comes to pass) is in the way IT hardware / software is provisioned to run applications and to store and retrieve data. The infrastructure cloud replaces the IT department with a black box that takes a credit card and gives you incremental hardware and software resources to build your applications on. The infrastructure cloud also supports a data cloud, i.e. another black box that replaces your data architect and DBA. All database maintenance work is automatically taken care of in the cloud. The data model is made nonexistent using hyper tables that support name-value pairs instead of rows and columns. All that is left is the business logic, which is coded by use case in the applications.
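A minimal sketch (invented entities and attributes) of what the hyper table idea means in practice: instead of a fixed row/column schema, every fact becomes an (entity, name, value) triple, and the "data model" migrates into application code.

```python
# Relational row: the schema itself dictates which columns exist.
customer_row = {"id": 1, "name": "Alice", "city": "Austin"}

# Name-value ("hyper table") representation: no schema, just triples.
hyper_table = [
    (1, "name", "Alice"),
    (1, "city", "Austin"),
    (2, "name", "Bob"),
    (2, "favorite_color", "green"),  # nothing stops ad-hoc attributes
]

def get(entity, name):
    """Reassemble a value from the name-value pairs; the data model
    now lives in this code rather than in the database."""
    for e, n, v in hyper_table:
        if e == entity and n == name:
            return v
    return None

print(get(1, "city"))  # Austin
```

The flexibility is obvious; so is the loss: nothing in the storage layer enforces that a customer has a name, a city, or any consistent shape at all.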
This is not a pipe dream. This is where we could end up in a couple of years' time. Or not. And this doesn't have to be a failure. Most businesses can go back to being strict about the use cases they support in their business processes and put strict controls on data integrity. That is: if you call the business and you spell your name wrong, too bad, you are not in our system and will not be supported. There will be no search functionality to rescue you. Since there is no structured data model in the data cloud, there is a possibility of a different representation scheme (i.e. variables and their relationships) for each use case, even within a single application.
In summary – applications with a few use cases can be built and supported very easily and cheaply on the cloud.
Where does that leave data warehousing? We basically investigate, understand, and merge data from different sources with different representation schemes (i.e. data models) into one specially designed representation scheme that supports analytic efforts. We will likely continue to do that, this time integrating data from different use cases into a coherent analytic data model. That increases the scope tremendously, but it is somewhat offset by using the cloud itself to build data warehouses, assuming the cloud comes in a special color (dark clouds?) that supports the different characteristics of data warehousing loads. The ETL tools we use today will have to change to support the new requirements; those changes will wait for the market to firm up.
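The core task described above can be sketched in a few lines (field names invented for illustration): two sources carry the same facts under different representation schemes, and the warehouse maps both into one analytic model.

```python
source_a = [{"cust": "Alice", "amt": 10.0}]          # scheme used by system A
source_b = [{"customer_name": "Bob", "total": 7.5}]  # scheme used by system B

def to_analytic(row, mapping):
    """Map one source row into the common analytic representation,
    given a target-field -> source-field mapping."""
    return {target: row[source] for target, source in mapping.items()}

analytic_model = (
    [to_analytic(r, {"customer": "cust", "revenue": "amt"}) for r in source_a]
    + [to_analytic(r, {"customer": "customer_name", "revenue": "total"})
       for r in source_b]
)

print(analytic_model)
```

Real ETL tools add lineage, error handling, and scale, but the essence is exactly this mapping step, repeated for every source and every use case.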
Wednesday, March 12, 2008
Background of Previous Posts talk
I was working on a data warehouse for a multi-product ASP. Faced with a multitude of data issues, I became an advocate for data quality at the source, slowly becoming responsible for the data source's data architecture too. This talk was written in that phase.
Tuesday, March 11, 2008
Monday, January 28, 2008
Friday, March 23, 2007
What to look for when you hire a data warehouse engineer
- Full lifecycle experience in business analysis, design, development, and operations of terabyte-size data warehouses in star schema. Must have worked on at least 3 separate data warehouses.
- Experience in ETL architecture design and development using industry standard ETL tools (any of Ab Initio, Informatica, DataStage, or SSIS).
- Experience in relational and dimensional data modeling.
- Ability to reverse engineer and understand existing data models.
- Familiarity with the database, software, and hardware issues / architecture needed to design and develop a terabyte-size data warehouse (on UNIX and Windows platforms, Oracle and SQL Server).
- Understanding of data accuracy and quality and the ability to backtrack and solve data issues
- Nice to have: experience in cube design and development using SSAS, Hyperion, or other industry standard tools.
Sunday, March 18, 2007
Why do Most data warehouses Fail?
- The data warehouse is treated like any other application in the company, and the same people put it through the same process that worked for other applications.
- Lack of understanding of the mess that the source applications feeding the data warehouse are in, and an optimistic expectation of data quality.
- Lack of understanding of the volumes of data involved, and the false expectations that vendors set. "Yes, SSAS will work for this volume," we were told. In reality, we found after months of work that only a vastly toned-down version with 10% of the requirements would work, and on a 64-CPU box.
- .....