I am often asked usually by programmers - What is Data Warehousing & how do I learn it? I explain to them we use all the same tools that you do but differently. That’s when I coined the term Data Sense. It describes the essence of Data Warehousing and separates Data Warehousing from rest of Programming. Every aspect of IT from Hardware / Software infrastructure to Design, Development and QA is done with massive data flows and need for data precession accuracy and meaning.

Friday, May 16, 2008

Data Warehousing the Cloud.

Cloud Computing is the in thing today. Cloud computing is what its name says – to be able to build functional capability without overall structure – or with over all structure handled by someone else (black box computing may be apt in this case). This is different from outsourcing hardware to a data center. It is an approach to providing business functionality that aims to be faster and cheaper.

Properly structured enterprise application development will need use case analysis of business, design of a relational data model (the solution to legacy data cloud problem proposed by Codd), creation of object layer on top of relational model, and structured screens that lead user through the data entry / selection process. This structured approach ensures the data integrity, functional integrity and hence the business Integrity.

However it is also expensive – not just financially. To understand this look at what happened when structured GUI became Screen Cloud (Web). The innovation and usage skyrocketed while the cost dipped sharply when screen flowing into one another became screens with random links from page to many other pages. However the cloud concept was mostly limited to content and the applications were still structured. The Software and Hardware infrastructure was still structured.

The revolution in inception (if it comes to pass) is in the way IT hardware / software is provisioned and how that is stored to run applications and store and retrieve data. The Infrastructure cloud replaces the IT department with a black box that takes credit card and gives you incremental hardware and software resources to build your applications on. The infrastructure cloud also supports a data cloud i.e. another black box which will replace your Data Architect and DBA. All database maintenance work is automatically taken care of in cloud. The data model will be made non existent using hyper tables that support name value pairs instead of rows and columns. All the left is business logic that needs to be coded by use case in applications.

This is not a pipe dream. This is where we could end up in couple of year’s time. Or NOT. And this doesn’t have to be a failure. Most business can go back to being strict with the use cases they support in their business processes and put strict controls on the data integrity. I.e. if you call the business and if you spell your name wrong too bad you are not in our system and will not be supported. There will be no search functionality to rescue you. Since there is no structured data model in data cloud there is a possibility of different representation scheme ie variables and their relationships for different use cases even within a single application.

In summary – applications with a few use cases can be built and supported very easily and cheaply on the cloud.

Where does that leave data warehousing? We basically investigate, understand and merge data from different sources with different representation schemes (i.e. data models) into one specially designed representation scheme to support analytic efforts. We will likely continue to do that – this time integrating data from different use cases into a coherent analytic data model. That increases scope tremendously but its somewhat offset by using the cloud itself to build data warehouses assuming that the cloud has a special color (dark clouds?) to support different characteristics of Data Warehousing Loads. The ETL tools we use today have to change to support new requirements. Those changes will wait for market to firm itself.