I am often asked usually by programmers - What is Data Warehousing & how do I learn it? I explain to them we use all the same tools that you do but differently. That’s when I coined the term Data Sense. It describes the essence of Data Warehousing and separates Data Warehousing from rest of Programming. Every aspect of IT from Hardware / Software infrastructure to Design, Development and QA is done with massive data flows and need for data precession accuracy and meaning.

Sunday, February 18, 2007

ENTERPRISE DATA Architecture Standards

Examples of Data Architecture standards to aid in standards identification..These are not proposals but rather a list of standards in use in other Organizations.

Data Architecture
Principle: 1 Design the enterprise Data Architecture so it increases and facilitates the sharing of data across the enterprise.
q Sharing of data greatly reduces data entry and maintenance efforts.
q Data sharing requires an established infrastructure for widespread data access. This includes integration with the Application, Componentware, Integration, Messaging, Network, and Platform Architectures.
q Consistent shared data definitions ensure data accuracy, integrity, and consistency.
q Data sharing reduces the overall resources required to maintain data across the enterprise.

Data Architecture
Principle: 2 Create and maintain roles and responsibilities within the distributed enterprise Data Architecture to facilitate the management of data. This requires a working relationship between the business user organizations and information services(IS).
Business responsibilities are to:
q Provide accurate business definitions of data.
q Develop enterprise-wide business views of shared data.
q Provide business drivers to support centralized data administration.
q Make metadata available.
q Define security requirements for data.
IS Responsibilities are to provide a robust technical infrastructure that includes:
q Open, accessible, and adaptable database management systems (DBMSs).
q Centralized data administration.
q Data replication facilities.
q Backup and recovery.
q Security.
q Database monitoring tools.
q Data quality monitoring tools.
q Application mechanisms for helping to ensure accurate data input.

Metadata
Principle: 3 When designing or modifying a database, review the Metadata Repository for existing standard and proposed data elements before implementing a new database to ensure data elements are defined according to Metadata Repository standards.
Design reviews are essential to ensure that shared firmwide data is defined consistently across all applications. Design reviews also determine whether data that already exists is consistently defined and not redundantly stored. Design reviews should document the following:
q Where is this application getting its data?
q What other applications are getting data from this application?
q Is data used by this application defined consistently with firmwide definitions? If not, is there a plan to define the data according to enterprise definitions?
q A design review evaluates the data requirements of a project and identifies the following:
q A data requirement that can be solved by using existing metadata element.
q Data not already identified as metadata must be proposed as an inter-agency or firmwide standard to the Metadata Element Review Team to become metadata.
q Access is available for application development projects to reference the metadata repository in order to actively research data requirements. Review the existing standard and proposed data
elements in the metadata repository before implementing a new database to ensure data elements are defined according to standards.
q Key information about data is stored in the systems that are already implemented in the firm. If possible, evaluate existing systems to propose firmwide data elements.

Data Modeling
Principle: 4 Take the Entity-Relation (ER) model to the third normal form, then denormalize where necessary for performance.
q The third normal form is the most commonly recommended form for the ER model.
q In some cases, a denormalized database can perform faster as there can be fewer joins, or reduced access to multiple tables. This process saves both physical and logical input and output requirements.

Data Modeling
Principle: 5 Restrict free form data entry where possible.
q In the design phase, consider the values that may be input into a field. These values or domains should be normalized so that data is consistent across records or instances. For example, using consistent values for gender or address information.
q Use look-up tables and automate data entry for column or attribute domain values to restrict what is entered in a column.

Data Access Implementation
Principle: 6 Validate data at every practical level to ensure data quality and avoid unnecessary network traffic.
q Validation can be coded into multiple tiers of the n-tier architecture to ensure that only valid data is processed and sent across the network. For example, an invalid field entered in a data entry form can be corrected before data is written to the database.
q Data integrity verification rules should be used when possible.

Data Access Implementation
Principle: 7 Design the data access infrastructure to support the transparency of the location and access of data by each application.
q This means designing an N-tier architecture where all data access is managed through a middle tier. This design makes databases easy to relocate, restructure, or re-platform the back end services with minimal disruption to the applications that use them. It is essential for adaptive systems..
q A client should not send SQL requests directly to a server. Instead of using SQL code, the client should communicate with the database through data access rules. The application receives a request from a client and sends a message to the data access rule. The data access rule sends an SQL call to the database. With this method, the client does not send SQL to the server, it sends a request for work.

Data Access Implementation
Principle: 8 For data quality management, implement tools, methods, processes and policies to provide high-level data accuracy and consistency across distributed platforms.
q Both business users and Information Technology (IT) staff are responsible for data accuracy and consistency. Policies and procedures must be established to ensure the accuracy of data.
q IT staff is responsible for and must provide security mechanisms to safeguard all data under IT control. The business users must determine functional security requirements, while the physical security must be provided by IT.
q Applied systems management provides safeguards against data loss and corruption and provides the means of recovering data after system failures. This implies that effective backup and recovery systems are imperative and that data can be recovered in a timely basis regardless of the cause of loss.
q For critical functions, plan for survivability under both normal operations and degraded operations.

Data Security
Principle: 9 Record information about users and their connections as they update and delete data. Auditing can determine who updated a record and their connection data.
The information that can be captured by the application includes:
q The user account the user logged in with.
q The TCP/IP address the connected user's workstation.
q The certificate information (if using certificates) about that user.
q The old values that were stored in the record(s) before the modification.
q The new values that were input to the record(s).

Data Security
Principle: 10 Protect database servers from hardware failures and physical OS
attacks.
q Database servers must be located in a climate-controlled, restricted-access facility, and preferably a fully staffed data center. Uninterruptible power supplies (UPSs), redundant disks, fans, and power supplies must be used.

Data Warehouse
Principle: 11 Perform benchmarks on the database design before constructing the database.
q Expect to make changes and adjustments throughout development.
q Changes during the early cycles up to, and including implementation, are a primary mechanism of performance tuning.

Data Hygiene Tools
Principle: 12 Ensure data entry quality is built into new and existing application systems to reduce the risk of inaccurate or misleading data in OLTP systems and to reduce the need for data hygiene.
q Provide well-designed data-entry services that are easy to use (e.g., a GUI front end with selection lists for standard data elements like text descriptions, product numbers, etc.).
q The services should also restrict the values of common elements to conform to data hygiene rules.
q The system should be designed to reject invalid data elements and to assist the end user in correcting the entry.
q All updates to an authoritative source OLTP database should occur using the business rules that own the data, not by direct access to the database.
q Attention to detail should be recognized and rewarded.