Views

InformationTechnology:Database:DataWarehouse

Contents

Navigation



Related categories



About this page

We apologize for the little information we provide, this page is still under construction. Please stay tuned.
Image:Construction_worker.gif

Data Warehousing overview

The term "data warehouse" has become one of the most used and abused terms in the IT vocabulary. A data warehouse can be any collection of summarised data from various sources, structured and optimised for query access using OLAP (On-Line Analytical Processing) query tools. A data warehouse is virtually any database containing data from more than one source, collected for the purpose of providing management information. The term "data warehousing" was coined by the mid-1980s and, in essence, it was referring to an architectural model for the flow of data from operational systems to decision support environments. Based on analogies with real warehouses, data warehouses were initially intended as large-scale collection/storage/staging areas for bulk-supplied data, in provenance from the various operational systems. Once stored in the data warehouse, data could be classified and distributed to "data marts" (equivalent to "retail stores") to be accessed by Decision Support Systems (DSS) and users. Along the way this architectural vision changed and the term data warehouse started to be used to refer to OLAP-enabled decision-support databases. The costs of data warehousing projects are usually very high, driven up primarily by the requirement to collect, "clean" and integrate data from different sources, often legacy systems. The cost of extracting, cleaning and integrating data can eat-up to 60-80% of the total cost of a data warehousing project or decision support project. On the tooling side, the costs associated with OLAP tools, data integration technology, data extraction tools, graphical user query tools, etc, represent a small proportion of a project's total cost.

A data warehouse is populated through a series of steps represented by the acronym ETL (Extract, Transform and Load)
  • EXtract data from the source environment.
  • Transform data to have desired warehouse characteristics like subject-orientation and time-variance.
  • Load the data into a target environment.
The ETL process typically takes the longest to develop, and this can easily take up to 50% of the data warehouse implementation cycle. The reason for this is that it takes time to get the source data, understand the necessary columns, understand the business rules, and understand the logical and physical data models.

Once the data is made available in the Data Warehouse, it has to be processed.It is important to distinguish the capabilities of a Data Warehouse from those of an OLAP (On-Line Analytical Processing) system. In contrast to a Data Warehouse, which is usually based on relational technology, OLAP uses a multidimensional view of aggregate data to provide quick access to strategic information for further analysis. A Data Warehouse Architecture (DWA) represents the structure of data, flows, processing and presentation that exists and is made up of a number of interconnected parts.
  • Operational Database / External Database Layer
  • Information Access Layer
  • Data Access Layer
  • Data Directory (Metadata) Layer
  • Process Management Layer
  • Application Messaging Layer
  • Data Warehouse Layer
  • Data Staging Layer


Two of the pioneers in the Data Warehouse field were Ralph Kimball [1] and Bill Inmon [2] and many of the Data Warehouse terms and concepts were coined defined by them.

Data Warehousing resources

Here are some general resources on Data Warehousing
  • Data Warehousing and Business Intelligence resources [3]
  • DM Review's Data Warehousing basics portal [4]
  • ITtoolBox Data Warehouse Knowledge Base [5] [6]
  • The Data Warehousing Information Center [7]
  • Data warehouse articles
    • The ten commandments of data warehousing [8]
    • An Introduction to Data Warehousing [9]
  • The Common Warehouse Metamodel [10]
  • Data Warehousing and OLAP Bibliography [11]
  • Data Warehousing Institute [12]
  • Data Warehousing Online [13]


OLAP products

Choosing an OLAP product among the choice of OLAP tools from different vendors is no easy task because the metadata formats tend to be proprietary. Recent efforts by the Object Modeling Group (OMG) [14] resulted in a Common Warehouse Metamodel specification. The Common Warehouse Metamodel (CWM) describes metadata interchange among data warehousing, business intelligence, knowledge management and portal technologies.

There are 2 types of OLAP
  • MOLAP (Multidimensional OLAP), where a cube is aggregated from the relational data source (data warehouse). When user generates a report request, the MOLAP tool can generate the create quickly because all data is already pre-aggregated within the cube.
  • ROLAP (Relational OLAP), where, instead of pre-aggregating everything into a cube, the ROLAP engine essentially acts as a smart SQL generator. The ROLAP tool typically comes with a 'Designer' piece, where the data warehouse administrator can specify the relationship between the relational tables, as well as how dimensions, attributes, and hierarchies map to the underlying database tables.


OLAP applications are generally characterized by the following common features
  • Multidimensional views of data, inherently representative of an actual business model
  • Calculation-intensive capabilities, often requiring involved algebraic equations, identified by the OLAP Council's APB-1 performance benchmark
  • Time intelligence, as illustrated by examples by OLAP Council's APB-1 performance benchmark


OLAP resources

Here are some resources on the subject
  • OLAP Council [15]
  • Various articles on Data Warehousing by Ralph Kimball [16]
  • The OLAP report [17]
  • Data Warehousing and OLAP links [18]
  • OLAP and Business Intelligence links [19]
  • OLAP resources directory [20]
  • OLAP and Data Mining resources [21]


OLAP products on the market

Here are some popular OLAP products
  • TANGRAM for Business Intelligence, an OLAP product for the desktop [22]
  • Business Objects XI [23]
  • Contour Cube BI [24]
  • Cognos Suite [25]
  • MicroStrategy Platform for Business Intelligence [26]
  • Microsoft SQL Server 2005 [27] [28]


Reporting tools on the market

Two of the most popular reporting toolsets are
  • Crystal Reports from Business Objects [29]
  • Actuate Reporting suite [30]