Data Enthusiast | ETL | Data Warehousing | Business Intelligence | Analyst | Software Developer
January 13, 2014
Data Warehouse Architecture
According to Weisensee et al., Data warehouse architecture follows following principles:
Extraction, Transformation and Loading (ETL):
ETL process is the foundation of BI. Success and failure of BI projects depends upon ETL process. It plays a vital role to integrate and enhance the worth of data. After the extraction, cleansing and arrangement of data, it will be loaded into data warehouse. In short, ETL is the transferring process of data from data source to the target data warehouse. Design of ETL process will determine its efficiency, flexibility and maintainability.
ETL process consists of five modules as shown in figure above (Jun et al., 2009). This process will occur between:
Data sources and the data warehouse and
Data warehouse and data marts (Weisensee et al.).
In an organisation, there will be huge amount of data sources located in different database management systems (DBMS) which may be from Marketing Campaign System, Sales Tracking System, Customer Support System and other systems (Ramsdale, 2001).
These data sources must be physically and logically designed to identify wide view of organisational data review. Physical design will help to review DBMS tables and data sets that will be used by business processes. The logical design will help to make relationship among different tables of physical design through linkages to maintain information hierarchies for presentation and validation purpose. These processes will help in business intelligence (Weisensee et al.).
A typical architecture of data warehouse was designed by Wang et al., 2005. Figure 6 shows data warehouse architecture with its components. It includes:
Tools which extract data from various operational databases and external sources; clean, transform and integrate the data; load data into the data warehouse; and regularly refresh the warehouse to reflect updates at the sources and to amend according to changes
The data warehouse and data marts OLAP servers which help to stores and manage data. Then transforms multidimensional view of data to a various tools like query, analysis, report and data mining tools
Repository for storing and managing organisational metadata.
Tools for monitoring and administering the whole system.
Jones and Johnson, 2010 has differentiated data mart and data warehouse. Data marts stores data associated to a subset of an organisation such as a branch or particular product. On the other hand, a data warehouse stores data associated to entire organisation. So, it can be said that data warehouse combines the data from data marts.
Online Analytical Processing (OLAP)
There is an analytical processing technology between data warehouse and tools called Online Analytical Processing (OLAP). It is based on multidimensional model of business data in the data warehouse to generate business information. So, OLAP is referred as Analytical Processing or Dimensional Analysis (Kirkgiize et al., 1997). Multidimensional analysis is the manipulation of information through various related dimensions to ease analysis and understanding of the original data (Wang et al., 2005).
Reinschmidt and Francoise, 2000 defines OLAP as “a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.” It is the multi-dimensional analysis of data that supports logical and navigational querying of data. It helps the user to produce organisational information through relative personalised presentation and analysis of historical and proposed data of different models.
Metadata is the information of the data which includes data types, mapping of fields, acceptable values, etc. of the tables. It also sketches the picture of how transformation of data will take place along with how the operation should be managed (Reinschmidt and Francoise, 2000).
A data mart is a subset of data warehouse. It is valuable for particular organisational unit, branch or user. It may contain historical, summarised and probably complete data stored from operational systems (Reinschmidt and Francoise, 2000).
These services are used by the business intelligence clients to interact with the available data mart or data warehouse data. It may be in Microsoft Office. (Weisensee et al.).
1) Weisensee, D., Matthews, E., McInnis, A. Implementing Data Warehousing and Business Intelligence at McMaster University Using the SASIntelligence Value Chain. McMaster
2) Jun, T., Kai, C., Yu, F., Gang, T. (2009). The Research & Application of ETL tool in Business Intelligence Project.International Forum on Information Technology and Applications. pp. 620-623.
7) Kirkgiize, R., Katie, N., Stolba, M. Tjoa, A. M. (1997). A Security Concept for OLAP.Proceedings of the 8th International Workshop on Database and Expert Systems Applications (DEXA '97),Institute of Software Technology (IFS). pp. 620-626.