January 20, 2014

Modelling a Data Warehouse

When designing a model for a data warehouse we should follow standard pattern, such as gathering requirements, building credentials and collecting a considerable quantity of information about the data or metadata. This helps to figure out the formation and scope of the data warehouse. This model of data warehouse is known as conceptual model. General elements for the model are fact and dimension tables. These tables will be related to each other which will help to identity relationships between them. This design is called a schema and is of two types: star schema and snowflake schema. The designing of these schema falls under physical model design (Jones and Johnson, 2010).
Before designing the physical model, logical model should be designed this is based on the conceptual model. Logical model mainly focuses on granularities arrangement, data refinement and the definition of logical relation pattern (Fu-shan, 2009).
Granularity refers to “the level of detail or summarisation of the units of data in the data warehouse”. The low level of granularity contains high level of detail and the high level of granularity contains low level of detail. This is one the major issue of data warehouse design as it affects greatly to the data and its query (Inmon, 2005). A diverse category of analytical processing uses various levels of granularity. The level of granularity affects database performances. Data warehouse consists of several combinations and details of data commonly referred as granularity. If the DW has many levels of explorable data layers, it is supposed to be more granular. Generally conventional  database operations are categorized as low granular, Whereas modern data warehousing operations are required to be more granular because of the needs of exploring data in several intensity, Thus in a DW environment, granularity  directly represents the richness of data quality and consequently establish the intensity of database queries (Fu-shan, 2009).
Star Schema:
It consists of single fact table at the centre linked with a number of dimension tables. OLAP focuses in the fact table and data related to facts are stored in dimension table. The dimension tables will not be in normalised form (Wang et al., 2005). The fact table contains the primary key of all dimension tables. The advantage of implementing star schema is that to get information we need simple join queries. But the disadvantage is that for complex systems, it becomes somewhat complicated to read and query a massive amount of data (Jones and Johnson, 2010).

Snowflake Schema:

It’s a modification of star schema. The dimensional hierarchy is presented clearly by normalisation of the dimension tables which will be used for drill-down and roll-up operations. Its advantage is that maintaining the dimension tables will be easy (Wang et al., 2005).
The implementation of these schemas depends on specific system requirement. Difference is in performance and usability.
References:
1) Jones, J. and Johnson, E. (2010). Data Modelling for Business Intelligence with Microsoft SQL Server: Modeling a Data Warehouse. CA Erwin, pp. 1-9. [Online] Available from: http://www.ca.com/files/whitepapers/ca-erwin-db-modeling-bus-intel-....
2) Jones, J. and Johnson, E. (2010). Beyond the data model: designing the data warehouse. CA Erwin, pp. 1-9. [Online] Available from: http://www.ca.com/files/whitepapers/part3_beyond_dmd_data_warehouse...
3) Fu-shan, W. (2009). Application Research of Data Warehouse and its Model design. The 1st International Conference on Information Science and Engineering. pp. 798-801
4) Inmon, W. H. (2005). Building the data warehouse. Fourth Edition. Wiley Publishing, Inc., Indiana: Indianapolis.
5) Wang, J., Chen, T., Chiu, S. (2005). Literature Review on Data Warehouse Development. IACIS Pacific 2005 Conference Program, National Chengchi University, Taiwan. pp. 987-994.

No comments: