Data Enthusiast | ETL | Data Warehousing | Business Intelligence | Analyst | Software Developer
January 20, 2014
Modelling a Data Warehouse
When designing a model for a data warehouse we should follow standard pattern, such as gathering requirements, building credentials and collecting a considerable quantity of information about the data or metadata. This helps to figure out the formation and scope of the data warehouse. This model of data warehouse is known as conceptual model. General elements for the model are fact and dimension tables. These tables will be related to each other which will help to identity relationships between them. This design is called a schema and is of two types: star schema and snowflake schema. The designing of these schema falls under physical model design (Jones and Johnson, 2010).
Before designing the physical model, logical model should be designed this is based on the conceptual model. Logical model mainly focuses on granularities arrangement, data refinement and the definition of logical relation pattern (Fu-shan, 2009).
Granularity refers to “the level of detail or summarisation of the units of data in the data warehouse”. The low level of granularity contains high level of detail and the high level of granularity contains low level of detail. This is one the major issue of data warehouse design as it affects greatly to the data and its query (Inmon, 2005). A diverse category of analytical processing uses various levels of granularity. The level of granularity affects database performances. Data warehouse consists of several combinations and details of data commonly referred as granularity. If the DW has many levels of explorable data layers, it is supposed to be more granular. Generally conventional database operations are categorized as low granular, Whereas modern data warehousing operations are required to be more granular because of the needs of exploring data in several intensity, Thus in a DW environment, granularity directly represents the richness of data quality and consequently establish the intensity of database queries (Fu-shan, 2009).
It consists of single fact table at the centre linked with a number of dimension tables. OLAP focuses in the fact table and data related to facts are stored in dimension table. The dimension tables will not be in normalised form (Wang et al., 2005). The fact table contains the primary key of all dimension tables. The advantage of implementing star schema is that to get information we need simple join queries. But the disadvantage is that for complex systems, it becomes somewhat complicated to read and query a massive amount of data (Jones and Johnson, 2010).
It’s a modification of star schema. The dimensional hierarchy is presented clearly by normalisation of the dimension tables which will be used for drill-down and roll-up operations. Its advantage is that maintaining the dimension tables will be easy (Wang et al., 2005).
The implementation of these schemas depends on specific system requirement. Difference is in performance and usability.