January 20, 2014

Modelling a Data Warehouse

When designing a model for a data warehouse we should follow standard pattern, such as gathering requirements, building credentials and collecting a considerable quantity of information about the data or metadata. This helps to figure out the formation and scope of the data warehouse. This model of data warehouse is known as conceptual model. General elements for the model are fact and dimension tables. These tables will be related to each other which will help to identity relationships between them. This design is called a schema and is of two types: star schema and snowflake schema. The designing of these schema falls under physical model design (Jones and Johnson, 2010).
Before designing the physical model, logical model should be designed this is based on the conceptual model. Logical model mainly focuses on granularities arrangement, data refinement and the definition of logical relation pattern (Fu-shan, 2009).
Granularity refers to “the level of detail or summarisation of the units of data in the data warehouse”. The low level of granularity contains high level of detail and the high level of granularity contains low level of detail. This is one the major issue of data warehouse design as it affects greatly to the data and its query (Inmon, 2005). A diverse category of analytical processing uses various levels of granularity. The level of granularity affects database performances. Data warehouse consists of several combinations and details of data commonly referred as granularity. If the DW has many levels of explorable data layers, it is supposed to be more granular. Generally conventional  database operations are categorized as low granular, Whereas modern data warehousing operations are required to be more granular because of the needs of exploring data in several intensity, Thus in a DW environment, granularity  directly represents the richness of data quality and consequently establish the intensity of database queries (Fu-shan, 2009).
Star Schema:
It consists of single fact table at the centre linked with a number of dimension tables. OLAP focuses in the fact table and data related to facts are stored in dimension table. The dimension tables will not be in normalised form (Wang et al., 2005). The fact table contains the primary key of all dimension tables. The advantage of implementing star schema is that to get information we need simple join queries. But the disadvantage is that for complex systems, it becomes somewhat complicated to read and query a massive amount of data (Jones and Johnson, 2010).

Snowflake Schema:

It’s a modification of star schema. The dimensional hierarchy is presented clearly by normalisation of the dimension tables which will be used for drill-down and roll-up operations. Its advantage is that maintaining the dimension tables will be easy (Wang et al., 2005).
The implementation of these schemas depends on specific system requirement. Difference is in performance and usability.
References:
1) Jones, J. and Johnson, E. (2010). Data Modelling for Business Intelligence with Microsoft SQL Server: Modeling a Data Warehouse. CA Erwin, pp. 1-9. [Online] Available from: http://www.ca.com/files/whitepapers/ca-erwin-db-modeling-bus-intel-....
2) Jones, J. and Johnson, E. (2010). Beyond the data model: designing the data warehouse. CA Erwin, pp. 1-9. [Online] Available from: http://www.ca.com/files/whitepapers/part3_beyond_dmd_data_warehouse...
3) Fu-shan, W. (2009). Application Research of Data Warehouse and its Model design. The 1st International Conference on Information Science and Engineering. pp. 798-801
4) Inmon, W. H. (2005). Building the data warehouse. Fourth Edition. Wiley Publishing, Inc., Indiana: Indianapolis.
5) Wang, J., Chen, T., Chiu, S. (2005). Literature Review on Data Warehouse Development. IACIS Pacific 2005 Conference Program, National Chengchi University, Taiwan. pp. 987-994.

January 16, 2014

Dimensional Modelling

There isn’t any specific standard to model data warehouse. It can be built either using the “dimensional” model or the “normalised” model methodologies. Normalised model normalises the data into third normal form (3NF) whereas dimensional model collects the transactional data in the form of facts and dimensions. Normalised model is easy to use as we can add related topics without affecting the existing data. But one must have good knowledge of how data is associated before performing specific query, so it might be difficult to handle. Reporting queries may not execute as well because massive numbers of tables may be involved in each query. Dimensional model is very efficient to use for non experts and performs pretty well as data is classified in a logical way and similar types of data are stored together. But while adding new topics whole warehouse must be reprocessed (Jones and Johnson, 2010).
Dimensional model is designed to optimise decision support query function in relational databases, where as normalised model is designed to eliminate redundancy problem of the data model, retrieving data which contains identifiers and therefore optimise online transaction processing (OLTP) performance (Firestone, 1998). Therefore it can be said dimensional model is the best modelling method in data warehousing. 
Data Cube
OLAP stores data in arrays which are the logical presentation of business dimensions. Multidimensional array represents intelligence of data elements relationships because analyst’s views depend on it. This multidimensional array data model is called “Data Cube” (Kirkgiize et al., 1997). It consists of facts and dimensions instead of rows and columns as in relational data model. Facts are the accurate or numeric data of business activity and dimensions are the sets of attributes that put facts into context (Wang, Chen, Chiu, 2005). Dimensions are interconnected in hierarchies, for example, city, state, region, country and continent. Figure 1 shows three dimensional views data cube of sales data (Kirkgiize et al., 1997).
In data cube each cell contains one or several values called measures or metrics. It helps to analyse aggregated facts and the level of detail is directly proportional to number of dimensions in the cube. Here axes represents dimension and the space represents facts (Wang et al., 2005). Above shown figure of data cube contains three dimensions namely Time, Geography and Product. Each cell consists of (T, P, G) and business measure is the total sales. Here the amount of product P along with total sales sold in geography G in time period T is stored. Figure 2 shows hierarchy of dimensions. 

Features of data cube:

Slice and dice

The user can generate the exact view of the data required with the help of data cube. This process is called slice and dice because the data is evaluated according to a subset alignment. The subset can also be rotated and figured as a parent one. For example, to find out the sales in a particular place in the specific time, slice and dice operation can be operated in the cube. This will help to choose arrangement along each dimension, it may be time, geography or product according to user needs (Kirkgiize et al., 1997).

Drill-down and roll-up

Data cube may contain hierarchical dimensions. This hierarchical dimension provides data manipulation and detailed analysis for various levels of the dimensions. Moving up and down in a hierarchy is known as roll up and drill-down respectively. After choosing the level of dimension, it can be sliced and diced. According to user need, they can drill-down and roll-up to view the data either in region wise, in product wise or in city wise level (Kirkgiize et al., 1997).
References:
1) Jones, J. and Johnson, E. (2010). Data Modelling for Business Intelligence with Microsoft SQL Server: Modeling a Data Warehouse. CA Erwin, pp. 1-9. [Online] Available from: http://www.ca.com/files/whitepapers/ca-erwin-db-modeling-bus-intel-....
2) Jones, J. and Johnson, E. (2010). Beyond the data model: designing the data warehouse. CA Erwin, pp. 1-9. [Online] Available from:http://www.ca.com/files/whitepapers/part3_beyond_dmd_data_warehouse...
3) Firestone, J. M. (1998). Dimensional Modeling and E-R Modeling In The Data Warehouse. Dimensional Modeling and E-R Modeling In The Data Warehouse, White Paper No. Eight. pp. 1-9.
4) Kirkgiize, R., Katie, N., Stolba, M. Tjoa, A. M. (1997). A Security Concept for OLAP. Proceedings of the 8th International Workshop on Database and Expert Systems Applications (DEXA '97), Institute of Software Technology (IFS). pp. 620-626.
5) Wang, J., Chen, T., Chiu, S. (2005). Literature Review on Data Warehouse Development. IACIS Pacific 2005 Conference Program, National Chengchi University, Taiwan. pp. 987-994.

January 13, 2014

Data Warehouse Architecture

According to Weisensee et al., Data warehouse architecture follows following principles:
  • Data Sources
  • Data Warehouses
  • Data Marts
  • Publication Services

Extraction, Transformation and Loading (ETL):

ETL process is the foundation of BI. Success and failure of BI projects depends upon ETL process. It plays a vital role to integrate and enhance the worth of data. After the extraction, cleansing and arrangement of data, it will be loaded into data warehouse. In short, ETL is the transferring process of data from data source to the target data warehouse. Design of ETL process will determine its efficiency, flexibility and maintainability.
ETL process consists of five modules as shown in figure above (Jun et al., 2009). This process will occur between:
Data sources and the data warehouse and
Data warehouse and data marts (Weisensee et al.).

Data Sources:

In an organisation, there will be huge amount of data sources located in different database management systems (DBMS) which may be from Marketing Campaign System, Sales Tracking System, Customer Support System and other systems (Ramsdale, 2001).
These data sources must be physically and logically designed to identify wide view of organisational data review. Physical design will help to review DBMS tables and data sets that will be used by business processes. The logical design will help to make relationship among different tables of physical design through linkages to maintain information hierarchies for presentation and validation purpose. These processes will help in business intelligence (Weisensee et al.).

Data Warehouse:

A typical architecture of data warehouse was designed by Wang et al., 2005. Figure 6 shows data warehouse architecture with its components. It includes:
  • Tools which extract data from various operational databases and external sources; clean, transform and integrate the data; load data into the data warehouse; and regularly refresh the warehouse to reflect updates at the sources and to amend according to changes
  • The data warehouse and data marts
    OLAP servers which help to stores and manage data. Then transforms multidimensional view of data to a various tools like query, analysis, report and data mining tools
  • Repository for storing and managing organisational metadata.
  • Tools for monitoring and administering the whole system.
Jones and Johnson, 2010 has differentiated data mart and data warehouse. Data marts stores data associated to a subset of an organisation such as a branch or particular product. On the other hand, a data warehouse stores data associated to entire organisation. So, it can be said that data warehouse combines the data from data marts.

Online Analytical Processing (OLAP)

There is an analytical processing technology between data warehouse and tools called Online Analytical Processing (OLAP). It is based on multidimensional model of business data in the data warehouse to generate business information. So, OLAP is referred as Analytical Processing or Dimensional Analysis (Kirkgiize et al., 1997). Multidimensional analysis is the manipulation of information through various related dimensions to ease analysis and understanding of the original data (Wang et al., 2005).
Reinschmidt and Francoise, 2000 defines OLAP as  “a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.” It is the multi-dimensional analysis of data that supports logical and navigational querying of data. It helps the user to produce organisational information through relative personalised presentation and analysis of historical and proposed data of different models.

Metadata

Metadata is the information of the data which includes data types, mapping of fields, acceptable values, etc. of the tables. It also sketches the picture of how transformation of data will take place along with how the operation should be managed (Reinschmidt and Francoise, 2000).

Data marts:

A data mart is a subset of data warehouse. It is valuable for particular organisational unit, branch or user. It may contain historical, summarised and probably complete data stored from operational systems (Reinschmidt and Francoise, 2000).  

Publication Services:

These services are used by the business intelligence clients to interact with the available data mart or data warehouse data. It may be in Microsoft Office. (Weisensee et al.).

References:
1) Weisensee, D., Matthews, E., McInnis, A. Implementing Data Warehousing and Business Intelligence at McMaster University Using the SAS Intelligence Value Chain. McMaster 
2) Jun, T., Kai, C., Yu, F., Gang, T. (2009). The Research & Application of ETL tool in Business Intelligence Project.International Forum on Information Technology and Applications. pp. 620-623.
3) Ramsdale, C. (2001). The virtual Data Warehouse:a phase one Busiiness Intelliigence solutiion. Computer Corporation of America. [Online] Available from: http://www.cca-int.com/pdf/Analytics_white.pdf.
4) Wang, J., Chen, T., Chiu, S. (2005). Literature Review on Data Warehouse Development. IACIS Pacific 2005 Conference Program, National Chengchi University, Taiwan. pp. 987-994.
5) Jones, J. and Johnson, E. (2010). Data Modelling for Business Intelligence with Microsoft SQL Server: Modeling a Data Warehouse. CA Erwin, pp. 1-9. [Online] Available from: http://www.ca.com/files/whitepapers/ca-erwin-db-modeling-bus-intel-....
6) Jones, J. and Johnson, E. (2010). Beyond the data model: designing the data warehouse. CA Erwin, pp. 1-9. [Online] Available from: http://www.ca.com/files/whitepapers/part3_beyond_dmd_data_warehouse...
7) Kirkgiize, R., Katie, N., Stolba, M. Tjoa, A. M. (1997). A Security Concept for OLAP. Proceedings of the 8th International Workshop on Database and Expert Systems Applications (DEXA '97), Institute of Software Technology (IFS). pp. 620-626.
8) Reinschmidt J. and Francoise A. (2000). Business Intelligence Certification Guide. International Business Machines Corporation. Available from: http://www.redbooks.ibm.com/redbooks/pdfs/sg245747.pdf. (Accessed: July 9, 2010)

January 10, 2014

Success factor of the data warehouse and business intelligence implementation

Data warehousing project is a complicated task that demands goals and resources from both business and technical departments. It is expensive but normally a basic project. If it is done by non experts and non skilled support, it can be an expensive and may cause project failure. Several business analysts believed most of the data warehousing projects are unsuccessful to meet their proposed objectives (Furlow, 2001).
Hwang and Xu (2007) have chosen eleven success factors:
Clearly defined business needs
Top Management Support
User involvement
High quality data source
Proper development technology
Enough IS staff and consultants
Project management
Practical Implementation Schedule
Proper planning/Scoping of project
Enough financial support
Measurable business benefits
Articlesbase (2010) has shown several factors that affect the implementation process of a business intelligence system. They are:
Business process and requirements.
ROI (Return on Investment).
Project management and resource assurance.
Company executives support
Take time to plan up front
Enough training to staff and change management.
References:
1) Furlow, G. (2001).The Case for Building a Data Warehouse. Data Warehousing.  IT Pro. pp. 31-34.
2) Hwang M. I. and Xu H. (2007). The effect of implementation factors on data warehousing success: An Exploratory Study.Journal of Information, InformationTechnology, and Organisations, 2. pp. 1-14.
3) Articlesbase.(2010). Success factors of business intelligence implementation. Available at:http://www.articlesbase.com/training-articles/success-factors-of-bu.... (Accessed: 6 August, 2010)

January 7, 2014

Business Intelligence Architecture

According to the Asghar et al. (2009), Business Intelligence (BI) is divided into two main parts: (a) BI dimension and (b) BI process. Knowledge, functionality, technology, business and organisation are categorised under BI dimension. The performance of data sources, data warehousing, ETL, OLAPS and other related tools are categorised under BI process. Basically, dimensions and processes are interrelated to form a complete life cycle of BI system development.

Knowledge

Knowledge is a prime thing for business process. It is also a requisite for creating an innovative idea about new products and technologies, volume of scales, making new clients and keeping relations with clients. The main sources of knowledge in an organisation are clients, suppliers, information systems, news, internal and external data, website, commercial information, inner credentials, media press, workers and stakeholders.  On the other hand, knowledge has been categorised as a practical, expressive, semantic, periodic, unambiguous and implicit. So it can be said that knowledge plays a vital role in any organisation to fulfil the demand of BI (Olszak and Ziemba, 2003; Simmers, 2004).
Functionality
Olszak and Ziemba (2003) signify that the BI benchmark must be taken in consideration to analyse all the benefits that will be generated in an organisation. They are:
Strategic planning for development of an organisation, relationship between enterprise strategy, mission, goals and tasks, identifying problems to be undertaken and providing information for different activities involved in an organisation.
Customer relationship improvement
  • Analysis of  products and services
  • Analysis of efficient and effective procedures for operation
  • Analysis of transaction flow (Cash inflow and outflow)

Technology

According to Olszak and Ziemba (2003), business intelligence includes the knowledge creation, data source and information technology. To achieve this, today there are different platforms which integrate data sources, advanced databases and analysis server with user friendly front-end application. This has made easy for all level of the business staff to analyse the business performance. Some of the business solutions are Oracle databases, IBM DB2, Microsoft SQL Server, Microsoft SQL Server Business Intelligence Studio, Oracle Analytical work manager, Oracle BIEE ( Business Intelligence Enterprise Edition), NCR teradata warehouse, Hyperion, SAS, Cognos, Business Objects, Open source project Pentaho and MS Miner. To choose proper business solution is a difficult task. It can be chosen on the basis of function, architecture design, compatibility and business operation and requirement (Asghar et al., 2009).

Business

Simmers (2004) identifies that the increasing practices of business is directly proportional to high demand of information. These demands are fulfilled by BI through various internal and external surroundings.

Organisation

It is very important to know about the organisation and its goals before going through the BI solution. The present and future demands for knowledge in an enterprise are based on these goals (Olszak and Ziemba, 2003).
References:
1) Asghar, S., Fong, S., Hussain, T. (2009). Business Intelligence Modeling: A Case Study of Disaster Management Organization in Pakistan. Fourth International Conference on Computer Sciences and Convergence Information Technology. pp. 673-679.
2) Olszak, C. M. and Ziemba, E. (2003).Business Intelligence as a Key to Management of an Enterprise. Informing Science.pp. 855-863.
3) Simmers, C. A. (2004). A Stakeholder Model of Business Intelligence. Proceedings of the 37th Hawaii International Conference on System Sciences. pp. 1-9.

January 4, 2014

Benefits of Data warehouse and Business Intelligence

The prime benefit of data warehousing is simplicity. The presentation of data in data warehousing is a single image. This single image is made by collecting data from different department of the organisation. Due to this, time for production and operation of data reduces and thus simplifies the decision making as well. This reduction of time to access data also leads to increase in production and effectiveness. Data warehouse will also help to enhance the function of operational systems. It means that there won’t be any hamper in the regular process of the system even if user will operate any other operation rather than normal one. As data warehouse has technical infrastructure, it is flexible and scalable in nature and supports for both technical and business changeable environment. It gathers raw data as well as manages the flow of information which helps the user to make decisions because of its appropriate information (Furlow, 2001).
Business intelligence is responsible for transcribing raw data into information and knowledge. This will create an effective decision making and strategic process. For any business, value of business intelligence is primarily discussed as an information process which provides some basis for making some basic changes in a specific venture such as introducing new collaboration, making new marketplace, acquiring new clients, offering new products in the market (Olszak and Ziemba, 2007).
References:
1) Furlow, G. (2001).The Case for Building a Data Warehouse. Data Warehousing.  IT Pro. pp. 31-34.
2) Olszak, C.M. and Ziemba, E. (2007). Approach to Building and Implementing Business. Interdisciplinary Journal of Information, Knowledge, and Management, 2, pp. 135-148.

January 2, 2014

Business Intelligence Value Chain

BI is a group of concepts, methods and procedures for making better business decisions using information from several sources and applying knowledge and presumptions to build up a precise understanding of business. It gathers, manages and analyse the raw data for making good understanding/understandable information for managers and analysts. BI integrates the core information with related ones to perceive major procedures and clarify issues. It can observe business activity and performance, to progress as well as adjust swiftly as situations change, to make healthy decision on indecisive judgements and conflicting information and to identify business opportunities. For this BI expects valuable information which can be developed from data resource.


 If any organisation wants to implement business intelligence then first of all they should understand and clarify about the value of high-quality data resource that supports BI. This high-quality data resource contains data that will be utilised for creating the information and the information engineering process. It helps in the determination and presentation of business information to fulfil the organisational demand. So, high-quality data resource is directly proportional to information engineering process. Thus it can be said that data resource is the base of business strategies. Thus, it can be said that quality of data resource will increase the quality of the value chain to promote the business. The value chain starts from data resource to make information that supports knowledge environment. Then this knowledge environment will act as basics for business intelligence for making business strategies (Brackett, 1999).



A case study methodology was attempted by Brohman, 2000 to dig into the process of data warehouse practice and how it shapes the organisation. According to the qualitative analysis from the case study, a model was illustrated named Business Intelligence Value Chain as shown in Figure 2.



In above shown figure, business intelligence and business value are the main concepts of the model. Business intelligence is the package for data analysis whereas business value is the outcome from data warehouse development and practice.