Published on Dec 12, 2016
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process. Ralph Kimball states that a data warehouse is "a copy of transaction data specifically structured for query and analysis." A data warehouse is a repository of an organization's electronically stored data.
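To make "time-variant" and "non-volatile" concrete, here is a toy Python sketch. The class and field names are hypothetical. The operational system overwrites a balance in place, while the warehouse copy only appends dated snapshots, so the earlier value is still available for analysis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    snapshot_date: date
    account_id: str
    balance: float


class AppendOnlyWarehouse:
    """Toy warehouse table: rows are only appended, never updated (non-volatile),
    and each row records the date it was captured (time-variant)."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, operational_rows):
        # Copy the current operational state; earlier snapshots stay untouched.
        for account_id, balance in operational_rows.items():
            self._rows.append(Snapshot(snapshot_date, account_id, balance))

    def history(self, account_id):
        # Every captured value for one account, in load order.
        return [(r.snapshot_date, r.balance) for r in self._rows if r.account_id == account_id]


# Hypothetical operational system: balances are overwritten in place.
operational = {"A-1": 100.0, "A-2": 50.0}
wh = AppendOnlyWarehouse()
wh.load(date(2016, 12, 1), operational)
operational["A-1"] = 80.0  # the source system loses the old value
wh.load(date(2016, 12, 2), operational)
print(wh.history("A-1"))
```

The operational dictionary only knows the current balance of 80.0, while the warehouse can still answer "what was the balance on December 1?".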
Description of Data Warehousing
Data warehouses are designed to facilitate reporting and analysis. The definition above focuses on data storage. However, the means to retrieve and analyze data, to extract, transform, and load data, and to manage metadata are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition of data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.
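The extract, transform, and load steps can be sketched with Python's standard `csv` and `sqlite3` modules. This is a minimal illustration, not a production ETL tool. The two sources, their field names, and the `fact_orders` table are hypothetical. One source is a CSV export in currency units and the other is a list of records in cents, and both are mapped onto one schema before loading.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from an order system (amounts in currency units).
ORDERS_CSV = """order_id,customer,amount,order_date
1, acme corp ,120.50,2016-12-01
2,Globex,75.00,2016-12-03
"""

# Hypothetical source 2: records from another system (amounts in cents).
WEB_ORDERS = [
    {"id": "W-9", "cust": "ACME CORP", "cents": 2000, "date": "2016-12-05"},
]


def extract():
    """Extract: read rows from each source in its native shape."""
    csv_rows = list(csv.DictReader(io.StringIO(ORDERS_CSV)))
    return csv_rows, WEB_ORDERS


def transform(csv_rows, web_rows):
    """Transform: map both shapes onto one schema with consistent names and units."""
    unified = []
    for r in csv_rows:
        unified.append(("csv", r["order_id"], r["customer"].strip().upper(),
                        float(r["amount"]), r["order_date"]))
    for r in web_rows:
        unified.append(("web", r["id"], r["cust"].strip().upper(),
                        r["cents"] / 100, r["date"]))
    return unified


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS fact_orders (
        source TEXT, source_id TEXT, customer TEXT, amount REAL, order_date TEXT,
        PRIMARY KEY (source, source_id))""")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
for row in conn.execute(
        "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer ORDER BY customer"):
    print(row)
```

Because the transform step normalizes customer names and units, " acme corp " from one system and "ACME CORP" from the other end up counted as the same customer.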
A data warehouse can be normalized or denormalized. It can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc. In practice, data warehouse data often gets changed, and data warehouses often focus on a specific activity or entity.

Of course, if you define every user as a decision maker and every activity as a decision-making process, then my assertion that data warehouses do not primarily support decision making is false. But in my experience, the overwhelming majority of data warehouse uses serve quite mundane, non-decision-making purposes rather than providing grist for decisions with wide-ranging effects (so-called "strategic" decisions). In fact, I would assert that most data warehouses are used for post-decision monitoring of the effects of decisions or, as some people might say, for "operational" issues. This is not to say that using data warehousing in the decision-making process is not a wonderful, potentially high-return effort. But my caution is that although the trade press, vendors, and many industry experts trumpet the role of data warehousing vis-à-vis decision making, in reality we do not now have, nor will we ever have, a clear understanding of decision making.
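The normalized/denormalized distinction can be illustrated with an in-memory SQLite database through Python's `sqlite3` module. The table names and rows are hypothetical, and the sketch covers only the relational case, not multidimensional or flat-file storage. Both designs answer the same question. The normalized one stores customer attributes once and needs a join; the denormalized one repeats them on every sale row so the query needs no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized design: customer attributes are stored once and referenced by key.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY,
                   customer_id INTEGER REFERENCES customer, amount REAL);
INSERT INTO customer VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
INSERT INTO sale VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 30.0);
""")

# Denormalized design: customer attributes are copied onto every sale row,
# trading redundancy for simpler, join-free queries.
conn.executescript("""
CREATE TABLE sale_wide AS
SELECT s.sale_id, c.name AS customer_name, c.region, s.amount
FROM sale s JOIN customer c USING (customer_id);
""")

normalized = conn.execute("""
    SELECT c.region, SUM(s.amount) FROM sale s JOIN customer c USING (customer_id)
    GROUP BY c.region ORDER BY c.region""").fetchall()
denormalized = conn.execute("""
    SELECT region, SUM(amount) FROM sale_wide GROUP BY region ORDER BY region""").fetchall()

print(normalized)
print(denormalized)
```

The cost of the denormalized table shows up on change: renaming a region means updating every copied row rather than one `customer` row.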
Data warehousing arises from an organisation's need for reliable, consolidated, unique and integrated reporting and analysis of its data, at different levels of aggregation.
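"Different levels of aggregation" can be shown with a small roll-up in plain Python. The sales rows and the `LEVELS` mapping are hypothetical. Each level is a function from a date to a grouping key, and the same facts are summed by day, month, and year.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (sale date, product, amount).
SALES = [
    (date(2016, 11, 28), "widget", 40.0),
    (date(2016, 12, 1), "widget", 25.0),
    (date(2016, 12, 1), "gadget", 10.0),
    (date(2016, 12, 12), "widget", 5.0),
]

# Each level of aggregation maps a date to a grouping key.
LEVELS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "year": lambda d: str(d.year),
}


def roll_up(sales, level):
    """Sum amounts at the requested level of aggregation, sorted by key."""
    key_of = LEVELS[level]
    totals = defaultdict(float)
    for sale_date, _product, amount in sales:
        totals[key_of(sale_date)] += amount
    return dict(sorted(totals.items()))


for level in LEVELS:
    print(level, roll_up(SALES, level))
```

Whatever the level, the totals agree (80.0 overall), which is the "consolidated" property the paragraph asks for: every report is derived from the same underlying facts.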
The practical reality of most organisations is that their data infrastructure is made up of a collection of heterogeneous systems. For example, an organisation might have one system that handles customer relationships, another that handles employees, systems that handle sales or production data, yet another for finance and budgeting data, and so on. In practice, these systems are often poorly integrated or not integrated at all, and simple questions such as "How much time did salesperson A spend on customer C? How much did we sell to customer C? Was customer C happy with the service provided? Did customer C pay his bills?" can be very hard to answer, even though the information is available "somewhere" in the different data systems.
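Once the data from these systems sits in one place, the four questions become a single query. The following SQLite sketch (via Python's `sqlite3`) assumes the hardest part is already done: every source row has been loaded with a shared `customer_key`. The table names, columns, and figures are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical extracts from four source systems, loaded with a shared customer key.
conn.executescript("""
CREATE TABLE time_entry (salesperson TEXT, customer_key TEXT, hours REAL);  -- time management
CREATE TABLE sale (customer_key TEXT, amount REAL);                         -- sales
CREATE TABLE survey (customer_key TEXT, score INTEGER);                     -- CRM, 1-5 satisfaction
CREATE TABLE invoice (customer_key TEXT, amount REAL, paid INTEGER);        -- finance
INSERT INTO time_entry VALUES ('A', 'C', 3.5), ('A', 'C', 1.5), ('B', 'C', 2.0);
INSERT INTO sale VALUES ('C', 1200.0), ('C', 300.0);
INSERT INTO survey VALUES ('C', 4);
INSERT INTO invoice VALUES ('C', 1200.0, 1), ('C', 300.0, 0);
""")

# One query answers all four questions for salesperson :sp and customer :c.
QUESTION = """
SELECT
  (SELECT SUM(hours) FROM time_entry WHERE salesperson = :sp AND customer_key = :c) AS hours,
  (SELECT SUM(amount) FROM sale WHERE customer_key = :c) AS sold,
  (SELECT AVG(score) FROM survey WHERE customer_key = :c) AS satisfaction,
  (SELECT SUM(amount) FROM invoice WHERE customer_key = :c AND paid = 0) AS unpaid
"""
print(conn.execute(QUESTION, {"sp": "A", "c": "C"}).fetchone())
```

The query itself is trivial; the value of the warehouse lies in producing tables where such a join key exists at all.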
In data warehousing, the source data systems are taken as given. Suppose the CRM system identifies a person by initials, the Employee-Time-Management system by full name, and the ERP system by social security number, and a person can change his name. It is not the task of the data warehousing consultant to conclude from this that things do not work and that the organization should invest in and implement one or two new systems to handle CRM, ERP, etc. in a more consistent manner.
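One common way to work with the source systems as given is a cross-reference that maps each system's own identifier to a single warehouse surrogate key. The sketch below is a hypothetical illustration of that technique, not a prescribed design. `PersonXref` and `resolve` are invented names, the SSNs are dummy values, and it assumes initials are unique, which real data rarely guarantees. Former names are kept so that time entries recorded before a name change still resolve to the same person.

```python
from dataclasses import dataclass, field


@dataclass
class PersonXref:
    """Hypothetical cross-reference: one surrogate key per real person,
    plus the identifier each source system uses for that person."""
    surrogate_key: int
    crm_initials: str
    erp_ssn: str
    full_names: list[str] = field(default_factory=list)  # current and former names


XREF = [
    PersonXref(1, "JS", "000-00-0001", ["Jane Smith", "Jane Doe"]),  # name changed
    PersonXref(2, "RK", "000-00-0002", ["Robert King"]),
]


def resolve(system, local_id):
    """Map a source system's own identifier to the warehouse surrogate key."""
    for p in XREF:
        if system == "crm" and p.crm_initials == local_id:
            return p.surrogate_key
        if system == "erp" and p.erp_ssn == local_id:
            return p.surrogate_key
        if system == "time" and local_id in p.full_names:
            return p.surrogate_key
    raise KeyError(f"unmapped {system} id: {local_id!r}")


# The same person as seen by each source system.
rows = [("crm", "JS"), ("time", "Jane Doe"), ("time", "Jane Smith"), ("erp", "000-00-0001")]
print({resolve(system, local_id) for system, local_id in rows})
```

All four rows resolve to surrogate key 1, so facts from the CRM, time-management, and ERP systems can be joined without changing any of those systems.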