Data Warehousing and Data Mining





1. Data warehousing

 

A data warehouse is a modern concept in database management. The term "data warehouse" is credited to W.H. Inmon.

 

 

  • W.H. Inmon: "A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management decisions is called a data warehouse."
  • Ralph Kimball: A data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.
  • More broadly, data warehousing is a collection of decision support technologies aimed at enabling the knowledge worker (executive, manager, and analyst) to make better and faster decisions.

 

Characteristics of data warehouse:

  • Multi-user support
  • Accessibility
  • Transparency
  • Client-server Architecture
  • Flexible Reporting
  • Generic Dimensionality
  • Multidimensional conceptual views

 

Functionality of data warehouse:

  • Roll-up
  • Drill-down (also called roll-down)
  • Pivot
  • Slice and Dice
  • Sorting
  • Selection
  • Derived (computed) attributes
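
Several of the operations listed above can be illustrated on a tiny fact table. The following is a minimal sketch in plain Python, assuming a hypothetical sales table with region, city, quarter, and product dimensions. The data and the helper names (roll_up, slice_, dice, pivot) are invented for illustration and do not belong to any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale, with "amount" as the measure.
SALES = [
    {"region": "North", "city": "Delhi",   "quarter": "Q1", "product": "TV",    "amount": 100},
    {"region": "North", "city": "Delhi",   "quarter": "Q2", "product": "TV",    "amount": 120},
    {"region": "North", "city": "Jaipur",  "quarter": "Q1", "product": "Radio", "amount": 40},
    {"region": "South", "city": "Chennai", "quarter": "Q1", "product": "TV",    "amount": 90},
    {"region": "South", "city": "Chennai", "quarter": "Q2", "product": "Radio", "amount": 60},
]

def roll_up(rows, dims):
    """Total the measure per combination of dims (fewer dims = coarser view)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]

def dice(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]

def pivot(rows, row_dim, col_dim):
    """Pivot: cross-tabulate the measure, row_dim down and col_dim across."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r["amount"]
    return {k: dict(v) for k, v in table.items()}

by_region = roll_up(SALES, ["region"])            # roll-up from city to region
by_city = roll_up(SALES, ["region", "city"])      # drill-down from region to city
q1_only = slice_(SALES, "quarter", "Q1")          # slice on one quarter
north_tv = dice(SALES, region={"North"}, product={"TV"})
region_by_quarter = pivot(SALES, "region", "quarter")
# Sorting: cities ordered by total sales, largest first.
top_cities = sorted(by_city.items(), key=lambda kv: kv[1], reverse=True)
```

Roll-up and drill-down are the same aggregation run at different levels of detail, which is why a single helper serves both in this sketch.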

 

On the basis of architecture, there are three data warehouse models:

(a) Enterprise warehouse: An enterprise warehouse collects all of the information about subjects spanning the entire organization. It provides corporate-wide data integration.

 

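(b) Data mart: A data mart contains a subset of corporate-wide data that is of value to a specific group of users, such as one department. Data marts are usually implemented on low-cost departmental servers. The implementation cycle of a data mart is generally measured in weeks rather than months or years.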

 

(c) Virtual warehouse: A virtual warehouse is a set of views over operational databases. For efficient query processing, only some of the possible summary views may be materialized.

[Figure: Virtual warehouse]
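
The idea of a virtual warehouse as views over an operational database can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a warehouse design: the orders table, the sales_by_region view, and the sales_by_region_mat table are hypothetical names. SQLite has no materialized views, so a summary table filled from the view stands in for one here; that substitution is an assumption of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for an operational database
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES
        ('North', 100.0), ('North', 50.0), ('South', 75.0);

    -- Virtual warehouse: a view computed on demand from the operational table.
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;
""")

# Stand-in for a materialized summary view: a table filled once from the view.
conn.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")

# A new operational record arrives after the summary was materialized.
conn.execute("INSERT INTO orders (region, amount) VALUES ('South', 25.0)")

live = dict(conn.execute("SELECT region, total FROM sales_by_region"))
stored = dict(conn.execute("SELECT region, total FROM sales_by_region_mat"))
```

The plain view always reflects the current operational data, while the stored summary answers queries without recomputation but goes stale until it is refreshed. That trade-off is why only some summary views are materialized.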


 

Data Model:

 

The foundation of a data warehousing system is its data model. A good data model allows the system to grow easily and to perform well. In a data warehousing project, the logical data model is built from user requirements and is then translated into the physical data model. Three levels of data model are distinguished:

(a) Conceptual data model: At this level, the data modeler attempts to identify the highest level relationships among the different entities.

  • Includes the important entities and the relationships among them.
  • No attribute is specified.
  • No primary key is specified.

 

(b) Logical data model: At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be physically implemented in the database. In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step (deliverable).

The steps for designing the logical data model are as follows:

1. Identify all entities.

2. Specify primary keys for all entities.

3. Find the relationships between different entities.

4. Find all attributes for each entity.

5. Resolve many-to-many relationships.

6. Normalization.
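
As a rough illustration of these steps, the sketch below models a small logical design with Python dataclasses, without regard to how it would be stored. The Student, Course, and Enrollment entities and their attributes are hypothetical examples chosen for this sketch.

```python
from dataclasses import dataclass

# Steps 1, 2 and 4: entities with their primary keys and attributes.
@dataclass(frozen=True)
class Student:
    student_id: int  # primary key
    name: str

@dataclass(frozen=True)
class Course:
    course_id: int   # primary key
    title: str

# Steps 3 and 5: a student takes many courses and a course has many students.
# The many-to-many relationship is resolved by an associative entity that
# refers to both primary keys.
@dataclass(frozen=True)
class Enrollment:
    student_id: int  # refers to Student.student_id
    course_id: int   # refers to Course.course_id
    grade: str

students = {s.student_id: s for s in [Student(1, "Asha"), Student(2, "Ravi")]}
courses = {c.course_id: c for c in [Course(10, "Databases"), Course(20, "Statistics")]}
enrollments = [Enrollment(1, 10, "A"), Enrollment(1, 20, "B"), Enrollment(2, 10, "A")]

def courses_of(student_id):
    """Follow the associative entity from a student to course titles."""
    return sorted(courses[e.course_id].title
                  for e in enrollments if e.student_id == student_id)

def students_in(course_id):
    """Follow the associative entity from a course to student names."""
    return sorted(students[e.student_id].name
                  for e in enrollments if e.course_id == course_id)
```

Each fact is stored once: a course title lives only in Course, and a grade lives only in the Enrollment it describes, which is the aim of the normalization step.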

 

(c) Physical data model: At this level, the data modeler will specify how the logical data model will be realized in the database schema.

  • Specifies all tables and columns.
  • Foreign keys are used to identify relationships between tables.
  • De-normalization may occur based on user requirements.
  • Physical considerations may cause the physical data model to be quite different from the logical data model.
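
The points above can be made concrete with a small schema. The following is a minimal sketch using Python's built-in sqlite3 module. The customer, sale, and sale_report tables are hypothetical, and the denormalized report table is just one possible response to a reporting requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Every table and column is spelled out, with types and constraints.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        -- The foreign key records the relationship between the two tables.
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Asha', 'Delhi');
    INSERT INTO sale VALUES (1, 1, 250.0);
""")

# A sale that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO sale VALUES (2, 99, 10.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# De-normalization: customer columns are copied next to each sale so that
# reports can read one table without a join.
conn.execute("""
    CREATE TABLE sale_report AS
    SELECT s.sale_id, c.name, c.city, s.amount
    FROM sale AS s JOIN customer AS c USING (customer_id)
""")
report = conn.execute(
    "SELECT sale_id, name, city, amount FROM sale_report"
).fetchall()
```

The report table repeats customer data and therefore differs from the normalized logical model, which shows how physical considerations can pull the two models apart.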

 

Data warehouse Usage:

 

A data warehouse contains integrated, processed data that can be analyzed during decision making and planning. It is a very important tool for business executives. It is commonly used in three ways:

(a) Information processing: It supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.

 

(b) Analytical processing: It supports multidimensional analysis of data warehouse data and basic OLAP (On-Line Analytical Processing) operations such as slice and dice, drilling, and pivoting.

 

(c) Data mining: Data mining is the process of discovering patterns in data warehouse data. It supports finding associations, constructing analytical models, performing classification and prediction, and presenting the mining results using crosstabs, graphs, and other visualization tools.

 

 

2. Data Mining

 

Data mining is an information extraction activity whose goal is to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques, and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.

  • Data mining refers to the mining or discovery of new information, in terms of patterns or rules, from vast amounts of data.
  • Data mining helps in extracting meaningful patterns that cannot necessarily be found by merely querying or processing data or metadata in the data warehouse.
  • Data mining is a process of data analysis using powerful analysis tools capable of extracting business intelligence from large repositories of electronic data.
  • Data mining is the result of the natural evolution of information technology in general and database technology in particular.
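
One common kind of pattern is an association rule such as "customers who buy bread also buy milk". The sketch below computes the two usual measures for such a rule, support and confidence, in plain Python. The baskets are hypothetical data invented for this example, and the brute-force counting is only meant to show the definitions, not an efficient mining algorithm.

```python
# Hypothetical market-basket transactions; each set is one purchase.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "milk", "tea"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# Rule under test: bread => milk
rule_support = support({"bread", "milk"}, BASKETS)
rule_confidence = confidence({"bread"}, {"milk"}, BASKETS)
```

In this toy data, bread and milk appear together in three of the five baskets, and three of the four baskets that contain bread also contain milk. A simple query can count a known combination like this one; mining searches across the many possible combinations for the rules worth reporting.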

 

 

Data Mining Applications:

 

Data mining does not replace skilled business analysts or managers; rather, it gives them powerful new tools to improve the job they are doing. It is a departure from the traditional tracks of decision making and business planning. It promises to help organizations uncover patterns hidden in their data that can be used to predict the behavior of customers, products, and processes.

  • Biomedical and DNA data analysis: Genetic engineering is a young engineering discipline built on the structure of genes. The human genome contains roughly 20,000 protein-coding genes, and genes control inherited characteristics. Genes are carried in DNA (deoxyribonucleic acid), which is built from four nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). Data mining helps search and compare these long nucleotide sequences. Genetic engineering is a potential boon for people with hereditary diseases: in principle, the sequence of a disease-carrying gene could be changed in the zygote after fertilization.
  • Image processing: Data mining provides efficient tools for image processing.
  • Financial data analysis: Banks and other business organizations often rely on data mining for accurate, high-quality data collection, better customer service and satisfaction, loan payment analysis, credit rating, and so on.
  • Retail industry: Customers are the main concern of any business organization, and products and services are designed with them in mind. Data mining helps predict how customers will behave in the market. It is used to identify customer buying behavior, improve customer service, enhance goods consumption ratios, design more effective products, and discover cost-effective transportation methods.
  • Manufacturing sector: The manufacturing side of an organization relies on data mining to design products that customers will accept. Markets are competitive: without competition a monopoly can earn high profits, but nowadays monopolies rarely last long. Data mining helps executives design customer-oriented products.
  • Telecommunication industry: Telecommunications is the backbone of any organization. Mismanagement in the telecommunication industry can harm businesses, industries, universities, military systems, and other organizations, because networks carry confidential data as well as ordinary data. In the telecommunication industry, data mining is used for identifying telecommunication patterns, catching fraudulent activities, making better use of resources, and improving quality of service.

 

3. KDD (Knowledge Discovery in Databases)

 

KDD is an expanding field. The term "knowledge" is interpreted very broadly as involving some degree of intelligence. Knowledge is often classified as (a) inductive or (b) deductive.

 

What is business intelligence?

Business intelligence usually refers to the information that is available for the enterprise to make decisions on. A data warehousing (or data mart) system is the backend, or the infrastructural, component for achieving business intelligence. Business intelligence also includes the insight gained from doing data mining analysis, as well as unstructured data (thus the need for content management systems). For our purposes here, we will discuss business intelligence in the context of using a data warehouse infrastructure.

The Knowledge discovery process comprises six phases:

  • Data selection
  • Data cleaning
  • Enrichment
  • Encoding (data transformation)
  • Data Mining, and
  • Reporting
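
The six phases can be sketched as a chain of small functions. This is a toy illustration in plain Python on hypothetical customer records. The field names, the REGION_OF lookup, and the trivial "mining" step (counting buyers per region) are assumptions made for this sketch, as is the order chosen for the middle phases (cleaning, then enrichment, then encoding).

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "age": 25,   "city": "Delhi",   "bought": "yes", "notes": "x"},
    {"id": 2, "age": None, "city": "Delhi",   "bought": "no",  "notes": "y"},
    {"id": 3, "age": 47,   "city": "Chennai", "bought": "yes", "notes": "z"},
    {"id": 3, "age": 47,   "city": "Chennai", "bought": "yes", "notes": "z"},  # duplicate
    {"id": 4, "age": 31,   "city": "Delhi",   "bought": "yes", "notes": "w"},
]
REGION_OF = {"Delhi": "North", "Chennai": "South"}  # assumed external lookup

def select(rows):
    """Data selection: keep only the attributes relevant to the question."""
    return [{k: r[k] for k in ("id", "age", "city", "bought")} for r in rows]

def clean(rows):
    """Data cleaning: drop duplicate ids and records with missing values."""
    seen, kept = set(), []
    for r in rows:
        if r["id"] in seen or None in r.values():
            continue
        seen.add(r["id"])
        kept.append(r)
    return kept

def enrich(rows):
    """Enrichment: add information from another source."""
    return [{**r, "region": REGION_OF[r["city"]]} for r in rows]

def encode(rows):
    """Encoding: recode values into a form better suited to mining."""
    return [{**r,
             "bought": r["bought"] == "yes",
             "age_band": "under_40" if r["age"] < 40 else "40_plus"}
            for r in rows]

def mine(rows):
    """Data mining (toy stand-in): count buyers per region."""
    return Counter(r["region"] for r in rows if r["bought"])

def report(pattern):
    """Reporting: present the result in readable form."""
    return [f"{region}: {n} buyer(s)" for region, n in sorted(pattern.items())]

lines = report(mine(encode(enrich(clean(select(RAW))))))
```

Each phase takes the output of the previous one, so a real mining algorithm could replace the toy mine step without touching the preparation phases.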

 

4. Data mining and warehousing

 

  • The goal of a data warehouse is to support decision making with data. Data mining can be used in conjunction with a data warehouse to help with certain types of decisions.
  • Data mining can be applied to operational databases with individual transactions. To make data mining more efficient, the data warehouse should have an aggregated or summarized collection of data. Data mining helps in extracting meaningful new patterns that cannot necessarily be found by merely querying the data warehouse.
  • Data mining applications should therefore be strongly considered early, during the design of a data warehouse.
  • Data mining tools should be designed to facilitate their use in conjunction with data warehouses.

 

 

5. Web Data Mining

 

The World Wide Web is a rich source for data mining. However, it is too huge for effective data warehousing and data mining, and too complex and heterogeneous, because it has no standard structure. The WWW is a huge, widely distributed, global information service center for:

  • Information services: news, advertisements, consumer information, financial management, education, government, e-commerce, etc.
  • Hyper-link information
  • Access and usage information

