The following message takes you much deeper into this fascinating topic.
-----Original Message-----
From: Dr.Vijay Pithadia Ph.D.M.Comm.C.P.S.T. [mailto:vijaypithadia@lycos.com]
Sent: Monday, September 10, 2001 1:22 AM
Data mining & Knowledge Discovery:
Databases In business decisions
Dr. Vijay Pithadia
Doctor of Philosophy [1996 - 99]
Master & Bachelor of Commerce [1991-96]
Electronics Technocrat [1985 - 89]
Academic Staff, Dept. of
Social Work [MSW] Saurashtra University
[Vijaypithadia@sify.com]
ABSTRACT:
Today the computerization of many business and government transactions
generates floods of data: tax returns, telephone calls, business trips,
performance tests and product warranty registrations are all handled by
computer. Nowadays many traditional and statistical methods of data analysis,
e.g. ad-hoc queries and spreadsheets, are used to obtain informative reports
from the data, but they cannot extract knowledge from it. The present paper
shows how data mining and KDD technology can facilitate analysis of the data
in order to uncover the important knowledge hidden inside it. The second aim
of this study is to raise awareness among Indian university teachers, people
in industry and other organizations, and software professionals, so as to
generate projects and to promote the technology in business decisions.
Key Words: Data Mining, Process, Techniques, Finance, Banking, SCM, IIT-K,
Kanpur, ISI-C, Kolkata, KDD
Data Availability: Data used in this paper are
available from public sources identified in the study.
I thank Subir Hari Singh, Ministry of Information
Technology, Govt. of India, New Delhi, Roger Barker, Morehead State
University, Kentucky, S. Ganesan, Alagappa University, Karaikudi, Mangesh
Koregaonkar, Indian Institute of Technology -B, Mumbai, A.G. Balasubranian,
Goa University, Goa, Gabriel Hawawini, INSEAD, Cedex, Nitin Kumar Jain, Indian
Institute of Technology -D, New Delhi, Deepak Suchdey, President, Rajkot
Management Association, Rajkot, P. L. Bali, Thapar Institute of Engineering
& Technology, Patiala, C.S.G. Krishnamacharyalu, S.V. University, Tirupati,
and Umesh Makawana, Government Engineering College, Gandhinagar for making
meaningful comments and suggestions. I also thank K I Device, A D B Kompany,
Jakarta, Sanjay Mehta, Student of MSW, and Bakul Kakadia, Student of B.E. (IT)
for research coadjuvancy.
[1] Introduction
For the last couple of years the term Data Mining has been heard from
computer professionals. Data Mining [DM] is a new class of intelligent
analytical methods able to intelligently and automatically assist humans in
analyzing mountains of data for nuggets of useful knowledge. Data mining is
an iterative process of extracting interesting knowledge from data in large
databases. First, the knowledge may take the form of rules, patterns,
regularities, relationships, constraints etc. Second, the knowledge should be
valid and potentially useful. Third, it is the hidden information in the data
that is useful. KDD, by contrast, is the overall process of finding and
interpreting knowledge from data.
The goal of the subject is to extract knowledge from data in the context of
large databases and to present the patterns/knowledge in forms understandable
to human beings, so as to give a better understanding of the underlying data.
KDD, the emerging technology, is a multi-step process which uses Data Mining
methods [algorithms] to extract [identify] the knowledge hidden in the data
according to specified measures. Data mining thus serves two ends: prediction,
made on similar groups of data, and description, which involves finding
human-interpretable patterns describing the data. It applies across business
and industry, from Financial Management, Marketing Management and Economic
Surveys of companies to Insurance, Banking and the maintenance areas of
business.
[2] Basic Steps of KDD Process
A few of the basic steps of the KDD process are discussed here:
[1] Problem Analysis: This step is manual. Its main function is to understand
the application domain and the user's requirements, and to develop prior
knowledge of the domain.
[2] Selection of Target Data: Creating the target data set, i.e. selecting
the data set or subset on which discovery is to be performed. This step is
automatic.
[3] Data Processing: The third step of the KDD process involves removing
noise and handling missing data by means of an automatic program.
[4] Transformation of Data: This step is manual. Data reduction and
projection are carried out, finding the useful fields/features/attributes of
the data according to the goal of the problem.
[5] Data Mining: Selecting the data mining goal, choosing a method according
to the task, extracting knowledge and analyzing/verifying that knowledge.
This step is automatic.
[6] Output Analysis and Review: Interpreting and evaluating the
knowledge/patterns and transforming the knowledge into rules and reports,
automatic usage, and follow-up for new predictions.
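The steps above can be sketched as a small pipeline. This is only an
illustrative sketch, not a method taken from the paper: the records, the field
names (region, age, balance, bought) and the single-threshold "mining" step
are all assumptions chosen to keep the example short.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "region": "north", "age": 34, "balance": 1200.0, "bought": True},
    {"id": 2, "region": "south", "age": 40, "balance": 300.0, "bought": False},
    {"id": 3, "region": "north", "age": 51, "balance": 2500.0, "bought": True},
    {"id": 4, "region": "north", "age": 29, "balance": -1.0, "bought": False},  # noise
    {"id": 5, "region": "north", "age": None, "balance": 400.0, "bought": False},
]

def select_target(records, region):
    """Step 2: pick the subset on which discovery is performed."""
    return [r for r in records if r["region"] == region]

def clean(records):
    """Step 3: drop noisy rows (negative balance), fill missing ages with the mean."""
    valid = [r for r in records if r["balance"] >= 0]
    fill = mean(r["age"] for r in valid if r["age"] is not None)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in valid]

def transform(records, fields):
    """Step 4: project each record onto the fields relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]

def mine(records, field, target):
    """Step 5: find the threshold on `field` that best predicts `target`."""
    best_threshold, best_correct = None, -1
    for t in sorted({r[field] for r in records}):
        correct = sum((r[field] >= t) == r[target] for r in records)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return {
        "rule": f"{field} >= {best_threshold} -> {target}",
        "accuracy": best_correct / len(records),
    }

def run_pipeline(records):
    data = select_target(records, "north")
    data = clean(data)
    data = transform(data, ["balance", "bought"])
    return mine(data, "balance", "bought")  # step 6 would review this output

result = run_pipeline(RAW)
print(result)
```

Step 1 (problem analysis) and step 6 (output review) stay with the analyst;
only the middle steps are automated in this sketch.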
[3] Techniques for Data Mining
There are many techniques used for data mining. Some of the most popular and
common ones, i.e. Neural Networks, the Nearest Neighbour Method and Decision
Trees, are discussed here.
[1] Neural Networks: These are non-linear predictive models and are well
suited to finance-related areas. Some sample systems are OWL (Hyper Logic,
USA), Brain Maker (CSS, USA) and Neuro Shell (Ward Systems Group, USA).
[2] Nearest Neighbor Method: This technique classifies each record in a data
set based on a combination of the classes of the K record(s) most similar to
it in a historical data set [where K is greater than or equal to 1], and it
is therefore sometimes called the K-nearest neighbor technique. Sample
systems are TiMBL, PEBLS etc.
[3] Decision Tree: A Decision Tree consists of nodes and branches; the
beginning node is called the root. Depending upon the results of a test, the
data is classified into various subsets. The end result is a set of rules
covering all possibilities. The tree thus represents a set of decisions, and
these decisions generate rules for the classification of a data set. Specific
Decision Tree methods include Classification and Regression Trees [CART] and
Chi-Square Automatic Interaction Detection [CHAID]. Sample systems are
Clementine (Integral Solutions, UK), IDIS (Information Discovery, USA), ID3,
C5.0 (Rule Quest, Australia) etc.
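The nearest neighbor idea described in [2] can be illustrated with a minimal
sketch. It assumes numeric features, plain Euclidean distance and a majority
vote as the "combination of classes"; the historical loan records and the
function name knn_classify are hypothetical and do not come from any of the
systems named above.

```python
from collections import Counter
from math import dist

def knn_classify(history, record, k=3):
    """Classify `record` by majority vote among its k nearest historical records.

    `history` is a list of (features, label) pairs with numeric feature tuples.
    Features are used unscaled here; real data would usually be normalised first.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    nearest = sorted(history, key=lambda item: dist(item[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical data: (age, balance in thousands) -> loan outcome.
HISTORY = [
    ((25, 1.0), "default"),
    ((30, 1.5), "default"),
    ((45, 8.0), "repaid"),
    ((50, 9.5), "repaid"),
    ((40, 7.0), "repaid"),
]

print(knn_classify(HISTORY, (42, 7.5), k=3))
```

With K = 1 the record simply takes the class of its single closest neighbour;
larger values of K smooth out the effect of isolated noisy records.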
[4] Data Mining Solutions for Business
DM techniques are useful in business decisions. Some of the potential
application areas are Banking, Finance, and surveys related to customer
satisfaction, markets, buying behavior, customer characteristics, the economy
and direct marketing. The details are described below.
[a] Financial Market: In the financial market, various empirical models of
market behaviour and technical analysis can be used to forecast price
dynamics and to select the optimal structure of an investment portfolio. Such
systems have special interfaces for loading financial data, e.g. Supercharts
(Omega Research, USA), Wall Street Money (Market Arts, USA) etc. Data mining
methods also facilitate the analysis and selection of stocks and other
financial instruments.
[b] Banking: Banking functions such as mortgage approval, loan underwriting,
money lending/borrowing, loyal customer prediction, identification of stock
trading rules etc. are important areas for data mining. Such a system can
also predict the characteristics of ATM card users who use their cards at the
point of sale. A system can evolve prediction models for several levels of
card usage, based on parameters such as customer age, average checking
account balance, return per month, number of cheques etc. In the case of
mortgage loans, a data mining system can produce an excellent set of
discrimination rules with an error rate of only 8%. The input parameters are
account information, i.e. loan source, rates and loan-to-value ratio, as well
as borrower demographic information.
[c] Database Marketing: In the business world database marketing is the most
successful application. Its main functions are to analyse the customer
database, to find patterns in existing customers' preferences, and to target
the selection of future customers. Many companies are using database
marketing techniques; American Express, for example, reported that database
marketing increased its credit card purchases by 15-20%. Possible
applications are market research, including media selection, product
segmentation, broadcasting analysis and product success prediction. One
system lets television programming executives predict audience share and
arrange show schedules so as to maximize market share and increase
advertising revenues.
[d] Supply Chain Management (SCM): The fundamental operation of retail is
supply chain management: moving products or services from the manufacturer to
the customer via retail, either virtual or physical. Data mining can help to
maximise sales and profits through the optimisation of marketing actions, and
provides the insights the retailer needs to properly manage customers,
promoters, products, stores and employees. Data mining provides answers to
questions such as: what customer? what products? what time? and at what
price?
[e] Marketing Strategies: Targeted marketing actions such as direct mail
campaigns are expensive to produce, so it is important to direct mailings to
those individuals most likely to buy. Generating business models under the
various conditions is very difficult and complex, and this function of target
marketing can be achieved by data mining applications. For example, Epsilon
Data Management, USA handles America's biggest direct mailers, including
American Express. Marks and Spencer is also using this technique for a direct
mail campaign aimed at attracting customers to a suit promotion.
[f] Sales Forecasting: The important use of sales forecasting is the
optimisation of stocks and purchases. On the basis of past data, retailers
can accurately predict sales per item and location in order to optimise stock
levels.
This is also important in attracting and keeping clients. In Germany the
Karstadt retail chain uses a neural network based system developed by
Neurotec to predict the sales of a total of 200,000 items carried in its
stores in order to optimise orders. In London, Search Space Ltd. has
developed a neural network based application to forecast sales for a high
street retail organisation.
[g] Fraud Detection and Prevention: Data mining also plays an important role
in this area. Fraud can be detected in personal insurance, tax returns,
accounts, credit cards etc. A system can analyse the probability that a new
account is fraudulent. The probabilities are used to sort the accounts so
that those with the highest probability can be further investigated by fraud
analysts.
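The sorting step described under [g] can be sketched as follows. The sketch
assumes that some model has already assigned a fraud probability to each new
account; the account ids, the scores and the function name rank_for_review
are hypothetical.

```python
def rank_for_review(scored_accounts, top_n):
    """Return the ids of the top_n accounts with the highest fraud probability."""
    ranked = sorted(scored_accounts.items(), key=lambda kv: kv[1], reverse=True)
    return [account for account, _ in ranked[:top_n]]

# Hypothetical scores produced by an upstream fraud model.
SCORES = {"A-101": 0.02, "A-102": 0.91, "A-103": 0.35, "A-104": 0.77}

review_queue = rank_for_review(SCORES, top_n=2)
print(review_queue)
```

Limiting the queue to the top few accounts reflects the fact that manual
investigation is the scarce resource, so it is spent where the estimated risk
is highest.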
[5] Indian Players in Data Mining
In India only a few organizations, such as IIT-B, Mumbai, IIT-K, Kanpur, Tata
Infotech, Mumbai, IBM-India, Bangalore and ISI-C, Kolkata, are working in
this area, because cost-effective solutions are the major theme in developing
the promising technology of data mining. IIT-K, Kanpur and IBM-India,
Bangalore are working on tool development, whereas Tata Infotech is working
on both tool and application development, including TULearn, a set of
industrial-quality tools to define the nature of a database and then to learn
how to classify data into databases. Its applications consist of credit card
eligibility analysis, a customer satisfaction survey, a market survey for
Hindustan Lever Ltd., BPL Mobile fraud detection etc. ISI-C, Kolkata has been
engaged on the problems: (a) classification of archaeological materials and
(b) a market survey of quality control towards customer satisfaction indices.
[6] Research Issues
The techniques of data mining are new and emerging concepts, and all aspects
of this technology are still at the research level, where development
continues alongside improvement of efficiency and scalability. The main
issues are discussed below:
[1] Handling multiple sources and different kinds of data, i.e.
transactional, active, relational, multimedia, object-oriented, legacy etc.
[2] Data mining security: guarding against the invasion of privacy.
[3] Interactive mining of knowledge at multiple concept levels; efficiency
and scalability of data mining algorithms; knowledge at multiple levels in
large databases.
[4] Smooth integration with existing database and warehousing systems;
knowledge updating, application and integration.
[5] Data mining tasks: summarization, characterization, clustering, trend and
deviation analysis, classification, pattern analysis etc.
[7] Conclusion
Data Mining is an emerging and powerful technology for improving business
strategies and for helping in the design of new products and the quality of
products. It complements, and can often replace, other business tools, i.e.
computer reporting and querying and statistical analysis. Data Mining draws
on multiple disciplines such as Database Systems, Data Warehousing and OLAP
(Online Analytical Processing), Machine Learning, Information Science,
Statistics and Visualisation, along with other disciplines such as
Mathematical Modelling, Pattern Recognition, Neural Networks, Image/Signal
Analysis, Web Technology etc. In business decisions, all of the above models
can help make the decision more suitable.
Appendix - Tools For Data Mining and KDD
Commercial systems are shown as Com; public domain and research prototype
systems are shown as Pub, and some of the latter are usually freely available
for research purposes.
# Decision Tree Approach:
Pub: LMDT, OCI, PC 4.5, and SE-Learn
Com: AC2, Alice d'I soft, CART, Cognos scenario, KATE-Tools, Preclass SPSS
Answer Tree, Xpertrule Profiler 4.0
# Nearest Neighbor Approach:
Pub: MLC++, PEBLS, and TiMBL 1.0
# Neural Network Approach:
Pub: Neural Network FAQ Free Software, Neuro Net Site
Com: Neural Network FAQ List, 4 Thought, Brain Maker, DB Prophet, INSPECT,
Neural Works Predicts, Neuro Solutions, & SPSS Neural Connections 2
# Rule Discovery Approach:
Pub: Brute, CN2, DB Miner, DB Predictor, FOIL, and
MLC++
Com: Data Surveyor, WINROSA, Datamite, WizWhy and SuperQuery
# Clustering:
Pub: Autoclass C, ECOBWEB, Fast Fuzzy Cluster, Snob
Com: Autoglass III, COBWEB/3, Cviz Cluster Visualization, SOMine.
# Statistics:
Pub: XLISP-STAT
Com: BBN Cornerstone, Data Desk, STATlab, SPSS.
# Visualization for Discovery:
Pub: Graf-FX IRIS, VisDB, Xmdv
Com: Cviz Cluster Visualization, DataScope, UPDATE Sphinx Vision, WinViz.