Data Mining
Overview
"The importance of the resource 'knowledge' is increasingly recognized in national economies and companies. The social and organizational conditions for generating and effectively using knowledge will determine competitiveness in the near future. The goal of knowledge-oriented corporate management is to generate knowledge from information and to convert this knowledge into sustainable competitive advantages that can be measured as business success."
(North, 1999)
Data Mining can be understood as the application of sophisticated statistical and mathematical methods or algorithms to large databases, with the aim of extracting hidden patterns, trends and correlations from the data and using this knowledge profitably in the future (forecasting). Terms such as "Knowledge Discovery in Databases" (KDD), Machine Learning or Predictive Analytics are often used synonymously.
Data Mining methods:
- Regression
- Logistic regression (binary, multinomial)
- Cluster analysis: hierarchical and partitional procedures (k-means, PAM, AP)
- Discriminant analysis (LDA, QDA)
- Artificial neural networks: MLP, RBF
- Classification and regression trees: CART, CHAID
- k-NN (k-Nearest Neighbor)
- Support Vector Machines (SVM)
- One-way and multi-factor analysis of variance (ANOVA)
- Principal component analysis (PCA)
- …
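As an illustration of one partitional clustering procedure from the list, the following is a minimal k-means sketch (Lloyd's algorithm). It is written in plain Python for brevity only; an actual thesis would more likely use R or SPSS as stated under Requirements. The function name `kmeans`, the toy data and the choice to initialise with the first k points are assumptions made for this example (random or k-means++ initialisation is more common in practice).

```python
def kmeans(points, k, iterations=100):
    """Minimal k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Assumption for this sketch: the first k points serve as initial centroids.
    centroids = list(points[:k])
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments no longer change: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Invented toy data: two well-separated groups in the plane.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(data, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Real data sets usually need standardised variables and several restarts, since k-means only finds a local optimum.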
Also relevant:
- Dealing with missing values (imputation methods and visualization options)
- Resampling methods (cross-validation, bagging, boosting)
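To make the idea of cross-validation concrete, here is a small sketch that estimates the out-of-sample error of a simple linear regression with k folds. It uses plain Python for illustration only; the helper names `fit_line` and `k_fold_mse`, the interleaved fold assignment and the synthetic data are assumptions of this example, not part of the topic description. In practice the observations would normally be shuffled before being split into folds.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)


def k_fold_mse(xs, ys, k=5):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Assumption: every k-th observation forms one fold (no shuffling).
        test_idx = set(range(fold, n, k))
        pairs = list(enumerate(zip(xs, ys)))
        train = [xy for i, xy in pairs if i not in test_idx]
        test = [xy for i, xy in pairs if i in test_idx]
        # Fit on the training part only, then score on the held-out part.
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        fold_errors.append(sum((y - (a + b * x)) ** 2 for x, y in test) / len(test))
    return sum(fold_errors) / k


# Synthetic data: a straight line plus a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
cv_error = k_fold_mse(xs, ys, k=5)
```

The key point is that each model is fitted without the observations it is later evaluated on, so `cv_error` approximates the prediction error on new data rather than the fit on the training data.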
Data sources
A thesis can also start from a data set. Here are a number of possible data set sources:
Requirements
All topics should have, in addition to the theoretical foundations (i.e., model building and model assumptions), an empirical part in which a real, topic-related data set is analyzed using statistical software (R, IBM SPSS Statistics).
Literature
- Backhaus et al., 2011, Multivariate Analysemethoden – eine anwendungsorientierte Einführung, Springer
- Backhaus et al., 2011, Fortgeschrittene Multivariate Analysemethoden – eine anwendungsorientierte Einführung, Springer
- James et al.; An Introduction to Statistical Learning – with Applications in R; 2013; Springer
  Download link: http://www-bcf.usc.edu/~gareth/ISL/getbook.html
- Hastie et al.; The Elements of Statistical Learning – Data Mining, Inference and Prediction; 2009; Springer
- Rencher, Methods of multivariate analysis, 2002, John Wiley & Sons Inc.
- Nisbet et al., 2009, Handbook of Statistical Analysis and Data Mining Applications, Academic Press
- Hand et al., 2001, Principles of Data Mining, The MIT Press
- Runkler, 2010, Data Mining: Methoden und Algorithmen intelligenter Datenanalyse, Vieweg+Teubner
- Bishop, Pattern Recognition and Machine Learning, 2006, Springer
- Fahrmeir et al., Regression – Modelle, Methoden und Anwendungen, 2007, Springer
- Tutz, Regression for Categorical Data, 2012, Cambridge University Press
- Toutenburg, Lineare Modelle – Theorie und Anwendungen, 2003, Physica-Verlag
- Kaufman, Rousseeuw; Finding Groups in Data – An Introduction to Cluster Analysis; 1990; Wiley & Sons
- Breiman et al., Classification and Regression Trees, 1998, Chapman & Hall
- …