Data Integration and Cleansing Environment

Keywords: business analytics, data cleansing, data integration, record linkage

Affiliation: University of Vienna, Faculty of Computer Science

Area of Application

When assessing data quality in business analytics, metadata describing substantive properties of the data are of utmost importance. In particular, one often needs information on how representative the data are or on the methods of data collection, knowledge that goes beyond the information in the database schema. By combining ideas from statistical metadata management and business workflow management, DICE offers an environment for computing metadata for new data in a warehouse that result from a data integration activity.

Abstract

The basic idea of the approach is to process metadata simultaneously with the data: besides database operations such as joins, DICE defines corresponding metadata operations that update the data description. In addition to defining the populations represented by the integrated data, an important topic is keeping track of, and documenting, the missing values that arise during these operations.
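
The pairing of a data operation with a metadata operation can be illustrated with a minimal Python sketch. It is not DICE's actual interface: the names Metadata, Dataset, count_missing and left_join, the textual population description, and the sample records are all hypothetical and chosen only for illustration. The sketch assumes that the join key is unique in the right-hand table.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical data description carried alongside a table."""
    population: str                               # population the rows represent
    missing: dict = field(default_factory=dict)   # column -> number of missing values
    log: list = field(default_factory=list)       # notes on how missing values arose


@dataclass
class Dataset:
    rows: list        # list of dicts, one per record
    meta: Metadata


def count_missing(rows):
    """Count None values per column."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            if val is None:
                counts[col] = counts.get(col, 0) + 1
    return counts


def left_join(left, right, key):
    """Left outer join that updates the metadata together with the data.

    Assumption: `key` is unique in `right`.
    """
    # Data operation: join the rows.
    index = {r[key]: r for r in right.rows}
    right_cols = [c for c in (right.rows[0] if right.rows else {}) if c != key]
    rows, unmatched = [], 0
    for l_row in left.rows:
        match = index.get(l_row[key])
        if match is None:
            unmatched += 1
            extra = {c: None for c in right_cols}   # missing values created by the join
        else:
            extra = {c: match[c] for c in right_cols}
        rows.append({**l_row, **extra})

    # Metadata operation: describe the new population and document missing values.
    meta = Metadata(
        population=f"({left.meta.population}) left-joined with "
                   f"({right.meta.population}) on {key}",
        missing=count_missing(rows),
        log=left.meta.log + right.meta.log,
    )
    if unmatched:
        meta.log.append(
            f"left_join on {key}: {unmatched} unmatched row(s) filled with missing values"
        )
    return Dataset(rows, meta)


# Made-up sample data, for illustration only.
customers = Dataset(
    [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}, {"id": 3, "name": None}],
    Metadata("all registered customers"),
)
orders = Dataset(
    [{"id": 1, "total": 10.0}, {"id": 2, "total": None}],
    Metadata("customers with at least one order"),
)
result = left_join(customers, orders, "id")
```

In this sketch, result.meta.missing distinguishes nothing about the origin of a missing value by itself; the origin is documented in result.meta.log, which records that one row had no match in the join. A fuller design might track pre-existing and operation-induced missing values separately per column.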