Data Integration and Cleansing Environment

Keywords: business analytics, data cleansing, data integration, record linkage

Affiliation: University of Vienna, Faculty of Computer Science

Details

Overview

For an overview of the DICE concepts and the foundations of business analytics, see the slides from NEMO 2015 (download).

Quick Guide to DICE 1.0 incl. sample models

Download the ZIP file to get started. It contains a sample model and a manual that guides you through your first steps with DICE (download).

DICE 1.0 has a three-layer architecture

  • The topmost layer of DICE is the modelling environment. Its main functions are (1) to provide graphical means for designing DICE workflows, (2) to trigger and control the execution of those workflows, and (3) to record and visualize the generated metadata.
  • The middle layer is the BI tier, which is based on R, a programming language and environment for statistical computing and graphical display (see https://www.r-project.org/). R receives the executable code, which is created automatically by interpreting the DICE workflow models, and performs the required transformations and calculations.
  • The third layer is the data layer. Through its access functions, DICE 1.0 can read data from social media platforms (e.g. Twitter, see www.twitter.com), from cloud storage services such as Google Drive (https://www.google.com/intl/de/drive/), or from local shares.
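
To make the hand-off between the modelling environment and the BI tier more concrete, here is a minimal Python sketch of how an ordered workflow model could be interpreted into R code. It is not the DICE generator: the `Step` class, the three step kinds, the function `generate_r_script`, and the file names are all hypothetical and chosen only for illustration. The emitted R statements use only base R functions (`read.csv`, `unique`, `write.csv`).

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One node of a hypothetical workflow model (not the DICE metamodel)."""
    kind: str  # assumed kinds: "read_csv", "deduplicate", "write_csv"
    params: dict = field(default_factory=dict)


def generate_r_script(steps):
    """Translate an ordered list of steps into R source code (one line per step)."""
    lines = []
    for step in steps:
        if step.kind == "read_csv":
            # Data layer access: load a file from a local share.
            path = step.params["path"]
            lines.append(f'data <- read.csv("{path}", stringsAsFactors = FALSE)')
        elif step.kind == "deduplicate":
            # Simple cleansing step: drop exact duplicate rows.
            lines.append("data <- unique(data)")
        elif step.kind == "write_csv":
            path = step.params["path"]
            lines.append(f'write.csv(data, "{path}", row.names = FALSE)')
        else:
            raise ValueError(f"unknown step kind: {step.kind}")
    return "\n".join(lines)


# Hypothetical three-step workflow: read, cleanse, write.
workflow = [
    Step("read_csv", {"path": "customers.csv"}),
    Step("deduplicate"),
    Step("write_csv", {"path": "customers_clean.csv"}),
]
r_script = generate_r_script(workflow)
print(r_script)
```

In this sketch the generated text would be passed to an R interpreter for execution. Exact-duplicate removal stands in for the more elaborate cleansing and record linkage steps that a real workflow would contain.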