I am planning to develop an R package to manage all aspects of data quality. I am very experienced in data quality, but fairly new to R. I have looked for a suitable data quality package, and am surprised to have found little that meets my requirements. Developing the package would be an ambitious effort involving several contributors (whom I have already identified, and who also do not yet have much R experience), so I am seeking some confidence that the effort is worthwhile.
The package will be highly configurable, so that it can be applied to almost any situation, and will implement sophisticated data quality capabilities, including:

(a) DEFINITION: integration with a data dictionary (perhaps metaData), and highly configurable and expressive data quality rules.
(b) MONITORING & DETECTION: automated data quality monitoring and alerting against any data source, automatically raising and updating quality issues.
(c) ANALYSIS & ROOT CAUSE: a data quality dashboard with alerts, drill-downs and trend plots, perhaps including a machine learning aspect that detects noteworthy events in quality measurements for inclusion in executive reports.
(d) WORKFLOW: a basic data quality management workflow (i.e. an 'inbox' and 'actions', probably implemented via Shiny).

The requirements will be drawn from my professional experience (as interim head of data quality at a global bank), although this project is not sponsored by my employer or by any of my consulting clients. I do, however, expect the package to be of interest to financial services organisations that rely on good quality data for their financial and risk models, and for any other process that relies on good data.

To sum up, if anyone can point to a data quality package that means I don’t have to develop one, that would be great. Alternatively, any comments of support would also be very useful!

David

David Twaddell
Architector Data Tools
Tel: +44 20 3239 1099 | +44 7447 936 984
Web: www.architector.co.uk