On 23 March 2018 at 22:14, Shane Curcuru <a...@shanecurcuru.org> wrote:
> A question was raised today about how we check if bits of organizational
> data that Whimsy processes are consistent or valid.
>
> Obviously Whimsy itself is not the canonical source of data; we usually
> suck it in from Infra-supported tools and simply cache it in more
> convenient forms. But it might be interesting - and allow for
> experimentation - to add some data integrity checking into whimsy
> tooling. This would be a best-effort warning of data issues, not a
> comprehensive solution.
>
> Is this interesting enough that people want to work on it? If so, what
> would be a minimum interesting check to add for our main data?
>
> https://whimsy.apache.org/public/
>
> The framework that occurs to me is to add any simple data check methods
> inside the various /www/roster/public*.rb scripts that are the cron jobs
> that create /public/*.json files.
>
> We could add a validate_data(json) method to most that - after the
> normal processing is complete - could do any checking desired. If a
> problem is found, then call a variant of public_json_common.rb
> sendMail() that sends an alert about the issue.
>
> Sound useful?
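For discussion, here is a rough sketch of what such a validate_data(json) hook might look like. It is not existing Whimsy code: the 'people' key, the 'name' field, and the report_problems helper are all assumptions made up for illustration, and the notifier is passed in so that whatever mail-sending variant we settle on could be plugged in later.

```ruby
require 'json'

# Hypothetical check: every entry under 'people' should have a non-empty 'name'.
# The 'people' key and 'name' field are illustrative assumptions only.
# Returns a list of human-readable problems; an empty list means nothing was found.
def validate_data(json)
  data = JSON.parse(json)
  problems = []
  data.fetch('people', {}).each do |id, entry|
    name = entry.is_a?(Hash) ? entry['name'].to_s.strip : ''
    problems << "#{id}: missing name" if name.empty?
  end
  problems
end

# Best-effort reporting. The notifier is injected so a mail-sending variant
# could be substituted; by default it just prints a warning to stderr.
# Returns true if an alert was raised, false otherwise.
def report_problems(problems, notifier = ->(msg) { warn msg })
  return false if problems.empty?
  notifier.call("Data check found #{problems.size} issue(s):\n" + problems.join("\n"))
  true
end
```

A cron script could call report_problems(validate_data(json)) once its normal processing is complete.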
Some of the public json jobs already do some checks. For example, one
checks that LDAP group members are also in the LDAP people group.
Here is a sample warning:

https://lists.apache.org/thread.html/3a0fd03a64cee0c9f5773b17d749d5e3fe33ea8d6c9e75d3372fe13c@%3Cnotifications.whimsical.apache.org%3E

It's not always easy to separate the checks from the processing, so
having a separate validate_data function may not always be the best
solution.

In the case of public_ldap_auth_groups, the check is done at the end
of the run, but only if the output has changed. This avoids generating
too many warnings.

> --
>
> - Shane
> Director & Member
> The Apache Software Foundation
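To illustrate the "only check if the output has changed" approach, here is a simplified sketch. It is not the actual public_ldap_auth_groups code: the method names, the shape of the groups hash (group name mapped to a list of uids), and the way the previous output is passed in are all assumptions for the example.

```ruby
require 'json'

# Returns one warning per group member who is absent from the people list.
# groups: { 'group-name' => ['uid', ...] }, people: ['uid', ...] (assumed shapes)
def missing_members(groups, people)
  groups.flat_map do |group, members|
    (members - people).map { |uid| "#{group}: #{uid} is not in the people group" }
  end
end

# Runs the membership check only when the freshly generated output differs
# from the previous output, so an unchanged problem is not reported every run.
# previous_json is nil on the first run, which always triggers the check.
def check_if_changed(previous_json, groups, people)
  current_json = JSON.generate(groups)
  return [] if previous_json == current_json
  missing_members(groups, people)
end
```

The trade-off is that a problem is reported once, when the output changes, and then stays quiet until the data changes again.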