Hi Aurora,

I am currently working on a feature that allows health checks to be
temporarily disabled for a running instance of a job.  The code review can
be found at https://reviews.apache.org/r/26383/.  The idea is that the
presence of a special "snooze file" in the task's sandbox will trigger the
disabling of the health checks.

Currently, the code reviewers have split into two camps:
1. One set of reviewers believes that simplicity is key.  Disable the health
checks while the snooze file is present, and enable them otherwise.

2. The other set of reviewers believe that there should be a snooze
duration.  The timer starts when the snooze file is touched.  After the
snooze duration is exhausted, the snooze file should be deleted by the
health checker, and health checks resume.  This is useful if the process
that initially disabled the health checks dies unexpectedly, and is no
longer there to re-enable the health checks.
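To make the difference between the two proposals concrete, here is a rough sketch of how a health checker might decide whether to skip a check under each option. It is not the code under review: the file name `.healthchecksnooze`, the function names, and the use of the file's mtime as the "touched" timestamp are all assumptions made for illustration.

```python
import os
import time
from typing import Optional

# Hypothetical snooze file name; the name used in r/26383 may differ.
SNOOZE_FILE = ".healthchecksnooze"


def snooze_active_simple(sandbox: str) -> bool:
    """Option 1: health checks are disabled exactly while the file exists."""
    return os.path.exists(os.path.join(sandbox, SNOOZE_FILE))


def snooze_active_timed(sandbox: str, duration_secs: float,
                        now: Optional[float] = None) -> bool:
    """Option 2 (sketch): the file disables checks only for duration_secs
    after it was last touched, using its mtime as the start of the timer.
    Once the snooze expires, the checker deletes the file itself."""
    path = os.path.join(sandbox, SNOOZE_FILE)
    try:
        touched_at = os.path.getmtime(path)
    except OSError:
        return False  # No snooze file, so health checks run normally.
    if now is None:
        now = time.time()
    if now - touched_at < duration_secs:
        return True
    # Snooze expired: remove the file so health checks resume even if the
    # process that created it died and can no longer clean it up.
    try:
        os.remove(path)
    except OSError:
        pass  # Someone else already removed it; nothing left to do.
    return False
```

In either variant the checker would call the function before each check and skip the check while it returns True.  Under option 2, re-touching the file extends the snooze, and a crashed client can never leave health checks disabled for longer than `duration_secs`.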

I would like to invite anyone interested to voice their opinions and chime
in.

Thanks,

David Pan
