> In my experience there can be failures affecting a single host or a
> single cron job where no jobs run at all, or no emails from that host
> get delivered.  In the absence of monitoring, such failures are not
> noticed until they cause some other symptom.
>
> We want that symptom to be "we get mail from cron job A that cron job
> B has stopped working".  This pattern is one I have used elsewhere and
> I have indeed had some cron mails.
>
> Or to put it another way, each monitoring cron job comes with an
> inherent risk of ineffectiveness because "it didn't run" and "it
> failed but the report went into an oubliette" are indistinguishable
> from "it's working".  A design with only one *monitoring* cron job,
> and a bunch of separate function jobs that actually do something,
> reduces that risk and makes mitigations for it (manual testing) much
> easier.

Okay.
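
For concreteness, here is a rough sketch of how I read the "one
monitoring job, many function jobs" design.  It assumes something not
stated above: that each function job touches a stamp file named after
itself when it succeeds, and that the single monitoring job checks the
ages of those files.  The function name, the stamp directory layout and
the job names are all hypothetical.

```python
import os
import time


def stale_jobs(stamp_dir, expected_jobs, max_age_seconds, now=None):
    """Return the expected jobs whose stamp file is missing or too old.

    Assumption (not from the thread): every function job touches
    stamp_dir/<job name> after a successful run.
    """
    now = time.time() if now is None else now
    stale = []
    for job in expected_jobs:
        path = os.path.join(stamp_dir, job)
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            # No stamp at all: the job has never reported success.
            stale.append(job)
            continue
        if age > max_age_seconds:
            stale.append(job)
    return stale
```

The monitoring job would print the returned names, if any, so that
cron mails them.  Only that one job then has to be tested by hand to
confirm that its mail actually arrives.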

> Secondly, "copy this key out to the central place" is a much simpler
> thing to deploy and will almost never need to be updated.  If we put
> the more complex check-it's-all-ok logic everywhere then we have to
> keep it up to date everywhere.

This makes me think that there is in fact already a system like this:
the thing that copies our public key from tag2upload-manager-01 to
ftp-master.

Seems like you could talk to DSA about just reusing that to copy the key
around for our own purposes.

-- 
Sean Whitton
