]] Russ Allbery

> Basically, what we're looking for here is the equivalent of a check engine
> light (except, of course, with better user-visible diagnostics available).
> That's what the end user actually wants: something clear and visible
> indicating that something is wrong, which they can drill down and see the
> details and dismiss the error condition if they want, or have all the
> details available to consult someone who knows more about computers if
> they don't know what to do with it themselves. Historically, root cron
> mail has been exactly that, and that's still a great way of handling it
> for servers, since that mail can be sent off somewhere centrally, analyzed
> and assigned to sysadmins, used to open internal trouble tickets, etc.

I don't think it's a good way at all, since far too often cron mails
aren't actionable. I'll get a mail from some automated process that tried
to run apt-get update and failed (in the middle of the night). Since that
process runs every hour, it will have succeeded afterwards, and there's
nothing I can do about the mail.

I wish we had a better system where some, but not all, errors would latch
and need acknowledgment. There would be correlation (between hosts and
between messages, so if the router's down, you get a message about data
centre A not being able to successfully complete $process, rather than a
zillion individual messages). There would be merging of identical
messages, so I get a message about $process being broken for the last
$time period (or having a failure rate above $threshold), rather than a
thousand mails because of some error.

Oh, and a pony. Don't forget the pony. Or an otter, I like otters.

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are
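
To make the merging and $threshold idea above a bit more concrete, here is a rough Python sketch. It is not an existing tool: the `FailureDigest` class, its method names, the host names and the 0.5 threshold are all made up for illustration. It covers only merging and the failure-rate threshold, not latching, acknowledgment or cross-host correlation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ProcessStats:
    runs: int = 0
    failures: int = 0
    hosts: set = field(default_factory=set)  # hosts that reported a failure


class FailureDigest:
    """Hypothetical collector: one merged line per process over the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold  # failure rate above which we report
        self.stats = defaultdict(ProcessStats)

    def record(self, process, host, ok):
        s = self.stats[process]
        s.runs += 1
        if not ok:
            s.failures += 1
            s.hosts.add(host)

    def report(self):
        lines = []
        for process, s in sorted(self.stats.items()):
            if s.failures / s.runs > self.threshold:
                lines.append(
                    f"{process}: {s.failures}/{s.runs} runs failed "
                    f"on {len(s.hosts)} host(s)"
                )
        return lines


digest = FailureDigest(threshold=0.5)
# One transient failure out of four hourly runs: stays silent.
for ok in (True, False, True, True):
    digest.record("apt-get update", "host1", ok)
# A process failing on every run across two hosts: one merged line.
for host in ("host1", "host2"):
    for _ in range(3):
        digest.record("backup", host, False)
print("\n".join(digest.report()))
```

With these made-up inputs, the single night-time apt-get update failure produces no message at all, while the six backup failures collapse into one line instead of six mails.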