On Tue, May 05, 2009 at 11:38:36PM +0000, David Holland wrote:
> Having things fail silently or go into a fugue state is not an
> improvement, particularly in security code. So I'd qualify all this by
> saying that end-to-end behavior should always be fail-stop.

Since this is apparently not that clear:

If an object (piece of hardware, daemon process, whatever) fails, it
can either stop (exit, shut off, etc.) or sit there catatonically. Or
maybe continue to operate incorrectly.

Such objects are, one way or another, part of larger systems. If the
system has recovery or restart logic, stopping on failure allows that
logic to go into action. If the system does not have such logic, there
won't be further service without manual intervention regardless of
whether the object stops.

For this reason objects should nearly always stop on critical
failures. And, in cases where availability matters, objects whose
stoppage would result in problems should be provided with restart
logic.

I thought this was a general principle of reliable/HA system design,
but apparently it's not as widely understood/believed/recognized as I
thought. Or something. :-|

-- 
David A. Holland
dholl...@netbsd.org
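
To make the stop-plus-restart argument above concrete, here is a minimal
sketch in Python. It is only an illustration, not anything from an actual
system: the WORKER script, the supervise() function, and the max_restarts
parameter are all hypothetical names, and the "critical failure" is
simulated by having the worker exit nonzero on its first two runs. The
point is that because the worker stops instead of hanging or carrying on
incorrectly, the surrounding restart logic gets a chance to act.

```python
import subprocess
import sys

# Hypothetical worker, run as a child process. On a (simulated) critical
# failure it stops at once with a nonzero exit status instead of sitting
# there or continuing to operate incorrectly.
WORKER = """
import sys
attempt = int(sys.argv[1])
if attempt < 2:      # simulated critical failure on the first two runs
    sys.exit(1)      # fail-stop: exit immediately
sys.exit(0)          # normal completion
"""


def supervise(max_restarts=5):
    """Run the worker and restart it each time it stops on failure.

    Returns the number of attempts needed to get a clean exit. Raises
    RuntimeError once max_restarts is exhausted, so the supervisor itself
    also stops rather than failing silently.
    """
    for attempt in range(max_restarts + 1):
        result = subprocess.run([sys.executable, "-c", WORKER, str(attempt)])
        if result.returncode == 0:
            return attempt + 1
        print(f"worker stopped with status {result.returncode}; restarting",
              file=sys.stderr)
    raise RuntimeError("worker keeps failing; giving up")
```

Calling supervise() here restarts the worker twice and then sees a clean
exit. A worker that hung instead of exiting would leave subprocess.run()
blocked forever, which is the catatonic case: the restart logic exists
but never gets to run.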