On 2008-08-05T18:29:16, Michael Alger <[EMAIL PROTECTED]> wrote:

> It's a bit crude, but it works. The main problem is that if the
> monitoring script stops, heartbeat has no idea and therefore can't
> factor that into its decision-making process. I haven't determined
> whether we can somehow store a "last updated" timestamp in the CIB
> and make heartbeat pretend the score is 0 if it's not reasonably
> recent; mostly because we haven't had a problem with the script
> stopping unexpectedly.
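
For what it's worth, the "pretend the score is 0 if it's not reasonably
recent" idea can be sketched outside the cluster software. The Python
below is only a rough illustration. The function name, the 60-second
threshold, and the assumption that the monitoring script stores a Unix
timestamp next to the score are all made up for this example, and it
does not call any heartbeat or CIB API:

```python
import time

# Hypothetical threshold: how old a score may be before we distrust it.
STALE_AFTER_SECONDS = 60


def effective_score(score, last_updated, now=None, max_age=STALE_AFTER_SECONDS):
    """Return `score` if it was updated recently enough, otherwise 0.

    `last_updated` is assumed to be a Unix timestamp written by the
    monitoring script together with the score (an assumption, not an
    existing heartbeat feature).
    """
    if now is None:
        now = time.time()
    age = now - last_updated
    # A timestamp from the future (clock skew) is treated as untrustworthy too.
    if age < 0 or age > max_age:
        return 0
    return score
```

Whatever consumes the score would then use effective_score() instead of
the raw value, so a monitoring script that has silently stopped decays
to 0 on its own.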

A monitor script to protect against the monitor failing. I see your
point, but when we start assuming internal errors in the cluster
software itself, it really quickly becomes really expensive and
complex to protect against them.

We try to detect some (i.e., whenever a transition happens, the existence
of the required monitor ops is checked), but when we start assuming that
the LRM pretends that a monitor op is running but never runs it, we've
gone quite far into paranoia land. ;-)


Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
