On 2012-06-21T08:02:25, Ulrich Windl wrote:
> > See, it's simple. Any "partially" completed operation or state -> not
> > successful, ergo failure must be reported.
> Is it correct that the standard recovery procedure for this failure is node
> fencing then? If so, it makes things worse IMHO.
The standard recovery for most failures is to stop the resource.
The stop is tasked with cleaning up the resource's state properly. ONLY
when stop reports that it cannot do so will the recovery escalate to node
fencing - because until the resource has been cleaned up on node X, it
can't safely be started again anywhere.
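To make the escalation order concrete, here is a deliberately simplified model written in Python. It is not Pacemaker code. The function name `recovery_steps` and its parameters are hypothetical, and it assumes the common case where a failed resource is recovered by stopping it and starting it again.

```python
def recovery_steps(stop_succeeds, stop_on_fail="fence"):
    """Simplified, hypothetical model of the recovery escalation described above."""
    # Most failures are answered first by stopping the resource.
    steps = ["stop"]
    if stop_succeeds:
        # State is clean, so the resource may be started again (here or elsewhere).
        steps.append("start")
    elif stop_on_fail == "block":
        # Leave the resource alone for an admin; other resources are unaffected.
        steps.append("block")
    else:
        # Stop could not clean up node X: fence X, after which the resource can start elsewhere.
        steps += ["fence", "start"]
    return steps
```

Fencing only appears on the path where stop itself failed, which is why a reliable stop matters so much.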
So the trick is to implement "stop" properly, so that it doesn't
fail at all, or at least fails only extremely rarely.
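One way to read "implement stop so that it doesn't fail" is sketched below in Python, under stated assumptions. The return codes OCF_SUCCESS (0) and OCF_ERR_GENERIC (1) are the standard OCF values. The function `robust_stop` and its `is_running`/`attempts` callbacks are hypothetical stand-ins for whatever a real resource agent would do, such as a graceful shutdown followed by SIGTERM and then SIGKILL. The two ideas shown are that stopping an already-stopped resource succeeds, and that failure is reported only after every cleanup method has been tried.

```python
OCF_SUCCESS = 0      # standard OCF return code: success
OCF_ERR_GENERIC = 1  # standard OCF return code: generic error


def robust_stop(is_running, attempts):
    """Hypothetical sketch of an idempotent, escalating stop action.

    is_running: callable returning True while the resource is still active.
    attempts:   cleanup callables ordered from gentlest to most forceful.
    """
    # Stopping a resource that is already stopped must succeed, not fail.
    if not is_running():
        return OCF_SUCCESS
    for attempt in attempts:
        try:
            attempt()
        except Exception:
            # One failed method is not a failed stop; escalate to the next one.
            pass
        if not is_running():
            return OCF_SUCCESS
    # Every method is exhausted: only now report failure (which may lead to fencing).
    return OCF_ERR_GENERIC
```

For example, if the graceful method raises an error but a forceful one succeeds, `robust_stop` still returns OCF_SUCCESS, so the cluster never needs to escalate.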
If the resource is not important enough for this - i.e., when it fails
to clean up, you'd rather not inconvenience other resources and instead
leave it for an admin to fix later - set on-fail="block" for stop.
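The on-fail setting in Pacemaker is attached to the stop operation itself. In crm shell syntax this is written as something like `op stop interval="0" on-fail="block"`. When on-fail is not set, the Pacemaker documentation gives the default for stop as "fence" if STONITH is enabled and "block" otherwise. A minimal Python sketch of that resolution follows. The helper name `effective_stop_on_fail` is hypothetical and is meant only to illustrate the rule.

```python
def effective_stop_on_fail(configured=None, stonith_enabled=True):
    """Hypothetical helper: which on-fail policy applies to a failed stop."""
    # An explicit on-fail on the stop operation always wins.
    if configured is not None:
        return configured
    # Documented default for stop: fence if STONITH is enabled, otherwise block.
    return "fence" if stonith_enabled else "block"
```

So on a cluster with STONITH enabled, a failed stop leads to fencing unless on-fail="block" is set explicitly.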
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
_______________________________________________
Linux-HA mailing list