A "reboot" should never fail. That is, it should always guarantee that the 
system actually went down completely. It does not need to guarantee that the 
system comes back up automatically. If it gets stuck during boot, you can 
intervene manually and fix it when you are able to, and once it eventually 
comes back up, everything should be golden.

Now, if for some reason you cannot reach your remote reboot device to force the 
reboot, or if that device fails to reboot the node, the node issuing the reboot 
should alert the rest of the cluster that it could not reboot the node, and one 
of the other nodes in the cluster should make the attempt. If everything fails 
and that node stays running, then you could indeed end up with very bad things 
happening. However, this should not be a common occurrence, and if your 
hardware is broken so that things don't work and you don't replace the 
hardware, that's not something an HA system can account for.
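
To make the escalation above concrete, here is a rough sketch in Python. It 
is not Pacemaker's actual code or API. The names fence_with_escalation, 
FenceAttempt, unreachable and works are made up for illustration, and I am 
assuming that an attempt counts as a success only when the target is confirmed 
down, regardless of whether it boots again afterwards.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical type: a fencing attempt returns True only when the target is
# confirmed to have gone down. Whether it comes back up is not checked.
FenceAttempt = Callable[[str], bool]


def fence_with_escalation(
    target: str,
    attempts: Iterable[Tuple[str, FenceAttempt]],
) -> Optional[str]:
    """Let each cluster node try to fence `target` in turn.

    Returns the name of the node whose attempt succeeded, or None if every
    attempt failed (the "very bad things" case).
    """
    for node_name, attempt in attempts:
        try:
            if attempt(target):
                return node_name
        except OSError:
            # Assumption: an unreachable reboot device shows up as an OSError
            # (ConnectionError is a subclass). Treat it as a failed attempt.
            pass
        # This node could not fence the target; hand over to the next node.
    return None


def unreachable(target: str) -> bool:
    # Stand-in for a node that cannot reach its remote reboot device.
    raise ConnectionError("reboot device unreachable")


def works(target: str) -> bool:
    # Stand-in for a node whose reboot device confirms the target went down.
    return True


winner = fence_with_escalation(
    "node3", [("node1", unreachable), ("node2", works)]
)
nobody = fence_with_escalation("node3", [("node1", unreachable)])
```

Here node1 cannot reach its device, so node2 makes the attempt and winner 
ends up as "node2". When no node succeeds the function returns None, which is 
the point where a human has to step in.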


-----Original Message-----
From: Dan Urist [mailto:dur...@ucar.edu]
Sent: Friday, June 19, 2009 4:15 PM
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] stonith reboot behavior

My apologies if this is documented somewhere-- I've looked and haven't
found it.

What happens if a stonith reboot fails? Does it retry, and if so how
many times and with what timeout, and is that configurable?

I have some hardware that has a buggy raid card that occasionally can't
find its boot disk, but works fine after a reset.
--
Dan Urist
dur...@ucar.edu
303-497-2459 (office)
303-961-2675 (cell)

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
