A "reboot" should never fail. That is, it must always guarantee that the system actually went down completely; it does not need to guarantee that the system comes back up automatically. If the node gets stuck during boot, you can intervene manually whenever that is possible, and when it eventually comes back up, everything should be golden.
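To make that contract concrete, and the hand-off to other nodes described in the next paragraph, here is a rough Python sketch. Everything in it is hypothetical: `FenceAttempt`, `fence_with_escalation`, and the `try_fence` callback are made-up names for illustration, not Pacemaker or STONITH APIs, and this is not a description of how Pacemaker actually implements retries. The only assumption it encodes is that an attempt counts as successful when the target is confirmed down, regardless of whether it boots again.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class FenceAttempt:
    """Record of one node trying to fence the target (hypothetical type)."""
    by_node: str
    confirmed_down: bool


def fence_with_escalation(
    target: str,
    nodes: Iterable[str],
    try_fence: Callable[[str, str], bool],
) -> Tuple[bool, List[FenceAttempt]]:
    """Let each node other than `target` try, in order, to fence `target`.

    `try_fence(node, target)` stands in for "node asks its remote reboot
    device to reset target". It should return True only when the target is
    confirmed to have gone down. Whether the target boots again afterwards
    is deliberately not checked here.
    """
    attempts: List[FenceAttempt] = []
    for node in nodes:
        if node == target:
            continue  # a node does not fence itself in this sketch
        try:
            ok = try_fence(node, target)
        except Exception:
            ok = False  # an unreachable reboot device is a failed attempt
        attempts.append(FenceAttempt(by_node=node, confirmed_down=ok))
        if ok:
            return True, attempts  # target is down; stop escalating
    # Every node failed: the target may still be running.
    return False, attempts


def demo_try_fence(node: str, target: str) -> bool:
    """Fake device: node1 cannot reach it, node2's reset fails, node3 works."""
    if node == "node1":
        raise ConnectionError("cannot reach remote reboot device")
    return node == "node3"


fenced, attempts = fence_with_escalation(
    "node4", ["node1", "node2", "node3", "node4"], demo_try_fence
)
```

In the demo, `node1` and `node2` fail and `node3` succeeds, so `fenced` ends up true after three attempts. If `fenced` were false, that is the "very bad things" case: the sketch has nothing further to try and an operator has to step in.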
Now, if for some reason you cannot reach your remote reboot device to force the reboot, or if that device fails to reboot the node, the node issuing the reboot should alert the rest of the cluster that it could not reboot the target, and one of the other nodes in the cluster should make the attempt. If everything fails and the target node stays running, then you could indeed end up with very bad things happening. However, this should not be a common occurrence, and if your hardware is broken so that things don't work and you don't replace it, that's not something an HA system can account for.

-----Original Message-----
From: Dan Urist [mailto:dur...@ucar.edu]
Sent: Friday, June 19, 2009 4:15 PM
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] stonith reboot behavior

My apologies if this is documented somewhere-- I've looked and haven't found it.

What happens if a stonith reboot fails? Does it retry, and if so how many times and with what timeout, and is that configurable? I have some hardware that has a buggy raid card that occasionally can't find its boot disk, but works fine after a reset.

--
Dan Urist
dur...@ucar.edu
303-497-2459 (office)
303-961-2675 (cell)

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker