Hi,

On Mon, Jul 02, 2012 at 05:49:38PM +0530, sachin garg wrote:
> Hi,
>
> I am using IPMI plugin for configuring STONITH with heartbeat cluster.
> If a resource fails on one node then the other node STONITHs that node. But
> when the failed node comes back after the reboot, the STONITH device itself
> fails on the node which has started again. Logs indicate that IPMI start
> operation returned 1 (i.e. unknown error).

Isn't there more in the logs, i.e. a specific reason?

> I suspect that this may be due
> to some initialization delays at network level. But I am not sure about
> this. What could be the best way to overcome this issue? I consider adding
> a start delay to stonith device but can't say if that is the right
> approach.

Happens only once after boot? Afterwards works fine? Strange.
Well, it's arguably good practice not to start the cluster stack
automatically on boot.

> Moreover, how should one configure start/monitor operation failure for a
> STONITH device? I have currently configured pacemaker to fence the node if
> start/monitor operation fails for STONITH device. Is this the right
> configuration?

No. Nothing special needs to be configured.

> And what should be the monitoring frequency for STONITH device?

Take a look here

http://clusterlabs.org/doc/crm_fencing.html

Thanks,

Dejan

> Regards
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org