Hi,

On Fri, Feb 26, 2010 at 08:45:48PM +0100, Sander van Vugt wrote:
> Hi list,
>
> My daily call for help with my stonith woes. This is a problem that I've
> been struggling with for a while and I just can't get rid of it. The
> basic configuration is SLES 11 with HAE, configured for high availability
> of Xen virtual machines, using clvm as the storage backend and sbd
> stonith to guarantee integrity. For those who are willing to have a
> look, the output of hb_report and supportconfig (with all the required
> documentation) is at http://www.sandervanvugt.nl/novellsupport
>
> The short situation description: I'm using sbd stonith and I've got a
> stonith request roaming around in the cluster (but not being executed).
> It gives messages like:
>
> nd1 stonithd: ... tengine requests a STONITH operation RESET on node nd2.
> nd1 stonithd: info: we can't manage nd2, broadcast request to other nodes.

It looks like sbd didn't find nd2 when it retrieved the list of hosts
from the device, though in that case the stonith resource shouldn't
have started. On monitor, it takes a list of nodes from crm_node -l
and then checks whether they have slots on the device.

> At the same time, all my three nodes are unclean and no STONITH is
> happening at all.

How come all of them are unclean?

> So basically I have two questions now:
> * Is there any way to get rid of the stonith action which
>   shouldn't be there?

Normally, there's a good reason to fence a node. Why do you think
the requests are wrong?

> * Is there any way to get my nodes back to a clean status?

Only by fencing or restarting the cluster. Alternatively, a node
may be deleted.

Thanks,

Dejan

> Thanks in advance,
> Sander van Vugt
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
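
For anyone who wants to repeat the monitor check by hand (node names from crm_node -l compared against the slots on the sbd device), here is a minimal Python sketch. It is not the code of the stonith plugin. It assumes that each line of the sbd "list" output has the layout "<slot number> <node name> <message>", and that the node names have already been extracted from the crm_node -l output. The function name and the sample data are hypothetical and only mirror the situation suspected above, where nd2 has no slot.

```python
def nodes_without_slot(cluster_nodes, sbd_list_output):
    """Return the cluster nodes that have no slot on the sbd device.

    cluster_nodes: node names, e.g. taken from `crm_node -l`.
    sbd_list_output: text printed by the sbd `list` command.
    """
    slot_owners = set()
    for line in sbd_list_output.splitlines():
        fields = line.split()
        # Assumed layout: <slot number> <node name> <message>
        if len(fields) >= 2 and fields[0].isdigit():
            slot_owners.add(fields[1])
    # Keep the original order of the cluster node list.
    return [node for node in cluster_nodes if node not in slot_owners]


# Hypothetical sample data, not taken from the real cluster.
sample_sbd_list = "0\tnd1\tclear\n1\tnd3\tclear\n"
missing = nodes_without_slot(["nd1", "nd2", "nd3"], sample_sbd_list)
print("nodes without a slot:", missing)
```

If the returned list is not empty, those nodes cannot be fenced through this device, which would match the "we can't manage nd2" message.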