> The 3rd node should (and needs to be) fenced at this point to allow the
> cluster to continue.
> Is this not happening?

The fencing operation appears to complete successfully. Here is the sequence:

[1] All 3 nodes are running properly.
[2] On node 3 I run "echo c > /proc/sysrq-trigger", which "hangs" node 3.
[3] The fence_test03 resource executes a fence operation on node 3 (fires a shutdown/startup on the VM).
[4] dlm shows the kern_stop state while node 3 is being fenced.
[5] Node 3 reboots, and nodes 1 & 2 operate as normal (clvmd and gfs2 work properly; dlm is notified that the fence was successful, with 2 members in each lock group).
[6] While node 3 is booting, cman starts properly, then clvmd starts but hangs.
[7] While node 3 is "hung" at the clvmd stage, nodes 1 & 2 are unable to perform lvm operations because node 3 is attempting to join the clvmd "group".

At that point dlm shows that node 3 is a member and cman sees node 3 as a cluster member; however, pacemaker has not started because clvmd has not started successfully. Because pacemaker is not "up" and I do not have clvmd defined as a resource, no fence is performed if/when clvmd fails.

Other than the above, fencing appears to be working properly. Are there any other fencing tests you would like me to perform to verify that fencing is working as expected?

> Did you specify on-fail=fence for the clvmd agent?

Hmmm, I don't have any clvmd agents defined within pacemaker at the moment, as I am starting clvmd outside of pacemaker control. In my original post I had clvmd and dlm defined as clone resources under pacemaker control. My understanding from the responses to that post was that I should remove those resources from pacemaker control and run clvmd on boot, with dlm managed by the cman startup. Are you saying that I should have dlm/clvmd defined as pacemaker resources and still have clvmd start on bootup?

For example, originally I defined dlm/clvmd under pacemaker control as follows:

    pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
    pcs resource create clvmd lsb:clvmd op monitor interval=30s on-fail=fence clone interleave=true ordered=true

However, right now, the above two resource definitions have been removed from pacemaker.

Thanks for your time (and others too) thus far in assisting me with this issue.

Thanks

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org