Hi,

On Thu, Sep 29, 2011 at 10:47:33AM -0400, Nick Khamis wrote:
> Hello Dejan,
>
> Sorry to hijack, but I am also working on the same type of setup as a
> prototype. What is the best way to get stonith included for VM setups?
> Maybe an SSH stonith?
external/libvirt, though somebody said that it won't do for VMware.
For VMware there is external/vcenter, or external/vmware, though
people keep complaining that the latter doesn't work. I haven't used
it myself. For a libvirt-based prototype, something along the lines
of the sketch below should do.
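A minimal sketch, assuming the two cluster nodes are KVM guests named
node-1 and node-2 on a hypervisor reachable as vmhost (all three names
are placeholders; use hostname:domain_id entries in hostlist if the
cluster node names differ from the libvirt domain names):

  primitive stonith-libvirt stonith:external/libvirt \
          params hostlist="node-1,node-2" \
                 hypervisor_uri="qemu+ssh://vmhost/system" \
          op monitor interval="30m"
  clone fencing-libvirt stonith-libvirt
  property stonith-enabled="true"

With a qemu+ssh:// URI the nodes need passwordless ssh access to the
hypervisor, so distribute ssh keys first.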
Thanks,

Dejan

> Again, this is just for the prototype.
>
> Cheers,
>
> Nick.
>
> On Thu, Sep 29, 2011 at 9:28 AM, Dejan Muhamedagic <deja...@fastmail.fm>
> wrote:
> > Hi Darren,
> >
> > On Thu, Sep 29, 2011 at 02:15:34PM +0100, darren.mans...@opengi.co.uk wrote:
> >> (Originally sent to DRBD-user, reposted here as it may be more
> >> relevant.)
> >>
> >> Hello all.
> >>
> >> I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2
> >> for a dual-primary shared filesystem.
> >>
> >> I've followed the instructions on the DRBD applications site and it
> >> works really well.
> >>
> >> However, if I 'pull the plug' on a node, the other node continues to
> >> run the clones, but the filesystem is locked and inaccessible (the
> >> monitor op works for the filesystem, but fails for the OCFS2
> >> resource).
> >>
> >> If I reboot one node instead, there are no problems and I can
> >> continue to access the OCFS2 FS.
> >>
> >> After I pull the plug:
> >>
> >> Online:  [ test-odp-02 ]
> >> OFFLINE: [ test-odp-01 ]
> >>
> >> Resource Group: Load-Balancing
> >>     Virtual-IP-ODP      (ocf::heartbeat:IPaddr2):    Started test-odp-02
> >>     Virtual-IP-ODPWS    (ocf::heartbeat:IPaddr2):    Started test-odp-02
> >>     ldirectord          (ocf::heartbeat:ldirectord): Started test-odp-02
> >> Master/Slave Set: ms_drbd_ocfs2 [p_drbd_ocfs2]
> >>     Masters: [ test-odp-02 ]
> >>     Stopped: [ p_drbd_ocfs2:1 ]
> >> Clone Set: cl-odp [odp]
> >>     Started: [ test-odp-02 ]
> >>     Stopped: [ odp:1 ]
> >> Clone Set: cl-odpws [odpws]
> >>     Started: [ test-odp-02 ]
> >>     Stopped: [ odpws:1 ]
> >> Clone Set: cl_fs_ocfs2 [p_fs_ocfs2]
> >>     Started: [ test-odp-02 ]
> >>     Stopped: [ p_fs_ocfs2:1 ]
> >> Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
> >>     Started: [ test-odp-02 ]
> >>     Stopped: [ g_ocfs2mgmt:1 ]
> >>
> >> Failed actions:
> >>     p_o2cb:0_monitor_10000 (node=test-odp-02, call=19, rc=-2,
> >>         status=Timed Out): unknown exec error
> >>
> >> test-odp-02:~ # mount
> >> /dev/drbd0 on /opt/odp type ocfs2 (rw,_netdev,noatime,cluster_stack=pcmk)
> >>
> >> test-odp-02:~ # ls /opt/odp
> >> ...just hangs forever...
> >>
> >> If I then power test-odp-01 back on, everything fails back fine and
> >> the ls command suddenly completes.
> >>
> >> It seems to me that OCFS2 is trying to talk to the node that has
> >> disappeared and doesn't time out. Does anyone have any ideas? (CRM
> >> and DRBD configs attached.)
> >
> > With stonith disabled, I doubt that your cluster can behave as it
> > should.
> >
> > Thanks,
> >
> > Dejan
> >
> >> Many thanks.
> >>
> >> Darren Mansell
> >
> > Content-Description: crm.txt
> >> node test-odp-01
> >> node test-odp-02 \
> >>         attributes standby="off"
> >> primitive Virtual-IP-ODP ocf:heartbeat:IPaddr2 \
> >>         params lvs_support="true" ip="2.21.15.100" cidr_netmask="8" \
> >>                broadcast="2.255.255.255" \
> >>         op monitor interval="1m" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive Virtual-IP-ODPWS ocf:heartbeat:IPaddr2 \
> >>         params lvs_support="true" ip="2.21.15.103" cidr_netmask="8" \
> >>                broadcast="2.255.255.255" \
> >>         op monitor interval="1m" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive ldirectord ocf:heartbeat:ldirectord \
> >>         params configfile="/etc/ha.d/ldirectord.cf" \
> >>         op monitor interval="2m" timeout="20s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive odp lsb:odp \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive odpwebservice lsb:odpws \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive p_controld ocf:pacemaker:controld \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive p_drbd_ocfs2 ocf:linbit:drbd \
> >>         params drbd_resource="r0" \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive p_fs_ocfs2 ocf:heartbeat:Filesystem \
> >>         params device="/dev/drbd/by-res/r0" directory="/opt/odp" \
> >>                fstype="ocfs2" options="rw,noatime" \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive p_o2cb ocf:ocfs2:o2cb \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> group Load-Balancing Virtual-IP-ODP Virtual-IP-ODPWS ldirectord
> >> group g_ocfs2mgmt p_controld p_o2cb
> >> ms ms_drbd_ocfs2 p_drbd_ocfs2 \
> >>         meta master-max="2" clone-max="2" notify="true"
> >> clone cl-odp odp
> >> clone cl-odpws odpws
> >> clone cl_fs_ocfs2 p_fs_ocfs2 \
> >>         meta target-role="Started"
> >> clone cl_ocfs2mgmt g_ocfs2mgmt \
> >>         meta interleave="true"
> >> location Prefer-Node1 ldirectord \
> >>         rule $id="prefer-node1-rule" 100: #uname eq test-odp-01
> >> order o_ocfs2 inf: ms_drbd_ocfs2:promote cl_ocfs2mgmt:start cl_fs_ocfs2:start
> >> order tomcatlast1 inf: cl_fs_ocfs2 cl-odp
> >> order tomcatlast2 inf: cl_fs_ocfs2 cl-odpws
> >> property $id="cib-bootstrap-options" \
> >>         dc-version="1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60" \
> >>         cluster-infrastructure="openais" \
> >>         expected-quorum-votes="2" \
> >>         no-quorum-policy="ignore" \
> >>         start-failure-is-fatal="false" \
> >>         stonith-action="reboot" \
> >>         stonith-enabled="false" \
> >>         last-lrm-refresh="1317207361"
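For reference, the quoted configuration has stonith-enabled="false",
which is what Dejan is pointing at: DLM/OCFS2 blocks all I/O until a
failed node has been fenced, and with stonith disabled that never
happens, so the hang is the expected outcome. A sketch of the change,
using external/ipmi purely as a placeholder agent (the right agent and
all of its parameters depend on the actual hardware):

  primitive st-odp-01 stonith:external/ipmi \
          params hostname="test-odp-01" ipaddr="<BMC-address>" \
                 userid="<user>" passwd="<secret>" interface="lan"
  primitive st-odp-02 stonith:external/ipmi \
          params hostname="test-odp-02" ipaddr="<BMC-address>" \
                 userid="<user>" passwd="<secret>" interface="lan"
  location l-st-odp-01 st-odp-01 -inf: test-odp-01
  location l-st-odp-02 st-odp-02 -inf: test-odp-02
  property stonith-enabled="true"

The location constraints keep each node from running its own fencing
device. With that in place, repeating the pull-the-plug test should
end with test-odp-01 fenced and the OCFS2 mount on test-odp-02
staying usable.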
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker