Darren,

Please keep us updated on your progress. I am still setting up services and primitives; this should all be done by the end of this week.
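For the fencing question in the thread below: would something like this be a reasonable stop-gap for the prototype? It is untested, the resource names are just placeholders, and it assumes the external/ssh plugin from cluster-glue is still shipped on SLES 11 SP1. It is also test-only, since SSH fencing obviously cannot reach a node that is hung or has had its power pulled:

primitive st-ssh stonith:external/ssh \
        params hostlist="test-odp-01 test-odp-02" \
        op monitor interval="60s" timeout="60s"
clone cl-st-ssh st-ssh
property stonith-enabled="true"

If external/ssh really has been dropped from the packages, stonith:null would at least let the rest of the stack be exercised in the lab, but neither is an answer for production, and neither can prove the pull-the-plug case, for the reason above.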
Cheers,
Nick.

On Thu, Sep 29, 2011 at 11:06 AM, <darren.mans...@opengi.co.uk> wrote:
> Sorry for top-posting, I'm Outlook-afflicted.
>
> This is also my problem. In the full production environment there will be
> low-level hardware fencing by means of IBM RSA/ASM, but this is a VMware test
> environment. The VMware STONITH plugin is dated and doesn't seem to work
> correctly (I gave up quickly after the author of the plugin said on this
> list that it probably won't work), and SSH STONITH seems to have been removed,
> not that it would do much good in this circumstance.
>
> So there seems to be no way to set up STONITH in a VMware test environment,
> which is where I believe a lot of people architect solutions these days, and
> therefore no way to prove a solution works.
>
> I'll attempt to modify and improve the VMware STONITH agent, but I'm not sure
> how STONITH could help in this situation, where one node has gone away and the
> single remaining node is then the one failing. Is this where the suicide agent
> comes in?
>
> Regards,
> Darren
>
> -----Original Message-----
> From: Nick Khamis [mailto:sym...@gmail.com]
> Sent: 29 September 2011 15:48
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1
>
> Hello Dejan,
>
> Sorry to hijack; I am also working on the same type of setup as a prototype.
> What is the best way to get stonith included for VM setups? Maybe an SSH
> stonith? Again, this is just for the prototype.
>
> Cheers,
>
> Nick.
>
> On Thu, Sep 29, 2011 at 9:28 AM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
>> Hi Darren,
>>
>> On Thu, Sep 29, 2011 at 02:15:34PM +0100, darren.mans...@opengi.co.uk wrote:
>>> (Originally sent to DRBD-user; reposted here as it may be more relevant)
>>>
>>> Hello all.
>>>
>>> I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2
>>> for a dual-primary shared FS.
>>>
>>> I've followed the instructions on the DRBD applications site and it
>>> works really well.
>>>
>>> However, if I 'pull the plug' on a node, the other node continues to
>>> operate the clones, but the filesystem is locked and inaccessible
>>> (the monitor op works for the filesystem, but fails for the OCFS2
>>> resource).
>>>
>>> If I reboot one node, there are no problems and I can continue
>>> to access the OCFS2 FS.
>>>
>>> After I pull the plug:
>>>
>>> Online: [ test-odp-02 ]
>>> OFFLINE: [ test-odp-01 ]
>>>
>>>  Resource Group: Load-Balancing
>>>      Virtual-IP-ODP     (ocf::heartbeat:IPaddr2):       Started test-odp-02
>>>      Virtual-IP-ODPWS   (ocf::heartbeat:IPaddr2):       Started test-odp-02
>>>      ldirectord         (ocf::heartbeat:ldirectord):    Started test-odp-02
>>>  Master/Slave Set: ms_drbd_ocfs2 [p_drbd_ocfs2]
>>>      Masters: [ test-odp-02 ]
>>>      Stopped: [ p_drbd_ocfs2:1 ]
>>>  Clone Set: cl-odp [odp]
>>>      Started: [ test-odp-02 ]
>>>      Stopped: [ odp:1 ]
>>>  Clone Set: cl-odpws [odpws]
>>>      Started: [ test-odp-02 ]
>>>      Stopped: [ odpws:1 ]
>>>  Clone Set: cl_fs_ocfs2 [p_fs_ocfs2]
>>>      Started: [ test-odp-02 ]
>>>      Stopped: [ p_fs_ocfs2:1 ]
>>>  Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
>>>      Started: [ test-odp-02 ]
>>>      Stopped: [ g_ocfs2mgmt:1 ]
>>>
>>> Failed actions:
>>>     p_o2cb:0_monitor_10000 (node=test-odp-02, call=19, rc=-2,
>>>         status=Timed Out): unknown exec error
>>>
>>> test-odp-02:~ # mount
>>> /dev/drbd0 on /opt/odp type ocfs2 (rw,_netdev,noatime,cluster_stack=pcmk)
>>>
>>> test-odp-02:~ # ls /opt/odp
>>> ...just hangs forever...
>>>
>>> If I then power test-odp-01 back on, everything fails back fine and
>>> the ls command suddenly completes.
>>>
>>> It seems to me that OCFS2 is trying to talk to the node that has
>>> disappeared and doesn't time out. Does anyone have any ideas?
>>> (attached CRM and DRBD configs)
>>
>> With stonith disabled, I doubt that your cluster can behave as it
>> should.
>>
>> Thanks,
>>
>> Dejan
>>
>>> Many thanks.
>>>
>>> Darren Mansell
>>
>> Content-Description: crm.txt
>>> node test-odp-01
>>> node test-odp-02 \
>>>     attributes standby="off"
>>> primitive Virtual-IP-ODP ocf:heartbeat:IPaddr2 \
>>>     params lvs_support="true" ip="2.21.15.100" cidr_netmask="8" broadcast="2.255.255.255" \
>>>     op monitor interval="1m" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive Virtual-IP-ODPWS ocf:heartbeat:IPaddr2 \
>>>     params lvs_support="true" ip="2.21.15.103" cidr_netmask="8" broadcast="2.255.255.255" \
>>>     op monitor interval="1m" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive ldirectord ocf:heartbeat:ldirectord \
>>>     params configfile="/etc/ha.d/ldirectord.cf" \
>>>     op monitor interval="2m" timeout="20s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive odp lsb:odp \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive odpwebservice lsb:odpws \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive p_controld ocf:pacemaker:controld \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive p_drbd_ocfs2 ocf:linbit:drbd \
>>>     params drbd_resource="r0" \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive p_fs_ocfs2 ocf:heartbeat:Filesystem \
>>>     params device="/dev/drbd/by-res/r0" directory="/opt/odp" fstype="ocfs2" options="rw,noatime" \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive p_o2cb ocf:ocfs2:o2cb \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> group Load-Balancing Virtual-IP-ODP Virtual-IP-ODPWS ldirectord
>>> group g_ocfs2mgmt p_controld p_o2cb
>>> ms ms_drbd_ocfs2 p_drbd_ocfs2 \
>>>     meta master-max="2" clone-max="2" notify="true"
>>> clone cl-odp odp
>>> clone cl-odpws odpws
>>> clone cl_fs_ocfs2 p_fs_ocfs2 \
>>>     meta target-role="Started"
>>> clone cl_ocfs2mgmt g_ocfs2mgmt \
>>>     meta interleave="true"
>>> location Prefer-Node1 ldirectord \
>>>     rule $id="prefer-node1-rule" 100: #uname eq test-odp-01
>>> order o_ocfs2 inf: ms_drbd_ocfs2:promote cl_ocfs2mgmt:start cl_fs_ocfs2:start
>>> order tomcatlast1 inf: cl_fs_ocfs2 cl-odp
>>> order tomcatlast2 inf: cl_fs_ocfs2 cl-odpws
>>> property $id="cib-bootstrap-options" \
>>>     dc-version="1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60" \
>>>     cluster-infrastructure="openais" \
>>>     expected-quorum-votes="2" \
>>>     no-quorum-policy="ignore" \
>>>     start-failure-is-fatal="false" \
>>>     stonith-action="reboot" \
>>>     stonith-enabled="false" \
>>>     last-lrm-refresh="1317207361"
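P.S. Following Dejan's point about stonith being disabled: once a working fencing device exists (even the test-only external/ssh sketch above), the crm.txt above would, as far as I understand it, at least need

crm configure property stonith-enabled="true"

while no-quorum-policy="ignore" presumably stays as it is so the surviving node keeps quorum in a two-node cluster. Without fencing, the DLM/o2cb layer has no way to confirm the missing node is really dead, which would be consistent with the ls hanging until test-odp-01 comes back.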