Darren,

Please keep us updated on your progress. I am still setting up services and primitives; this should all be done by the end of this week.
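For the fencing question in the thread below: would something like this be a reasonable stop-gap for the prototype? It is untested, the resource names are just placeholders, and it assumes the external/ssh plugin from cluster-glue is still shipped on SLES 11 SP1. It is also test-only, since SSH fencing obviously cannot reach a node that is hung or has had its power pulled:

primitive st-ssh stonith:external/ssh \
        params hostlist="test-odp-01 test-odp-02" \
        op monitor interval="60s" timeout="60s"
clone cl-st-ssh st-ssh
property stonith-enabled="true"

If external/ssh really has been dropped from the packages, stonith:null would at least let the rest of the stack be exercised in the lab, but neither is an answer for production, and neither can prove the pull-the-plug case, for the reason above.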
Cheers,
Nick.

On Thu, Sep 29, 2011 at 11:06 AM, <darren.mans...@opengi.co.uk> wrote:
> Sorry for top-posting, I'm Outlook-afflicted.
>
> This is also my problem. In the full production environment there will be
> low-level hardware fencing by means of IBM RSA/ASM, but this is a VMware test
> environment. The VMware STONITH plugin is dated and doesn't seem to work
> correctly (I gave up quickly after the author of the plugin said on this
> list that it probably won't work), and SSH STONITH seems to have been removed,
> not that it would do much good in this circumstance.
>
> So there seems to be no way to set up STONITH in a VMware test environment,
> which is where I believe a lot of people architect solutions these days, and
> therefore no way to prove a solution works.
>
> I'll attempt to modify and improve the VMware STONITH agent, but I'm not sure
> how STONITH could help in this situation, where one node has gone away and the
> single remaining node is then the one failing. Is this where the suicide agent
> comes in?
>
> Regards,
> Darren
>
> -----Original Message-----
> From: Nick Khamis [mailto:sym...@gmail.com]
> Sent: 29 September 2011 15:48
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1
>
> Hello Dejan,
>
> Sorry to hijack; I am also working on the same type of setup as a prototype.
> What is the best way to get stonith included for VM setups? Maybe an SSH
> stonith? Again, this is just for the prototype.
>
> Cheers,
>
> Nick.
>
> On Thu, Sep 29, 2011 at 9:28 AM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
>> Hi Darren,
>>
>> On Thu, Sep 29, 2011 at 02:15:34PM +0100, darren.mans...@opengi.co.uk wrote:
>>> (Originally sent to DRBD-user; reposted here as it may be more relevant)
>>>
>>> Hello all.
>>>
>>> I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2
>>> for a dual-primary shared FS.
>>>
>>> I've followed the instructions on the DRBD applications site and it
>>> works really well.
>>>
>>> However, if I 'pull the plug' on a node, the other node continues to
>>> operate the clones, but the filesystem is locked and inaccessible
>>> (the monitor op works for the filesystem, but fails for the OCFS2
>>> resource).
>>>
>>> If I reboot one node, there are no problems and I can continue
>>> to access the OCFS2 FS.
>>>
>>> After I pull the plug:
>>>
>>> Online: [ test-odp-02 ]
>>> OFFLINE: [ test-odp-01 ]
>>>
>>>  Resource Group: Load-Balancing
>>>      Virtual-IP-ODP     (ocf::heartbeat:IPaddr2):       Started test-odp-02
>>>      Virtual-IP-ODPWS   (ocf::heartbeat:IPaddr2):       Started test-odp-02
>>>      ldirectord         (ocf::heartbeat:ldirectord):    Started test-odp-02
>>>  Master/Slave Set: ms_drbd_ocfs2 [p_drbd_ocfs2]
>>>      Masters: [ test-odp-02 ]
>>>      Stopped: [ p_drbd_ocfs2:1 ]
>>>  Clone Set: cl-odp [odp]
>>>      Started: [ test-odp-02 ]
>>>      Stopped: [ odp:1 ]
>>>  Clone Set: cl-odpws [odpws]
>>>      Started: [ test-odp-02 ]
>>>      Stopped: [ odpws:1 ]
>>>  Clone Set: cl_fs_ocfs2 [p_fs_ocfs2]
>>>      Started: [ test-odp-02 ]
>>>      Stopped: [ p_fs_ocfs2:1 ]
>>>  Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
>>>      Started: [ test-odp-02 ]
>>>      Stopped: [ g_ocfs2mgmt:1 ]
>>>
>>> Failed actions:
>>>     p_o2cb:0_monitor_10000 (node=test-odp-02, call=19, rc=-2,
>>>         status=Timed Out): unknown exec error
>>>
>>> test-odp-02:~ # mount
>>> /dev/drbd0 on /opt/odp type ocfs2 (rw,_netdev,noatime,cluster_stack=pcmk)
>>>
>>> test-odp-02:~ # ls /opt/odp
>>> ...just hangs forever...
>>>
>>> If I then power test-odp-01 back on, everything fails back fine and
>>> the ls command suddenly completes.
>>>
>>> It seems to me that OCFS2 is trying to talk to the node that has
>>> disappeared and doesn't time out. Does anyone have any ideas?
>>> (attached CRM and DRBD configs)
>>
>> With stonith disabled, I doubt that your cluster can behave as it
>> should.
>>
>> Thanks,
>>
>> Dejan
>>
>>> Many thanks.
>>>
>>> Darren Mansell
>>
>> Content-Description: crm.txt
>>> node test-odp-01
>>> node test-odp-02 \
>>>     attributes standby="off"
>>> primitive Virtual-IP-ODP ocf:heartbeat:IPaddr2 \
>>>     params lvs_support="true" ip="2.21.15.100" cidr_netmask="8" broadcast="2.255.255.255" \
>>>     op monitor interval="1m" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive Virtual-IP-ODPWS ocf:heartbeat:IPaddr2 \
>>>     params lvs_support="true" ip="2.21.15.103" cidr_netmask="8" broadcast="2.255.255.255" \
>>>     op monitor interval="1m" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive ldirectord ocf:heartbeat:ldirectord \
>>>     params configfile="/etc/ha.d/ldirectord.cf" \
>>>     op monitor interval="2m" timeout="20s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive odp lsb:odp \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive odpwebservice lsb:odpws \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive p_controld ocf:pacemaker:controld \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive p_drbd_ocfs2 ocf:linbit:drbd \
>>>     params drbd_resource="r0" \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive p_fs_ocfs2 ocf:heartbeat:Filesystem \
>>>     params device="/dev/drbd/by-res/r0" directory="/opt/odp" fstype="ocfs2" options="rw,noatime" \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> primitive p_o2cb ocf:ocfs2:o2cb \
>>>     op monitor interval="10s" enabled="true" timeout="10s" \
>>>     meta migration-threshold="10" failure-timeout="600"
>>> group Load-Balancing Virtual-IP-ODP Virtual-IP-ODPWS ldirectord
>>> group g_ocfs2mgmt p_controld p_o2cb
>>> ms ms_drbd_ocfs2 p_drbd_ocfs2 \
>>>     meta master-max="2" clone-max="2" notify="true"
>>> clone cl-odp odp
>>> clone cl-odpws odpws
>>> clone cl_fs_ocfs2 p_fs_ocfs2 \
>>>     meta target-role="Started"
>>> clone cl_ocfs2mgmt g_ocfs2mgmt \
>>>     meta interleave="true"
>>> location Prefer-Node1 ldirectord \
>>>     rule $id="prefer-node1-rule" 100: #uname eq test-odp-01
>>> order o_ocfs2 inf: ms_drbd_ocfs2:promote cl_ocfs2mgmt:start cl_fs_ocfs2:start
>>> order tomcatlast1 inf: cl_fs_ocfs2 cl-odp
>>> order tomcatlast2 inf: cl_fs_ocfs2 cl-odpws
>>> property $id="cib-bootstrap-options" \
>>>     dc-version="1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60" \
>>>     cluster-infrastructure="openais" \
>>>     expected-quorum-votes="2" \
>>>     no-quorum-policy="ignore" \
>>>     start-failure-is-fatal="false" \
>>>     stonith-action="reboot" \
>>>     stonith-enabled="false" \
>>>     last-lrm-refresh="1317207361"
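P.S. Following Dejan's point about stonith being disabled: once a working fencing device exists (even the test-only external/ssh sketch above), the crm.txt above would, as far as I understand it, at least need

crm configure property stonith-enabled="true"

while no-quorum-policy="ignore" presumably stays as it is so the surviving node keeps quorum in a two-node cluster. Without fencing, the DLM/o2cb layer has no way to confirm the missing node is really dead, which would be consistent with the ls hanging until test-odp-01 comes back.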