On 05.02.2013 16:32, Jake Smith wrote:
----- Original Message -----
From: "Jürgen Herrmann" <juergen.herrm...@xlhost.de>
To: pacemaker@oss.clusterlabs.org
Sent: Tuesday, February 5, 2013 7:04:26 AM
Subject: [Pacemaker] Dual primary drbd resource not promoted on one host
Hi there!
I have the following problem:
I have a 2-node cluster with a dual-primary DRBD resource. On top
of it sits an OCFS2 file system. Nodes: app1a, app1b
Today I had the following scenario (it has occurred several times now):
- crm node standby app1a
- poweroff app1a for hdd replacement (hw raid controller)
- poweron app1a
- crm node online app1a
All the other resources come back up as expected, except the
master/slave set for the dual-primary DRBD.
here's the relevant portion of my cluster config:
node app1a.xlhost.de \
attributes standby="off"
node app1b.xlhost.de \
attributes standby="off"
primitive resDLM ocf:pacemaker:controld \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s" \
op monitor interval="120s"
primitive resDRBD0 ocf:linbit:drbd \
op monitor interval="23" role="Slave" timeout="30" \
op monitor interval="13" role="Master" timeout="20" \
op start interval="0" timeout="240s" \
op promote interval="0" timeout="240s" \
op demote interval="0" timeout="100s" \
op stop interval="0" timeout="100s" \
params drbd_resource="drbd0"
primitive resFSDRBD0 ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/mnt/drbd0" fstype="ocfs2" options="noatime,intr,nodiratime,heartbeat=none" \
op monitor interval="120s" timeout="50s" \
op start interval="0" timeout="70s" \
op stop interval="0" timeout="70s"
primitive resO2CB ocf:pacemaker:o2cb \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s" \
op monitor interval="120s"
ms msDRBD0 resDRBD0 \
meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
clone cloneDLM resDLM \
meta globally-unique="false" interleave="true" target-role="Started"
clone cloneFSDRBD0 resFSDRBD0 \
meta interleave="true" globally-unique="false" target-role="Started"
clone cloneO2CB resO2CB \
meta globally-unique="false" interleave="true" target-role="Started"
colocation colFSDRBD0_DRBD0 inf: cloneFSDRBD0 msDRBD0:Master
^^^ This colocation should be cloneDLM on msDRBD0.
colocation colFSDRBD0_O2CB inf: cloneFSDRBD0 cloneO2CB
colocation colO2CB_DLM inf: cloneO2CB cloneDLM
order ordDLM_FSDRBD0 inf: cloneDLM cloneFSDRBD0
^^^ This order statement is not needed.
order ordDLM_O2CB inf: cloneDLM cloneO2CB
order ordDRBD0_FSDRBD0 inf: msDRBD0:promote cloneFSDRBD0
^^^ This order should be msDRBD0:promote then cloneDLM:start.
If you explicitly define the action for only one resource in an
order statement, the same action is implied for the other
resources, so this statement is going to try to promote
cloneFSDRBD0.
You should define both actions explicitly, like this:
order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
order ordO2CB_FSDRBD0 inf: cloneO2CB cloneFSDRBD0
If I take down both nodes and fire them up again, everything goes
back to normal and msDRBD0 is promoted to master on both nodes.
I suspect this has something to do with ordering or colocation
constraints, but I'm not sure. I've stared at this problem dozens
of times now, and a vast amount of googling did not turn up my
specific problem either.
I'm pretty sure you are correct. I haven't used/tested OCFS2 on
Pacemaker in a while, but I believe this is the correct
ordering/colocation you're looking for (same as my notes above):
Order: DRBD:promote, then DLM:start, then O2CB:start, then FS:start
Colocation: FS on O2CB on DLM on DRBD:Master
Hi Jake!
Thanks very much for your comments!
To sum up, I rewrote all six order/colocation statements as follows:
colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
colocation colO2CB_DLM inf: cloneO2CB cloneDLM
colocation colFSDRBD0_O2CB inf: cloneFSDRBD0 cloneO2CB
order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
order ordDLM_O2CB inf: cloneDLM cloneO2CB
order ordO2CB_FSDRBD0 inf: cloneO2CB cloneFSDRBD0
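For what it's worth, one way to swap the constraints in and sanity-check the result is with standard crmsh subcommands (a sketch, not tested here; the constraint IDs match the configs above, and only the three flagged statements actually need to be replaced since the other old ones are unchanged):

```shell
# Remove the three constraints Jake flagged (IDs from the original config)
crm configure delete colFSDRBD0_DRBD0 ordDLM_FSDRBD0 ordDRBD0_FSDRBD0

# Add the two statements that replace them
crm configure colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
crm configure order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start

# Check the configuration for errors before relying on it
crm configure verify

# One-shot status view to confirm msDRBD0 gets promoted on both nodes
crm_mon -1
```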
I will try this sometime in the upcoming nights and report back.
Maybe in the meantime you could have a look at the statements again
to double-check? Thanks in advance.
best regards,
Jürgen Herrmann
--
XLhost.de ® - Web hosting from supersmall to eXtra Large
XLhost.de GmbH
Jürgen Herrmann, Managing Director
Boelckestrasse 21, 93051 Regensburg, Germany
Managing Director: Jürgen Herrmann
Registered under: HRB9918
VAT ID: DE245931218
Phone: +49 (0)800 XLHOSTDE [0800 95467833]
Fax: +49 (0)800 95467830
Web: http://www.XLhost.de
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org