On 19.08.2013 16:10, Jake Smith wrote:
-----Original Message-----
From: Elmar Marschke [mailto:elmar.marsc...@schenker.at]
Sent: Friday, August 16, 2013 10:31 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb
On 16.08.2013 15:46, Jake Smith wrote:
-----Original Message-----
From: Elmar Marschke [mailto:elmar.marsc...@schenker.at]
Sent: Friday, August 16, 2013 9:05 AM
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] Dual primary drbd + ocfs2: problems starting o2cb
Hi all,
I'm working on a two-node pacemaker cluster with dual-primary drbd and ocfs2.
Dual-primary drbd and ocfs2 WITHOUT pacemaker work fine (mounting, reading, writing, everything...).
When I try to make this work under pacemaker, there seems to be a problem starting the o2cb resource.

My (already simplified) configuration is:
-----------------------------------------
node poc1 \
    attributes standby="off"
node poc2 \
    attributes standby="off"
primitive res_dlm ocf:pacemaker:controld \
    op monitor interval="120"
primitive res_drbd ocf:linbit:drbd \
    params drbd_resource="r0" \
    op stop interval="0" timeout="100" \
    op start interval="0" timeout="240" \
    op promote interval="0" timeout="90" \
    op demote interval="0" timeout="90" \
    op notify interval="0" timeout="90" \
    op monitor interval="40" role="Slave" timeout="20" \
    op monitor interval="20" role="Master" timeout="20"
primitive res_o2cb ocf:pacemaker:o2cb \
    op monitor interval="60"
ms ms_drbd res_drbd \
    meta notify="true" master-max="2" master-node-max="1" target-role="Started"
property $id="cib-bootstrap-options" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    last-lrm-refresh="1376574860"
Looks like you are missing clone, ordering, and colocation statements (or a group, which makes the config shorter since a group gives you order and colocation in one statement). The resources *must* start in a particular order, they must run on the same node, and there must be an instance of each resource on each node.
More here for DRBD 8.4:
http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html
Or DRBD 8.3:
http://www.drbd.org/users-guide-8.3/s-ocfs2-pacemaker.html
Basically add:

group grp_dlm_o2cb res_dlm res_o2cb
clone cl_dlm_o2cb grp_dlm_o2cb meta interleave="true"
order ord_drbd_then_dlm_o2cb inf: ms_drbd:promote cl_dlm_o2cb:start
colocation col_dlm_o2cb_with_drbdmaster inf: cl_dlm_o2cb ms_drbd:Master

(Note that the order and colocation constraints reference the master/slave resource ms_drbd, not the underlying res_drbd primitive.)
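With both nodes online and DRBD promoted on both, crm_mon should then show the clone running everywhere, along these lines (illustrative output, not taken from your cluster):

 Master/Slave Set: ms_drbd [res_drbd]
     Masters: [ poc1 poc2 ]
 Clone Set: cl_dlm_o2cb [grp_dlm_o2cb]
     Started: [ poc1 poc2 ]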
HTH
Jake
Hello Jake,

thanks for your reply. I already had res_dlm and res_o2cb grouped together and cloned as in your advice; indeed, that was my initial configuration. But the problem showed up anyway, so I tried to simplify the configuration to reduce the possible sources of error.
But now it seems I have found a solution, or at least a workaround: I simply use the LSB resource agent lsb:o2cb. This one works! The resource starts without a problem on both nodes and, as far as I can see right now, everything is fine (tried with and without the additional group and clone resources).

I don't know whether this will bring drawbacks in the future, but for the moment my problem seems to be solved.
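For reference, the working primitive is now roughly this (a sketch with the same resource name and monitor interval as before, not copied verbatim from my config):

primitive res_o2cb lsb:o2cb \
    op monitor interval="60"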
Not sure either - usually OCF resource agents are more robust than plain LSB scripts. I would also verify that the o2cb LSB init script is fully LSB compliant, or your cluster will have issues.
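A quick way to check by hand, using the exit codes the LSB spec (and Pacemaker) expects - adjust the script path if yours differs:

/etc/init.d/o2cb start;  echo "start:  $?"   # expect 0
/etc/init.d/o2cb status; echo "status: $?"   # expect 0 while running
/etc/init.d/o2cb stop;   echo "stop:   $?"   # expect 0
/etc/init.d/o2cb status; echo "status: $?"   # expect 3 once stopped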
Currently it seems to me that there's a subtle problem with the
ocf:pacemaker:o2cb resource agent; at least on my system.
Maybe, maybe not - if you take a look at the o2cb resource agent, the error message you were getting is logged after it has tried to start /usr/sbin/ocfs2_controld.pcmk for 10 seconds without success... I would time how long o2cb takes to start. It might be as simple as allowing more time for the daemon to come up.
I haven't set up ocfs2 in a while, but I believe you can extend that timeout in the parameters of the primitive without having to touch the actual resource agent.
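Untested, but something along these lines should do it; daemon_timeout is an instance parameter of the agent (its default is 10 seconds, the value 30 here is just an example):

primitive res_o2cb ocf:pacemaker:o2cb \
    params daemon_timeout="30" \
    op monitor interval="60"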
Jake
Hello Jake,

yes, I think that's possible, as you wrote. This is what "ra meta ocf:pacemaker:o2cb" says:

daemon_timeout (string, [10]): Daemon Timeout
    Number of seconds to allow the control daemon to come up

Thanks for the hint. I'll check that out when possible and see if it changes the behaviour. Currently I'm fine with lsb:o2cb...
regards
Anyway, thanks a lot for your answer..!
Best regards
elmar
First error message in corosync.log, as far as I can identify it:
----------------------------------------------------------------
lrmd: [5547]: info: RA output: (res_dlm:probe:stderr) dlm_controld.pcmk: no process found
[ other stuff ]
lrmd: [5547]: info: RA output: (res_dlm:start:stderr) dlm_controld.pcmk: no process found
[ other stuff ]
lrmd: [5547]: info: RA output: (res_o2cb:start:stderr) 2013/08/16_13:25:20 ERROR: ocfs2_controld.pcmk did not come up
(You can find the whole corosync logfile, from starting corosync on node 1 until after the resources start, at: http://www.marschke.info/corosync_drei.log)
syslog shows:
-------------
ocfs2_controld.pcmk[5774]: Unable to connect to CKPT: Object does not
exist
Output of crm_mon:
------------------
============
Stack: openais
Current DC: poc1 - partition WITHOUT quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ poc1 ]
OFFLINE: [ poc2 ]
 Master/Slave Set: ms_drbd [res_drbd]
     Masters: [ poc1 ]
     Stopped: [ res_drbd:1 ]
 res_dlm (ocf::pacemaker:controld): Started poc1
Migration summary:
* Node poc1:
res_o2cb: migration-threshold=1000000 fail-count=1000000
Failed actions:
    res_o2cb_start_0 (node=poc1, call=6, rc=1, status=complete): unknown error
---------------------------------------------------------------------
This is the situation after a reboot of node poc1. To keep things simple I left pacemaker / corosync stopped on the second node, and I had already removed the group and clone resources that dlm and o2cb used to be in (the errors showed up with those as well).
Is my configuration of the resource agents correct? I checked using "ra meta ...", and as far as I can tell everything is OK.
Is some piece of software missing? dlm-pcmk is installed, ocfs2_controld.pcmk and dlm_controld.pcmk are available, and I even created additional links in /usr/sbin:
root@poc1:~# which ocfs2_controld.pcmk
/usr/sbin/ocfs2_controld.pcmk
root@poc1:~# which dlm_controld.pcmk
/usr/sbin/dlm_controld.pcmk
root@poc1:~#
I already googled but couldn't find anything useful. Thanks for any hints... :)
kind regards
elmar
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org