Hi Stefan,

On Tue, Oct 01, 2013 at 09:26:14AM +0200, Stefan Botter wrote:
> Hi James,
>
> On Mon, 30 Sep 2013 12:31:52 -0700
> James Oakley <jf...@funktronics.ca> wrote:
>
> > I am having some trouble with DRBD Master/Slave resources in a 3-node
> > cluster.
> >
> > I am using the Pacemaker packages from ha-clustering:Stable on
> > openSUSE 12.3. I was going to try the packages from Unstable to see
> > if they work better, but it seems the openais package is missing
> > there.
>
> I have a quite similar setup, currently running on stock 12.2. I have a test
> system just updated to 12.3, with the ha-clustering:Stable, and it fails
> with STONITH enabled almost instantly, due to certain segfaults in the
> stonith resources.

What exactly segfaults? Is it related to a particular stonith agent?
Did you open a bugzilla for that?

> With 12.2 it works flawlessly, and with 12.3 and the Stable repo, but
> without STONITH, also.

That's a critical issue which needs to be fixed.

Thanks,

Dejan

> > So I have 3 nodes, called arthur, jonas, and rusty. The jonas and
> > rusty nodes have 4 DRBD master/slave resources, which are used to
> > back a series of filesystems, while the arthur node is included
> > mainly to avoid split-brain, but I intend to run some resources on it
> > as well, and possibly add some more nodes.
> :
> > Is there anything obvious I am missing?
>
> I don't know, but my configuration is - as said - almost similar, but a
> _lot_ shorter, due to usage of groups and thus far fewer constraints and
> location definitions. My nodes are virtual machines in VMware, thus the
> vcenter stonith resources. The nodes hermes1 and hermes2 have the
> drbd resources, hermes1 being the preferred node, and hermes3 is
> there for quorum (and logs):
>
> =====
> node hermes1
> node hermes2
> node hermes3
> primitive apache2 lsb:apache2 \
>     meta failure-timeout="90" \
>     operations $id="apache2-operations" \
>     op monitor interval="15" timeout="15"
> primitive drbdr0 ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op start interval="0" timeout="240" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="30"
> primitive drbdr1 ocf:linbit:drbd \
>     params drbd_resource="r1" \
>     op start interval="0" timeout="240" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="30" \
>     meta target-role="Started"
> primitive firewall_rules lsb:firewall_rules \
>     meta failure-timeout="90" \
>     operations $id="firewall_rules-operations" \
>     op monitor interval="60" timeout="60"
> primitive fs_0 ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/r0" directory="/conf" fstype="ext4" options="defaults" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60" \
>     op monitor interval="60" timeout="40" depth="0" \
>     meta target-role="Started"
> primitive fs_1 ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/r1" directory="/var/spool/postfix" fstype="ext4" options="defaults" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60" \
>     op monitor interval="60" timeout="40" depth="0" \
>     meta target-role="Started"
> primitive getrecipientaccess lsb:getrecipientaccess \
>     meta failure-timeout="90" \
>     operations $id="getrecipientaccess-operations" \
>     op monitor interval="15" timeout="15"
> primitive mailgraph lsb:mailgraph \
>     meta failure-timeout="90" \
>     operations $id="mailgraph-operations" \
>     op monitor interval="15" timeout="15"
> primitive policyd-weight lsb:policyd-weight \
>     meta failure-timeout="90" \
>     operations $id="policyd-weight-operations" \
>     op monitor interval="15" timeout="15"
> primitive postfix lsb:postfix \
>     meta failure-timeout="90" \
>     operations $id="postfix-operations" \
>     op monitor interval="15" timeout="15"
> primitive postgrey lsb:postgrey \
>     meta failure-timeout="90" \
>     operations $id="postgrey-operations" \
>     op monitor interval="15" timeout="15"
> primitive queuegraph lsb:queuegraph \
>     meta failure-timeout="90" \
>     operations $id="queuegraph-operations" \
>     op monitor interval="15" timeout="15"
> primitive saslauthd lsb:saslauthd \
>     meta failure-timeout="90" \
>     operations $id="saslauthd-operations" \
>     op monitor interval="15" timeout="15"
> primitive spammailgraph lsb:spammailgraph \
>     meta failure-timeout="90" \
>     operations $id="spammailgraph-operations" \
>     op monitor interval="15" timeout="15"
> primitive updateispwhitelist lsb:updateispwhitelist \
>     meta failure-timeout="90" \
>     operations $id="updateispwhitelist-operations" \
>     op monitor interval="15" timeout="15"
> primitive vfencing stonith:external/vcenter \
>     params VI_SERVER="svirtctr.it.ctr.internal" \
>     VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" \
>     HOSTLIST="hermes1=SHERMES1;hermes2=SHERMES2;shermes3=SHERMES3" \
>     RESETPOWERON="0" \
>     op monitor start-delay="15s" interval="3600s"
> primitive vip_1 ocf:heartbeat:IPaddr2 \
>     params ip="10.183.75.23" nic="eth0" iflabel="0" cidr_netmask="26" \
>     op monitor interval="10" timeout="20"
> group apps vip_1 firewall_rules postgrey policyd-weight saslauthd postfix apache2 mailgraph queuegraph spammailgraph getrecipientaccess updateispwhitelist \
>     meta target-role="Started"
> group fs fs_0 fs_1
> group g-drbd drbdr0 drbdr1
> ms ms_drbd g-drbd \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone Fencing vfencing
> location l-Fencing_hermes1 Fencing 0: hermes1
> location l-Fencing_hermes2 Fencing 0: hermes2
> location l-Fencing_hermes3 Fencing 0: hermes3
> location l-apache2-hermes3 apache2 -inf: hermes3
> location l-apps-hermes1 apps 50: hermes1
> location l-apps-hermes2 apps 0: hermes2
> location l-fs-hermes1 fs 50: hermes1
> location l-fs-hermes2 fs 0: hermes2
> location l-mailgraph-hermes3 mailgraph -inf: hermes3
> location l-ms_drbd_hermes1 ms_drbd 50: hermes1
> location l-ms_drbd_hermes2 ms_drbd 0: hermes2
> location l-postfix-hermes3 postfix -inf: hermes3
> location l-queuegraph-hermes3 queuegraph -inf: hermes3
> location l-spammailgraph-hermes3 spammailgraph -inf: hermes3
> colocation cl-apps_on_fs inf: fs:Started apps:Started
> colocation cl-fs_on_drbd_r0 inf: ms_drbd:Master fs:Started
> order o-apps_after_fs inf: fs:start apps:start
> order o-fs_after_drbd inf: ms_drbd:promote fs:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="3" \
>     symmetric-cluster="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1380199456" \
>     stonith-action="poweroff"
> =====
>
> Note that there are only two colocation and two order statements, and I
> believe that I could get rid of some of the location statements, too.
>
> As said, this setup currently runs on openSUSE 12.2.
> I know, 13.1 is near, but I fear the status of the ha-clustering in 13.1 will
> not be that great, so maybe you give it a try with a 12.2 installation first.
>
> Greetings,
>
> Stefan
> --
> Stefan Botter zu Hause
> Bremen
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
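
A side note on why groups shorten a configuration like the one quoted in this thread: in crm shell syntax, a group implies both colocation and start ordering between its members, in the order they are listed. The fragment below is only a rough sketch of that equivalence. The resource names rsc_a and rsc_b and the constraint ids are hypothetical and appear in neither poster's configuration; ocf:pacemaker:Dummy is used purely as a placeholder agent.

```
# Hypothetical resources, for illustration only.
primitive rsc_a ocf:pacemaker:Dummy
primitive rsc_b ocf:pacemaker:Dummy

# Variant 1: explicit constraints.
# rsc_b must run where rsc_a runs, and rsc_a starts first.
colocation c-b_with_a inf: rsc_b rsc_a
order o-a_then_b inf: rsc_a rsc_b

# Variant 2: a group, used INSTEAD of the two constraints above.
# Members are colocated and started left to right.
# group g-ab rsc_a rsc_b
```

Use one variant or the other, not both. With more members the saving grows, since each additional group member would otherwise need its own colocation and order statement.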