Hi Andreas,

----- Original Message -----
> From: "Andreas Kurz" <andr...@hastexo.com> > To: pacemaker@oss.clusterlabs.org > Sent: Tuesday, April 10, 2012 5:28:15 AM > Subject: Re: [Pacemaker] Nodes will not promote DRBD resources to > master on failover > On 04/10/2012 06:17 AM, Andrew Martin wrote: > > Hi Andreas, > > > > Yes, I attempted to generalize hostnames and usernames/passwords in > > the > > archive. Sorry for making it more confusing :( > > > > I completely purged pacemaker from all 3 nodes and reinstalled > > everything. I then completely rebuild the CIB by manually adding in > > each > > primitive/constraint one at a time and testing along the way. After > > doing this DRBD appears to be working at least somewhat better - > > the > > ocf:linbit:drbd devices are started and managed by pacemaker. > > However, > > if for example a node is STONITHed when it comes back up it will > > not > > restart the ocf:linbit:drbd resources until I manually load the > > DRBD > > kernel module, bring the DRBD devices up (drbdadm up all), and > > cleanup > > the resources (e.g. crm resource cleanup ms_drbd_vmstore). Is it > > possible that the DRBD kernel module needs to be loaded at boot > > time, > > independent of pacemaker? > No, this is done by the drbd OCF script on start. > > > > Here's the new CIB (mostly the same as before): > > http://pastebin.com/MxrqBXMp > > > > Typically quorumnode stays in the OFFLINE (standby) state, though > > occasionally it changes to pending. I have just tried > > cleaning /var/lib/heartbeat/crm on quorumnode again so we will see > > if > > that helps keep it in the OFFLINE (standby) state. I have it > > explicitly > > set to standby in the CIB configuration and also created a rule to > > prevent some of the resources from running on it? > > node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \ > > attributes standby="on" > > ... > The node should be in "ONLINE (standby)" state if you start heartbeat > and pacemaker is enabled with "crm yes" or "crm respawn"in ha.cf I have never seen it listed as ONLINE (standby). Here's the ha.cf on quorumnode: autojoin none mcast eth0 239.0.0.43 694 1 0 warntime 5 deadtime 15 initdead 60 keepalive 2 node node1 node node2 node quorumnode crm respawn And here's the ha.cf on node[12]: autojoin none mcast br0 239.0.0.43 694 1 0 bcast br1 warntime 5 deadtime 15 initdead 60 keepalive 2 node node1 node node2 node quorumnode crm respawn respawn hacluster /usr/lib/heartbeat/dopd apiauth dopd gid=haclient uid=hacluster The only difference between these boxes is that quorumnode is a CentOS 5.5 box so it is stuck at heartbeat 3.0.3, whereas node[12] are both on Ubuntu 10.04 using the Ubuntu HA PPA, so they are running heartbeat 3.0.5. Would this make a difference? > > location loc_not_on_quorumnode g_vm -inf: quorumnode > > > > Would it be wise to create additional constraints to prevent all > > resources (including each ms_drbd resource) from running on it, > > even > > though this should be implied by standby? > There is no need for that. A node in standby will never run resources > and if there is no DRBD and installed on that node your resources > won't > start anyways. I've removed this constraint > > > > Below is a portion of the log from when I started a node yet DRBD > > failed > > to start. As you can see it thinks the DRBD device is operating > > correctly as it proceeds to starting subsequent resources, e.g. > > Apr 9 20:22:55 node1 Filesystem[2939]: [2956]: WARNING: Couldn't > > find > > device [/dev/drbd0]. Expected /dev/??? 
to exist > > http://pastebin.com/zTCHPtWy > The only thing i can read from that log fragments is, that probes are > running ... not enough information. Really interesting would be logs > from the DC. Here is the log from the DC for that same time period: http://pastebin.com/d4PGGLPi > > > > After seeing these messages in the log I run > > # service drbd start > > # drbdadm up all > > # crm resource cleanup ms_drbd_vmstore > > # crm resource cleanup ms_drbd_mount1 > > # crm resource clenaup ms_drbd_mount2 > That should all not be needed ... what is the output of "crm_mon > -1frA" > before you do all that cleanups? I will get this output the next time I can put the cluster in this state. > > After this sequence of commands the DRBD resources appear to be > > functioning normally and the subsequent resources start. Any ideas > > on > > why DRBD is not being started as expected, or why the cluster is > > continuing with starting resources that according to the > > o_drbd-fs-vm > > constraint should not start until DRBD is master? > No idea, maybe creating a crm_report archive and sending it to the > list > can shed some light on that problem. > Regards, > Andreas > -- > Need help with Pacemaker? > http://www.hastexo.com/now Thanks, Andrew > > > > Thanks, > > > > Andrew > > ------------------------------------------------------------------------ > > *From: *"Andreas Kurz" <andr...@hastexo.com> > > *To: *pacemaker@oss.clusterlabs.org > > *Sent: *Monday, April 2, 2012 6:33:44 PM > > *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to > > master on failover > > > > On 04/02/2012 05:47 PM, Andrew Martin wrote: > >> Hi Andreas, > >> > >> Here is the crm_report: > >> http://dl.dropbox.com/u/2177298/pcmk-Mon-02-Apr-2012.bz2 > > > > You tried to do some obfuscation on parts of that archive? ... > > doesn't > > really make it easier to debug .... > > > > Does the third node ever change its state? > > > > Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending > > > > Looking at the logs and the transition graph says it aborts due to > > un-runable operations on that node which seems to be related to > > it's > > pending state. > > > > Try to get that node up (or down) completely ... maybe a fresh > > start-over with a clean /var/lib/heartbeat/crm directory is > > sufficient. 
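[Since the recurring advice in this thread is to let Pacemaker manage DRBD completely, one thing worth confirming on the kernel-module question above is that no distribution init script is racing the cluster at boot. A minimal sketch only, assuming the stock Ubuntu 10.04 / CentOS 5.x packages and the resource names used above:

# Ubuntu 10.04 nodes (node1, node2): keep init from starting DRBD at boot
update-rc.d -f drbd remove
# CentOS 5.x quorum node, only if the drbd package is installed there at all
chkconfig drbd off
# after a reboot, the module should appear only once the ocf:linbit:drbd
# agent starts the resource
lsmod | grep drbd
crm_mon -1frA

If the module still has to be loaded by hand after a fence, the drbd start operation itself is likely failing or timing out, and the lrmd/crmd entries for that start action on the DC are the place to look.]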
> > > > Regards, > > Andreas > > > >> > >> Hi Emmanuel, > >> > >> Here is the configuration: > >> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \ > >> attributes standby="off" > >> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \ > >> attributes standby="off" > >> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \ > >> attributes standby="on" > >> primitive p_drbd_mount2 ocf:linbit:drbd \ > >> params drbd_resource="mount2" \ > >> op start interval="0" timeout="240" \ > >> op stop interval="0" timeout="100" \ > >> op monitor interval="10" role="Master" timeout="20" > >> start-delay="1m" \ > >> op monitor interval="20" role="Slave" timeout="20" > >> start-delay="1m" > >> primitive p_drbd_mount1 ocf:linbit:drbd \ > >> params drbd_resource="mount1" \ > >> op start interval="0" timeout="240" \ > >> op stop interval="0" timeout="100" \ > >> op monitor interval="10" role="Master" timeout="20" > >> start-delay="1m" \ > >> op monitor interval="20" role="Slave" timeout="20" > >> start-delay="1m" > >> primitive p_drbd_vmstore ocf:linbit:drbd \ > >> params drbd_resource="vmstore" \ > >> op start interval="0" timeout="240" \ > >> op stop interval="0" timeout="100" \ > >> op monitor interval="10" role="Master" timeout="20" > >> start-delay="1m" \ > >> op monitor interval="20" role="Slave" timeout="20" > >> start-delay="1m" > >> primitive p_fs_vmstore ocf:heartbeat:Filesystem \ > >> params device="/dev/drbd0" directory="/mnt/storage/vmstore" > > fstype="ext4" \ > >> op start interval="0" timeout="60s" \ > >> op stop interval="0" timeout="60s" \ > >> op monitor interval="20s" timeout="40s" > >> primitive p_libvirt-bin upstart:libvirt-bin \ > >> op monitor interval="30" > >> primitive p_ping ocf:pacemaker:ping \ > >> params name="p_ping" host_list="192.168.3.1 192.168.3.2" > > multiplier="1000" \ > >> op monitor interval="20s" > >> primitive p_sysadmin_notify ocf:heartbeat:MailTo \ > >> params email="m...@example.com" \ > >> params subject="Pacemaker Change" \ > >> op start interval="0" timeout="10" \ > >> op stop interval="0" timeout="10" \ > >> op monitor interval="10" timeout="10" > >> primitive p_vm ocf:heartbeat:VirtualDomain \ > >> params config="/mnt/storage/vmstore/config/vm.xml" \ > >> meta allow-migrate="false" \ > >> op start interval="0" timeout="180" \ > >> op stop interval="0" timeout="180" \ > >> op monitor interval="10" timeout="30" > >> primitive stonith-node1 stonith:external/tripplitepdu \ > >> params pdu_ipaddr="192.168.3.100" pdu_port="1" pdu_username="xxx" > >> pdu_password="xxx" hostname_to_stonith="node1" > >> primitive stonith-node2 stonith:external/tripplitepdu \ > >> params pdu_ipaddr="192.168.3.100" pdu_port="2" pdu_username="xxx" > >> pdu_password="xxx" hostname_to_stonith="node2" > >> group g_daemons p_libvirt-bin > >> group g_vm p_fs_vmstore p_vm > >> ms ms_drbd_mount2 p_drbd_mount2 \ > >> meta master-max="1" master-node-max="1" clone-max="2" > >> clone-node-max="1" > >> notify="true" > >> ms ms_drbd_mount1 p_drbd_mount1 \ > >> meta master-max="1" master-node-max="1" clone-max="2" > >> clone-node-max="1" > >> notify="true" > >> ms ms_drbd_vmstore p_drbd_vmstore \ > >> meta master-max="1" master-node-max="1" clone-max="2" > >> clone-node-max="1" > >> notify="true" > >> clone cl_daemons g_daemons > >> clone cl_ping p_ping \ > >> meta interleave="true" > >> clone cl_sysadmin_notify p_sysadmin_notify \ > >> meta target-role="Started" > >> location l-st-node1 stonith-node1 -inf: node1 > >> location l-st-node2 stonith-node2 -inf: node2 > >> location 
l_run_on_most_connected p_vm \ > >> rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping > >> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master > >> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm > >> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote > >> ms_drbd_mount1:promote > >> ms_drbd_mount2:promote cl_daemons:start g_vm:start > >> property $id="cib-bootstrap-options" \ > >> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ > >> cluster-infrastructure="Heartbeat" \ > >> stonith-enabled="true" \ > >> no-quorum-policy="freeze" \ > >> last-lrm-refresh="1333041002" \ > >> cluster-recheck-interval="5m" \ > >> crmd-integration-timeout="3m" \ > >> shutdown-escalation="5m" > >> > >> Thanks, > >> > >> Andrew > >> > >> > >> ------------------------------------------------------------------------ > >> *From: *"emmanuel segura" <emi2f...@gmail.com> > >> *To: *"The Pacemaker cluster resource manager" > >> <pacemaker@oss.clusterlabs.org> > >> *Sent: *Monday, April 2, 2012 9:43:20 AM > >> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources > >> to > >> master on failover > >> > >> Sorry Andrew > >> > >> Can you post me your crm configure show again? > >> > >> Thanks > >> > >> Il giorno 30 marzo 2012 18:53, Andrew Martin <amar...@xes-inc.com > >> <mailto:amar...@xes-inc.com>> ha scritto: > >> > >> Hi Emmanuel, > >> > >> Thanks, that is a good idea. I updated the colocation contraint as > >> you described. After, the cluster remains in this state (with the > >> filesystem not mounted and the VM not started): > >> Online: [ node2 node1 ] > >> > >> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] > >> Masters: [ node1 ] > >> Slaves: [ node2 ] > >> Master/Slave Set: ms_drbd_tools [p_drbd_mount1] > >> Masters: [ node1 ] > >> Slaves: [ node2 ] > >> Master/Slave Set: ms_drbd_crm [p_drbd_mount2] > >> Masters: [ node1 ] > >> Slaves: [ node2 ] > >> Clone Set: cl_daemons [g_daemons] > >> Started: [ node2 node1 ] > >> Stopped: [ g_daemons:2 ] > >> stonith-node1 (stonith:external/tripplitepdu): Started node2 > >> stonith-node2 (stonith:external/tripplitepdu): Started node1 > >> > >> I noticed that Pacemaker had not issued "drbdadm connect" for any > >> of > >> the DRBD resources on node2 > >> # service drbd status > >> drbd driver loaded OK; device status: > >> version: 8.3.7 (api:88/proto:86-91) > >> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by > >> root@node2, 2012-02-02 12:29:26 > >> m:res cs ro ds p > >> mounted fstype > >> 0:vmstore StandAlone Secondary/Unknown Outdated/DUnknown r---- > >> 1:mount1 StandAlone Secondary/Unknown Outdated/DUnknown r---- > >> 2:mount2 StandAlone Secondary/Unknown Outdated/DUnknown r---- > >> # drbdadm cstate all > >> StandAlone > >> StandAlone > >> StandAlone > >> > >> After manually issuing "drbdadm connect all" on node2 the rest of > >> the resources eventually started (several minutes later) on node1: > >> Online: [ node2 node1 ] > >> > >> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] > >> Masters: [ node1 ] > >> Slaves: [ node2 ] > >> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] > >> Masters: [ node1 ] > >> Slaves: [ node2 ] > >> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] > >> Masters: [ node1 ] > >> Slaves: [ node2 ] > >> Resource Group: g_vm > >> p_fs_vmstore (ocf::heartbeat:Filesystem): Started node1 > >> p_vm (ocf::heartbeat:VirtualDomain): Started node1 > >> Clone Set: cl_daemons [g_daemons] > >> Started: [ node2 node1 ] > >> Stopped: [ g_daemons:2 ] > >> Clone Set: cl_sysadmin_notify 
[p_sysadmin_notify] > >> Started: [ node2 node1 ] > >> Stopped: [ p_sysadmin_notify:2 ] > >> stonith-node1 (stonith:external/tripplitepdu): Started node2 > >> stonith-node2 (stonith:external/tripplitepdu): Started node1 > >> Clone Set: cl_ping [p_ping] > >> Started: [ node2 node1 ] > >> Stopped: [ p_ping:2 ] > >> > >> The DRBD devices on node1 were all UpToDate, so it doesn't seem > >> right that it would need to wait for node2 to be connected before > >> it > >> could continue promoting additional resources. I then restarted > >> heartbeat on node2 to see if it would automatically connect the > >> DRBD > >> devices this time. After restarting it, the DRBD devices are not > >> even configured: > >> # service drbd status > >> drbd driver loaded OK; device status: > >> version: 8.3.7 (api:88/proto:86-91) > >> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by > >> root@webapps2host, 2012-02-02 12:29:26 > >> m:res cs ro ds p mounted fstype > >> 0:vmstore Unconfigured > >> 1:mount1 Unconfigured > >> 2:mount2 Unconfigured > >> > >> Looking at the log I found this part about the drbd primitives: > >> Mar 30 11:10:32 node2 lrmd: [10702]: info: operation monitor[2] on > >> p_drbd_vmstore:1 for client 10705: pid 11065 exited with return > >> code 7 > >> Mar 30 11:10:32 node2 crmd: [10705]: info: process_lrm_event: LRM > >> operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11, > >> confirmed=true) not running > >> Mar 30 11:10:32 node2 lrmd: [10702]: info: operation monitor[4] on > >> p_drbd_mount2:1 for client 10705: pid 11069 exited with return > >> code 7 > >> Mar 30 11:10:32 node2 crmd: [10705]: info: process_lrm_event: LRM > >> operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=12, > >> confirmed=true) not running > >> Mar 30 11:10:32 node2 lrmd: [10702]: info: operation monitor[3] on > >> p_drbd_mount1:1 for client 10705: pid 11066 exited with return > >> code 7 > >> Mar 30 11:10:32 node2 crmd: [10705]: info: process_lrm_event: LRM > >> operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=13, > >> confirmed=true) not running > >> > >> I am not sure what exit code 7 is - is it possible to manually run > >> the monitor code or somehow obtain more debug about this? 
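[On the exit-code question just above: an LRM monitor result of 7 is OCF_NOT_RUNNING, i.e. the initial probe simply found the resource stopped on that node, which is expected before Pacemaker has started it. The agent can also be run by hand for more detail; a rough sketch, assuming the packaged agent lives under /usr/lib/ocf and using the vmstore resource as an example:

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_drbd_resource=vmstore
/usr/lib/ocf/resource.d/linbit/drbd monitor; echo "rc=$?"

or, with the cluster-glue tools:

ocf-tester -n p_drbd_vmstore -o drbd_resource=vmstore /usr/lib/ocf/resource.d/linbit/drbd

Note the agent may complain about missing clone meta-attributes when run outside the cluster.]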
Here is > >> the complete log after restarting heartbeat on node2: > >> http://pastebin.com/KsHKi3GW > >> > >> Thanks, > >> > >> Andrew > >> > >> > > ------------------------------------------------------------------------ > >> *From: *"emmanuel segura" <emi2f...@gmail.com > >> <mailto:emi2f...@gmail.com>> > >> *To: *"The Pacemaker cluster resource manager" > >> <pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org>> > >> *Sent: *Friday, March 30, 2012 10:26:48 AM > >> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources > >> to > >> master on failover > >> > >> I think this constrain it's wrong > >> ================================================== > >> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master > >> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm > >> =================================================== > >> > >> change to > >> ====================================================== > >> colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master > >> ms_drbd_mount1:Master ms_drbd_mount2:Master > >> ======================================================= > >> > >> Il giorno 30 marzo 2012 17:16, Andrew Martin <amar...@xes-inc.com > >> <mailto:amar...@xes-inc.com>> ha scritto: > >> > >> Hi Emmanuel, > >> > >> Here is the output of crm configure show: > >> http://pastebin.com/NA1fZ8dL > >> > >> Thanks, > >> > >> Andrew > >> > >> > > ------------------------------------------------------------------------ > >> *From: *"emmanuel segura" <emi2f...@gmail.com > >> <mailto:emi2f...@gmail.com>> > >> *To: *"The Pacemaker cluster resource manager" > >> <pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org>> > >> *Sent: *Friday, March 30, 2012 9:43:45 AM > >> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources > >> to master on failover > >> > >> can you show me? > >> > >> crm configure show > >> > >> Il giorno 30 marzo 2012 16:10, Andrew Martin > >> <amar...@xes-inc.com <mailto:amar...@xes-inc.com>> ha scritto: > >> > >> Hi Andreas, > >> > >> Here is a copy of my complete CIB: > >> http://pastebin.com/v5wHVFuy > >> > >> I'll work on generating a report using crm_report as well. > >> > >> Thanks, > >> > >> Andrew > >> > >> > > ------------------------------------------------------------------------ > >> *From: *"Andreas Kurz" <andr...@hastexo.com > >> <mailto:andr...@hastexo.com>> > >> *To: *pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org> > >> *Sent: *Friday, March 30, 2012 4:41:16 AM > >> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD > >> resources to master on failover > >> > >> On 03/28/2012 04:56 PM, Andrew Martin wrote: > >> > Hi Andreas, > >> > > >> > I disabled the DRBD init script and then restarted the > >> slave node > >> > (node2). After it came back up, DRBD did not start: > >> > Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): > >> pending > >> > Online: [ node2 node1 ] > >> > > >> > Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] > >> > Masters: [ node1 ] > >> > Stopped: [ p_drbd_vmstore:1 ] > >> > Master/Slave Set: ms_drbd_mount1 [p_drbd_tools] > >> > Masters: [ node1 ] > >> > Stopped: [ p_drbd_mount1:1 ] > >> > Master/Slave Set: ms_drbd_mount2 [p_drbdmount2] > >> > Masters: [ node1 ] > >> > Stopped: [ p_drbd_mount2:1 ] > >> > ... 
> >> > > >> > root@node2:~# service drbd status > >> > drbd not loaded > >> > >> Yes, expected unless Pacemaker starts DRBD > >> > >> > > >> > Is there something else I need to change in the CIB to > >> ensure that DRBD > >> > is started? All of my DRBD devices are configured like this: > >> > primitive p_drbd_mount2 ocf:linbit:drbd \ > >> > params drbd_resource="mount2" \ > >> > op monitor interval="15" role="Master" \ > >> > op monitor interval="30" role="Slave" > >> > ms ms_drbd_mount2 p_drbd_mount2 \ > >> > meta master-max="1" master-node-max="1" > > clone-max="2" > >> > clone-node-max="1" notify="true" > >> > >> That should be enough ... unable to say more without seeing > >> the complete > >> configuration ... too much fragments of information ;-) > >> > >> Please provide (e.g. pastebin) your complete cib (cibadmin > >> -Q) when > >> cluster is in that state ... or even better create a > >> crm_report archive > >> > >> > > >> > Here is the output from the syslog (grep -i drbd > >> /var/log/syslog): > >> > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: > >> Performing > >> > key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc > >> > op=p_drbd_vmstore:1_monitor_0 ) > >> > Mar 28 09:24:47 node2 lrmd: [3210]: info: > >> rsc:p_drbd_vmstore:1 probe[2] > >> > (pid 3455) > >> > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: > >> Performing > >> > key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc > >> > op=p_drbd_mount1:1_monitor_0 ) > >> > Mar 28 09:24:48 node2 lrmd: [3210]: info: > >> rsc:p_drbd_mount1:1 probe[3] > >> > (pid 3456) > >> > Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op: > >> Performing > >> > key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc > >> > op=p_drbd_mount2:1_monitor_0 ) > >> > Mar 28 09:24:48 node2 lrmd: [3210]: info: > >> rsc:p_drbd_mount2:1 probe[4] > >> > (pid 3457) > >> > Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING: > >> Couldn't find > >> > device [/dev/drbd0]. Expected /dev/??? to exist > >> > Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked: > >> > crm_attribute -N node2 -n master-p_drbd_mount2:1 -l > > reboot -D > >> > Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked: > >> > crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l > > reboot -D > >> > Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked: > >> > crm_attribute -N node2 -n master-p_drbd_mount1:1 -l > > reboot -D > >> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation > >> monitor[4] on > >> > p_drbd_mount2:1 for client 3213: pid 3457 exited with > >> return code 7 > >> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation > >> monitor[2] on > >> > p_drbd_vmstore:1 for client 3213: pid 3455 exited with > >> return code 7 > >> > Mar 28 09:24:48 node2 crmd: [3213]: info: > >> process_lrm_event: LRM > >> > operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, > >> cib-update=10, > >> > confirmed=true) not running > >> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation > >> monitor[3] on > >> > p_drbd_mount1:1 for client 3213: pid 3456 exited with > >> return code 7 > >> > Mar 28 09:24:48 node2 crmd: [3213]: info: > >> process_lrm_event: LRM > >> > operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, > >> cib-update=11, > >> > confirmed=true) not running > >> > Mar 28 09:24:48 node2 crmd: [3213]: info: > >> process_lrm_event: LRM > >> > operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, > >> cib-update=12, > >> > confirmed=true) not running > >> > >> No errors, just probing ... 
so for any reason Pacemaker does > >> not like to > >> start it ... use crm_simulate to find out why ... or provide > >> information > >> as requested above. > >> > >> Regards, > >> Andreas > >> > >> -- > >> Need help with Pacemaker? > >> http://www.hastexo.com/now > >> > >> > > >> > Thanks, > >> > > >> > Andrew > >> > > >> > > >> > > ------------------------------------------------------------------------ > >> > *From: *"Andreas Kurz" <andr...@hastexo.com > >> <mailto:andr...@hastexo.com>> > >> > *To: *pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org> > >> > *Sent: *Wednesday, March 28, 2012 9:03:06 AM > >> > *Subject: *Re: [Pacemaker] Nodes will not promote DRBD > >> resources to > >> > master on failover > >> > > >> > On 03/28/2012 03:47 PM, Andrew Martin wrote: > >> >> Hi Andreas, > >> >> > >> >>> hmm ... what is that fence-peer script doing? If you > >> want to use > >> >>> resource-level fencing with the help of dopd, activate the > >> >>> drbd-peer-outdater script in the line above ... and > >> double check if the > >> >>> path is correct > >> >> fence-peer is just a wrapper for drbd-peer-outdater that > >> does some > >> >> additional logging. In my testing dopd has been working > > well. > >> > > >> > I see > >> > > >> >> > >> >>>> I am thinking of making the following changes to the > >> CIB (as per the > >> >>>> official DRBD > >> >>>> guide > >> >> > >> > > >> > > http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) > >> in > >> >>>> order to add the DRBD lsb service and require that it > >> start before the > >> >>>> ocf:linbit:drbd resources. Does this look correct? > >> >>> > >> >>> Where did you read that? No, deactivate the startup of > >> DRBD on system > >> >>> boot and let Pacemaker manage it completely. > >> >>> > >> >>>> primitive p_drbd-init lsb:drbd op monitor interval="30" > >> >>>> colocation c_drbd_together inf: > >> >>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master > >> >>>> ms_drbd_mount2:Master > >> >>>> order drbd_init_first inf: ms_drbd_vmstore:promote > >> >>>> ms_drbd_mount1:promote ms_drbd_mount2:promote > >> p_drbd-init:start > >> >>>> > >> >>>> This doesn't seem to require that drbd be also running > >> on the node where > >> >>>> the ocf:linbit:drbd resources are slave (which it would > >> need to do to be > >> >>>> a DRBD SyncTarget) - how can I ensure that drbd is > >> running everywhere? > >> >>>> (clone cl_drbd p_drbd-init ?) > >> >>> > >> >>> This is really not needed. > >> >> I was following the official DRBD Users Guide: > >> >> > >> > > http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html > >> >> > >> >> If I am understanding your previous message correctly, I > >> do not need to > >> >> add a lsb primitive for the drbd daemon? It will be > >> >> started/stopped/managed automatically by my > >> ocf:linbit:drbd resources > >> >> (and I can remove the /etc/rc* symlinks)? > >> > > >> > Yes, you don't need that LSB script when using Pacemaker > >> and should not > >> > let init start it. > >> > > >> > Regards, > >> > Andreas > >> > > >> > -- > >> > Need help with Pacemaker? 
> >> > http://www.hastexo.com/now > >> > > >> >> > >> >> Thanks, > >> >> > >> >> Andrew > >> >> > >> >> > >> > > ------------------------------------------------------------------------ > >> >> *From: *"Andreas Kurz" <andr...@hastexo.com > >> <mailto:andr...@hastexo.com> <mailto:andr...@hastexo.com > >> <mailto:andr...@hastexo.com>>> > >> >> *To: *pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org> > >> <mailto:pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org>> > >> >> *Sent: *Wednesday, March 28, 2012 7:27:34 AM > >> >> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD > >> resources to > >> >> master on failover > >> >> > >> >> On 03/28/2012 12:13 AM, Andrew Martin wrote: > >> >>> Hi Andreas, > >> >>> > >> >>> Thanks, I've updated the colocation rule to be in the > >> correct order. I > >> >>> also enabled the STONITH resource (this was temporarily > >> disabled before > >> >>> for some additional testing). DRBD has its own network > >> connection over > >> >>> the br1 interface (192.168.5.0/24 > >> <http://192.168.5.0/24> network), a direct crossover cable > >> >>> between node1 and node2: > >> >>> global { usage-count no; } > >> >>> common { > >> >>> syncer { rate 110M; } > >> >>> } > >> >>> resource vmstore { > >> >>> protocol C; > >> >>> startup { > >> >>> wfc-timeout 15; > >> >>> degr-wfc-timeout 60; > >> >>> } > >> >>> handlers { > >> >>> #fence-peer > >> "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; > >> >>> fence-peer "/usr/local/bin/fence-peer"; > >> >> > >> >> hmm ... what is that fence-peer script doing? If you want > >> to use > >> >> resource-level fencing with the help of dopd, activate the > >> >> drbd-peer-outdater script in the line above ... and > >> double check if the > >> >> path is correct > >> >> > >> >>> split-brain > >> "/usr/lib/drbd/notify-split-brain.sh > >> >>> m...@example.com <mailto:m...@example.com> > >> <mailto:m...@example.com <mailto:m...@example.com>>"; > >> >>> } > >> >>> net { > >> >>> after-sb-0pri discard-zero-changes; > >> >>> after-sb-1pri discard-secondary; > >> >>> after-sb-2pri disconnect; > >> >>> cram-hmac-alg md5; > >> >>> shared-secret "xxxxx"; > >> >>> } > >> >>> disk { > >> >>> fencing resource-only; > >> >>> } > >> >>> on node1 { > >> >>> device /dev/drbd0; > >> >>> disk /dev/sdb1; > >> >>> address 192.168.5.10:7787 > >> <http://192.168.5.10:7787>; > >> >>> meta-disk internal; > >> >>> } > >> >>> on node2 { > >> >>> device /dev/drbd0; > >> >>> disk /dev/sdf1; > >> >>> address 192.168.5.11:7787 > >> <http://192.168.5.11:7787>; > >> >>> meta-disk internal; > >> >>> } > >> >>> } > >> >>> # and similar for mount1 and mount2 > >> >>> > >> >>> Also, here is my ha.cf <http://ha.cf>. It uses both the > >> direct link between the nodes > >> >>> (br1) and the shared LAN network on br0 for communicating: > >> >>> autojoin none > >> >>> mcast br0 239.0.0.43 694 1 0 > >> >>> bcast br1 > >> >>> warntime 5 > >> >>> deadtime 15 > >> >>> initdead 60 > >> >>> keepalive 2 > >> >>> node node1 > >> >>> node node2 > >> >>> node quorumnode > >> >>> crm respawn > >> >>> respawn hacluster /usr/lib/heartbeat/dopd > >> >>> apiauth dopd gid=haclient uid=hacluster > >> >>> > >> >>> I am thinking of making the following changes to the CIB > >> (as per the > >> >>> official DRBD > >> >>> guide > >> >> > >> > > >> > > http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) > >> in > >> >>> order to add the DRBD lsb service and require that it > >> start before the > >> >>> ocf:linbit:drbd resources. 
Does this look correct? > >> >> > >> >> Where did you read that? No, deactivate the startup of > >> DRBD on system > >> >> boot and let Pacemaker manage it completely. > >> >> > >> >>> primitive p_drbd-init lsb:drbd op monitor interval="30" > >> >>> colocation c_drbd_together inf: > >> >>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master > >> >>> ms_drbd_mount2:Master > >> >>> order drbd_init_first inf: ms_drbd_vmstore:promote > >> >>> ms_drbd_mount1:promote ms_drbd_mount2:promote > >> p_drbd-init:start > >> >>> > >> >>> This doesn't seem to require that drbd be also running > >> on the node where > >> >>> the ocf:linbit:drbd resources are slave (which it would > >> need to do to be > >> >>> a DRBD SyncTarget) - how can I ensure that drbd is > >> running everywhere? > >> >>> (clone cl_drbd p_drbd-init ?) > >> >> > >> >> This is really not needed. > >> >> > >> >> Regards, > >> >> Andreas > >> >> > >> >> -- > >> >> Need help with Pacemaker? > >> >> http://www.hastexo.com/now > >> >> > >> >>> > >> >>> Thanks, > >> >>> > >> >>> Andrew > >> >>> > >> > > ------------------------------------------------------------------------ > >> >>> *From: *"Andreas Kurz" <andr...@hastexo.com > >> <mailto:andr...@hastexo.com> <mailto:andr...@hastexo.com > >> <mailto:andr...@hastexo.com>>> > >> >>> *To: *pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org> > >> > <mailto:*pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org>> > >> >>> *Sent: *Monday, March 26, 2012 5:56:22 PM > >> >>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD > >> resources to > >> >>> master on failover > >> >>> > >> >>> On 03/24/2012 08:15 PM, Andrew Martin wrote: > >> >>>> Hi Andreas, > >> >>>> > >> >>>> My complete cluster configuration is as follows: > >> >>>> ============ > >> >>>> Last updated: Sat Mar 24 13:51:55 2012 > >> >>>> Last change: Sat Mar 24 13:41:55 2012 > >> >>>> Stack: Heartbeat > >> >>>> Current DC: node2 > >> (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition > >> >>>> with quorum > >> >>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c > >> >>>> 3 Nodes configured, unknown expected votes > >> >>>> 19 Resources configured. 
> >> >>>> ============ > >> >>>> > >> >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): > >> OFFLINE > >> > (standby) > >> >>>> Online: [ node2 node1 ] > >> >>>> > >> >>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] > >> >>>> Masters: [ node2 ] > >> >>>> Slaves: [ node1 ] > >> >>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] > >> >>>> Masters: [ node2 ] > >> >>>> Slaves: [ node1 ] > >> >>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] > >> >>>> Masters: [ node2 ] > >> >>>> Slaves: [ node1 ] > >> >>>> Resource Group: g_vm > >> >>>> p_fs_vmstore(ocf::heartbeat:Filesystem):Started > > node2 > >> >>>> p_vm(ocf::heartbeat:VirtualDomain):Started node2 > >> >>>> Clone Set: cl_daemons [g_daemons] > >> >>>> Started: [ node2 node1 ] > >> >>>> Stopped: [ g_daemons:2 ] > >> >>>> Clone Set: cl_sysadmin_notify [p_sysadmin_notify] > >> >>>> Started: [ node2 node1 ] > >> >>>> Stopped: [ p_sysadmin_notify:2 ] > >> >>>> stonith-node1(stonith:external/tripplitepdu):Started > > node2 > >> >>>> stonith-node2(stonith:external/tripplitepdu):Started > > node1 > >> >>>> Clone Set: cl_ping [p_ping] > >> >>>> Started: [ node2 node1 ] > >> >>>> Stopped: [ p_ping:2 ] > >> >>>> > >> >>>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \ > >> >>>> attributes standby="off" > >> >>>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \ > >> >>>> attributes standby="off" > >> >>>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" > >> quorumnode \ > >> >>>> attributes standby="on" > >> >>>> primitive p_drbd_mount2 ocf:linbit:drbd \ > >> >>>> params drbd_resource="mount2" \ > >> >>>> op monitor interval="15" role="Master" \ > >> >>>> op monitor interval="30" role="Slave" > >> >>>> primitive p_drbd_mount1 ocf:linbit:drbd \ > >> >>>> params drbd_resource="mount1" \ > >> >>>> op monitor interval="15" role="Master" \ > >> >>>> op monitor interval="30" role="Slave" > >> >>>> primitive p_drbd_vmstore ocf:linbit:drbd \ > >> >>>> params drbd_resource="vmstore" \ > >> >>>> op monitor interval="15" role="Master" \ > >> >>>> op monitor interval="30" role="Slave" > >> >>>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \ > >> >>>> params device="/dev/drbd0" directory="/vmstore" > >> fstype="ext4" \ > >> >>>> op start interval="0" timeout="60s" \ > >> >>>> op stop interval="0" timeout="60s" \ > >> >>>> op monitor interval="20s" timeout="40s" > >> >>>> primitive p_libvirt-bin upstart:libvirt-bin \ > >> >>>> op monitor interval="30" > >> >>>> primitive p_ping ocf:pacemaker:ping \ > >> >>>> params name="p_ping" host_list="192.168.1.10 > >> 192.168.1.11" > >> >>>> multiplier="1000" \ > >> >>>> op monitor interval="20s" > >> >>>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \ > >> >>>> params email="m...@example.com > >> <mailto:m...@example.com> <mailto:m...@example.com > >> <mailto:m...@example.com>>" \ > >> >>>> params subject="Pacemaker Change" \ > >> >>>> op start interval="0" timeout="10" \ > >> >>>> op stop interval="0" timeout="10" \ > >> >>>> op monitor interval="10" timeout="10" > >> >>>> primitive p_vm ocf:heartbeat:VirtualDomain \ > >> >>>> params config="/vmstore/config/vm.xml" \ > >> >>>> meta allow-migrate="false" \ > >> >>>> op start interval="0" timeout="120s" \ > >> >>>> op stop interval="0" timeout="120s" \ > >> >>>> op monitor interval="10" timeout="30" > >> >>>> primitive stonith-node1 stonith:external/tripplitepdu \ > >> >>>> params pdu_ipaddr="192.168.1.12" pdu_port="1" > >> pdu_username="xxx" > >> >>>> pdu_password="xxx" hostname_to_stonith="node1" > >> >>>> primitive 
stonith-node2 stonith:external/tripplitepdu \ > >> >>>> params pdu_ipaddr="192.168.1.12" pdu_port="2" > >> pdu_username="xxx" > >> >>>> pdu_password="xxx" hostname_to_stonith="node2" > >> >>>> group g_daemons p_libvirt-bin > >> >>>> group g_vm p_fs_vmstore p_vm > >> >>>> ms ms_drbd_mount2 p_drbd_mount2 \ > >> >>>> meta master-max="1" master-node-max="1" > >> clone-max="2" > >> >>>> clone-node-max="1" notify="true" > >> >>>> ms ms_drbd_mount1 p_drbd_mount1 \ > >> >>>> meta master-max="1" master-node-max="1" > >> clone-max="2" > >> >>>> clone-node-max="1" notify="true" > >> >>>> ms ms_drbd_vmstore p_drbd_vmstore \ > >> >>>> meta master-max="1" master-node-max="1" > >> clone-max="2" > >> >>>> clone-node-max="1" notify="true" > >> >>>> clone cl_daemons g_daemons > >> >>>> clone cl_ping p_ping \ > >> >>>> meta interleave="true" > >> >>>> clone cl_sysadmin_notify p_sysadmin_notify > >> >>>> location l-st-node1 stonith-node1 -inf: node1 > >> >>>> location l-st-node2 stonith-node2 -inf: node2 > >> >>>> location l_run_on_most_connected p_vm \ > >> >>>> rule $id="l_run_on_most_connected-rule" p_ping: > >> defined p_ping > >> >>>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master > >> >>>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm > >> >>> > >> >>> As Emmanuel already said, g_vm has to be in the first > >> place in this > >> >>> collocation constraint .... g_vm must be colocated with > >> the drbd masters. > >> >>> > >> >>>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote > >> ms_drbd_mount1:promote > >> >>>> ms_drbd_mount2:promote cl_daemons:start g_vm:start > >> >>>> property $id="cib-bootstrap-options" \ > >> >>>> > >> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ > >> >>>> cluster-infrastructure="Heartbeat" \ > >> >>>> stonith-enabled="false" \ > >> >>>> no-quorum-policy="stop" \ > >> >>>> last-lrm-refresh="1332539900" \ > >> >>>> cluster-recheck-interval="5m" \ > >> >>>> crmd-integration-timeout="3m" \ > >> >>>> shutdown-escalation="5m" > >> >>>> > >> >>>> The STONITH plugin is a custom plugin I wrote for the > >> Tripp-Lite > >> >>>> PDUMH20ATNET that I'm using as the STONITH device: > >> >>>> > >> > > http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf > >> >>> > >> >>> And why don't using it? .... stonith-enabled="false" > >> >>> > >> >>>> > >> >>>> As you can see, I left the DRBD service to be started > >> by the operating > >> >>>> system (as an lsb script at boot time) however > >> Pacemaker controls > >> >>>> actually bringing up/taking down the individual DRBD > >> devices. > >> >>> > >> >>> Don't start drbd on system boot, give Pacemaker the full > >> control. > >> >>> > >> >>> The > >> >>>> behavior I observe is as follows: I issue "crm resource > >> migrate p_vm" on > >> >>>> node1 and failover successfully to node2. During this > >> time, node2 fences > >> >>>> node1's DRBD devices (using dopd) and marks them as > >> Outdated. Meanwhile > >> >>>> node2's DRBD devices are UpToDate. I then shutdown both > >> nodes and then > >> >>>> bring them back up. They reconnect to the cluster (with > >> quorum), and > >> >>>> node1's DRBD devices are still Outdated as expected and > >> node2's DRBD > >> >>>> devices are still UpToDate, as expected. 
At this point, > >> DRBD starts on > >> >>>> both nodes, however node2 will not set DRBD as master: > >> >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): > >> OFFLINE > >> > (standby) > >> >>>> Online: [ node2 node1 ] > >> >>>> > >> >>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] > >> >>>> Slaves: [ node1 node2 ] > >> >>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] > >> >>>> Slaves: [ node1 node 2 ] > >> >>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] > >> >>>> Slaves: [ node1 node2 ] > >> >>> > >> >>> There should really be no interruption of the drbd > >> replication on vm > >> >>> migration that activates the dopd ... drbd has its own > >> direct network > >> >>> connection? > >> >>> > >> >>> Please share your ha.cf <http://ha.cf> file and your > >> drbd configuration. Watch out for > >> >>> drbd messages in your kernel log file, that should give > >> you additional > >> >>> information when/why the drbd connection was lost. > >> >>> > >> >>> Regards, > >> >>> Andreas > >> >>> > >> >>> -- > >> >>> Need help with Pacemaker? > >> >>> http://www.hastexo.com/now > >> >>> > >> >>>> > >> >>>> I am having trouble sorting through the logging > >> information because > >> >>>> there is so much of it in /var/log/daemon.log, but I > >> can't find an > >> >>>> error message printed about why it will not promote > >> node2. At this point > >> >>>> the DRBD devices are as follows: > >> >>>> node2: cstate = WFConnection dstate=UpToDate > >> >>>> node1: cstate = StandAlone dstate=Outdated > >> >>>> > >> >>>> I don't see any reason why node2 can't become DRBD > >> master, or am I > >> >>>> missing something? If I do "drbdadm connect all" on > >> node1, then the > >> >>>> cstate on both nodes changes to "Connected" and node2 > >> immediately > >> >>>> promotes the DRBD resources to master. Any ideas on why > >> I'm observing > >> >>>> this incorrect behavior? > >> >>>> > >> >>>> Any tips on how I can better filter through the > >> pacemaker/heartbeat logs > >> >>>> or how to get additional useful debug information? > >> >>>> > >> >>>> Thanks, > >> >>>> > >> >>>> Andrew > >> >>>> > >> >>>> > >> > > ------------------------------------------------------------------------ > >> >>>> *From: *"Andreas Kurz" <andr...@hastexo.com > >> <mailto:andr...@hastexo.com> > >> > <mailto:andr...@hastexo.com <mailto:andr...@hastexo.com>>> > >> >>>> *To: *pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org> > >> >> <mailto:*pacemaker@oss.clusterlabs.org > >> <mailto:pacemaker@oss.clusterlabs.org>> > >> >>>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM > >> >>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD > >> resources to > >> >>>> master on failover > >> >>>> > >> >>>> On 01/25/2012 08:58 PM, Andrew Martin wrote: > >> >>>>> Hello, > >> >>>>> > >> >>>>> Recently I finished configuring a two-node cluster > >> with pacemaker 1.1.6 > >> >>>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. > >> This cluster > >> > includes > >> >>>>> the following resources: > >> >>>>> - primitives for DRBD storage devices > >> >>>>> - primitives for mounting the filesystem on the DRBD > >> storage > >> >>>>> - primitives for some mount binds > >> >>>>> - primitive for starting apache > >> >>>>> - primitives for starting samba and nfs servers > >> (following instructions > >> >>>>> here > >> <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>) > >> >>>>> - primitives for exporting nfs shares > >> (ocf:heartbeat:exportfs) > >> >>>> > >> >>>> not enough information ... 
please share at least your > >> complete cluster > >> >>>> configuration > >> >>>> > >> >>>> Regards, > >> >>>> Andreas > >> >>>> > >> >>>> -- > >> >>>> Need help with Pacemaker? > >> >>>> http://www.hastexo.com/now > >> >>>> > >> >>>>> > >> >>>>> Perhaps this is best described through the output of > >> crm_mon: > >> >>>>> Online: [ node1 node2 ] > >> >>>>> > >> >>>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] > >> (unmanaged) > >> >>>>> p_drbd_mount1:0 (ocf::linbit:drbd): > >> Started node2 > >> >>> (unmanaged) > >> >>>>> p_drbd_mount1:1 (ocf::linbit:drbd): > >> Started node1 > >> >>>>> (unmanaged) FAILED > >> >>>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] > >> >>>>> p_drbd_mount2:0 (ocf::linbit:drbd): > >> Master node1 > >> >>>>> (unmanaged) FAILED > >> >>>>> Slaves: [ node2 ] > >> >>>>> Resource Group: g_core > >> >>>>> p_fs_mount1 (ocf::heartbeat:Filesystem): > >> Started node1 > >> >>>>> p_fs_mount2 (ocf::heartbeat:Filesystem): > >> Started node1 > >> >>>>> p_ip_nfs (ocf::heartbeat:IPaddr2): > >> Started node1 > >> >>>>> Resource Group: g_apache > >> >>>>> p_fs_mountbind1 (ocf::heartbeat:Filesystem): > >> Started node1 > >> >>>>> p_fs_mountbind2 (ocf::heartbeat:Filesystem): > >> Started node1 > >> >>>>> p_fs_mountbind3 (ocf::heartbeat:Filesystem): > >> Started node1 > >> >>>>> p_fs_varwww (ocf::heartbeat:Filesystem): > >> Started node1 > >> >>>>> p_apache (ocf::heartbeat:apache): > >> Started node1 > >> >>>>> Resource Group: g_fileservers > >> >>>>> p_lsb_smb (lsb:smbd): Started node1 > >> >>>>> p_lsb_nmb (lsb:nmbd): Started node1 > >> >>>>> p_lsb_nfsserver (lsb:nfs-kernel-server): > >> Started node1 > >> >>>>> p_exportfs_mount1 (ocf::heartbeat:exportfs): > >> Started node1 > >> >>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs): > >> Started > >> > node1 > >> >>>>> > >> >>>>> I have read through the Pacemaker Explained > >> >>>>> > >> >>>> > >> >>> > >> > > >> > > <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained> > >> >>>>> documentation, however could not find a way to further > >> debug these > >> >>>>> problems. First, I put node1 into standby mode to > >> attempt failover to > >> >>>>> the other node (node2). Node2 appeared to start the > >> transition to > >> >>>>> master, however it failed to promote the DRBD > >> resources to master (the > >> >>>>> first step). I have attached a copy of this session in > >> commands.log and > >> >>>>> additional excerpts from /var/log/syslog during > >> important steps. I have > >> >>>>> attempted everything I can think of to try and start > >> the DRBD resource > >> >>>>> (e.g. start/stop/promote/manage/cleanup under crm > >> resource, restarting > >> >>>>> heartbeat) but cannot bring it out of the slave state. > >> However, if > >> > I set > >> >>>>> it to unmanaged and then run drbdadm primary all in > >> the terminal, > >> >>>>> pacemaker is satisfied and continues starting the rest > >> of the > >> > resources. > >> >>>>> It then failed when attempting to mount the filesystem > >> for mount2, the > >> >>>>> p_fs_mount2 resource. I attempted to mount the > >> filesystem myself > >> > and was > >> >>>>> successful. I then unmounted it and ran cleanup on > >> p_fs_mount2 and then > >> >>>>> it mounted. 
The rest of the resources started as > >> expected until the > >>>>> p_exportfs_mount2 resource, which failed as follows: > >>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs): > >> started node2 > >>>>> (unmanaged) FAILED > >>>>> > >>>>> I ran cleanup on this and it started, however when > >> running this test > >>>>> earlier today no command could successfully start this > >> exportfs > >> resource. > >>>>> > >>>>> How can I configure pacemaker to better resolve these > >> problems and be > >>>>> able to bring the node up successfully on its own? > >> What can I check to > >>>>> determine why these failures are occurring? > >> /var/log/syslog did not seem > >>>>> to contain very much useful information regarding why > >> the failures > >>>> occurred. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Andrew

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org