On 03/28/2012 04:56 PM, Andrew Martin wrote:
> Hi Andreas,
>
> I disabled the DRBD init script and then restarted the slave node
> (node2). After it came back up, DRBD did not start:
>
> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending
> Online: [ node2 node1 ]
>
>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>      Masters: [ node1 ]
>      Stopped: [ p_drbd_vmstore:1 ]
>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>      Masters: [ node1 ]
>      Stopped: [ p_drbd_mount1:1 ]
>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>      Masters: [ node1 ]
>      Stopped: [ p_drbd_mount2:1 ]
> ...
>
> root@node2:~# service drbd status
> drbd not loaded

Yes, that is expected unless Pacemaker starts DRBD.
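Once Pacemaker actually starts the ocf:linbit:drbd resources, DRBD should come back up under its control; a quick way to verify would be something like this (just a sketch):

    # one-shot cluster status, including the master/slave sets
    crm_mon -1
    # kernel-side view of the DRBD devices
    cat /proc/drbd
    # connection and disk state per resource
    drbdadm cstate all && drbdadm dstate all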
> Is there something else I need to change in the CIB to ensure that DRBD
> is started? All of my DRBD devices are configured like this:
>
> primitive p_drbd_mount2 ocf:linbit:drbd \
>         params drbd_resource="mount2" \
>         op monitor interval="15" role="Master" \
>         op monitor interval="30" role="Slave"
> ms ms_drbd_mount2 p_drbd_mount2 \
>         meta master-max="1" master-node-max="1" clone-max="2"
>         clone-node-max="1" notify="true"

That should be enough ... I'm unable to say more without seeing the
complete configuration ... too many fragments of information ;-)

Please provide (e.g. via pastebin) your complete CIB (cibadmin -Q) while
the cluster is in that state ... or, even better, create a crm_report
archive.

> Here is the output from the syslog (grep -i drbd /var/log/syslog):
>
> Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_vmstore:1_monitor_0 )
> Mar 28 09:24:47 node2 lrmd: [3210]: info: rsc:p_drbd_vmstore:1 probe[2] (pid 3455)
> Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_mount1:1_monitor_0 )
> Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount1:1 probe[3] (pid 3456)
> Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_mount2:1_monitor_0 )
> Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount2:1 probe[4] (pid 3457)
> Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING: Couldn't find device [/dev/drbd0]. Expected /dev/??? to exist
> Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked: crm_attribute -N node2 -n master-p_drbd_mount2:1 -l reboot -D
> Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked: crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l reboot -D
> Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked: crm_attribute -N node2 -n master-p_drbd_mount1:1 -l reboot -D
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[4] on p_drbd_mount2:1 for client 3213: pid 3457 exited with return code 7
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[2] on p_drbd_vmstore:1 for client 3213: pid 3455 exited with return code 7
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=10, confirmed=true) not running
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[3] on p_drbd_mount1:1 for client 3213: pid 3456 exited with return code 7
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11, confirmed=true) not running
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=12, confirmed=true) not running

No errors there, just the initial probes ... so for some reason Pacemaker
simply does not want to start it ... use crm_simulate to find out why, or
provide the information requested above.
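For example, something along these lines should produce everything needed (the timestamp and file names are only placeholders, adjust them to your setup):

    # dump the complete live CIB, suitable for a pastebin
    cibadmin -Q > /tmp/cib-node2.xml
    # or collect logs, CIB and PE inputs from all nodes into one archive
    crm_report -f "2012-03-28 09:00" /tmp/drbd-not-started
    # ask the policy engine why the clones stay stopped: -L works on the
    # live CIB, -s prints the allocation scores
    crm_simulate -s -L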
Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

> Thanks,
>
> Andrew
>
> ------------------------------------------------------------------------
> *From: *"Andreas Kurz" <andr...@hastexo.com>
> *To: *pacemaker@oss.clusterlabs.org
> *Sent: *Wednesday, March 28, 2012 9:03:06 AM
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover
>
> On 03/28/2012 03:47 PM, Andrew Martin wrote:
>> Hi Andreas,
>>
>>> hmm ... what is that fence-peer script doing? If you want to use
>>> resource-level fencing with the help of dopd, activate the
>>> drbd-peer-outdater script in the line above ... and double check if the
>>> path is correct
>>
>> fence-peer is just a wrapper for drbd-peer-outdater that does some
>> additional logging. In my testing dopd has been working well.
>
> I see.
>
>>>> I am thinking of making the following changes to the CIB (as per the
>>>> official DRBD guide,
>>>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html)
>>>> in order to add the DRBD lsb service and require that it start before
>>>> the ocf:linbit:drbd resources. Does this look correct?
>>>
>>> Where did you read that? No, deactivate the startup of DRBD on system
>>> boot and let Pacemaker manage it completely.
>>>
>>>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>>>> colocation c_drbd_together inf: p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master ms_drbd_mount2:Master
>>>> order drbd_init_first inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start
>>>>
>>>> This doesn't seem to require that drbd also be running on the node where
>>>> the ocf:linbit:drbd resources are slave (which it would need to do to be
>>>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?
>>>> (clone cl_drbd p_drbd-init ?)
>>>
>>> This is really not needed.
>>
>> I was following the official DRBD Users Guide:
>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html
>>
>> If I am understanding your previous message correctly, I do not need to
>> add an lsb primitive for the drbd daemon? It will be
>> started/stopped/managed automatically by my ocf:linbit:drbd resources
>> (and I can remove the /etc/rc* symlinks)?
>
> Yes, you don't need that LSB script when using Pacemaker, and you should
> not let init start it.
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
>> Thanks,
>>
>> Andrew
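Removing the boot-time startup mentioned just above is a one-off step; on Ubuntu 10.04 something like the following should do it (a sketch, run on both nodes):

    # make sure nothing is running outside of Pacemaker's control
    service drbd stop
    # drop the /etc/rc*.d symlinks so init no longer starts the LSB script
    update-rc.d -f drbd remove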
>> ------------------------------------------------------------------------
>> *From: *"Andreas Kurz" <andr...@hastexo.com>
>> *To: *pacemaker@oss.clusterlabs.org
>> *Sent: *Wednesday, March 28, 2012 7:27:34 AM
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover
>>
>> On 03/28/2012 12:13 AM, Andrew Martin wrote:
>>> Hi Andreas,
>>>
>>> Thanks, I've updated the colocation rule to be in the correct order. I
>>> also enabled the STONITH resource (this was temporarily disabled before
>>> for some additional testing). DRBD has its own network connection over
>>> the br1 interface (192.168.5.0/24 network), a direct crossover cable
>>> between node1 and node2:
>>>
>>> global { usage-count no; }
>>> common {
>>>     syncer { rate 110M; }
>>> }
>>> resource vmstore {
>>>     protocol C;
>>>     startup {
>>>         wfc-timeout 15;
>>>         degr-wfc-timeout 60;
>>>     }
>>>     handlers {
>>>         #fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>>>         fence-peer "/usr/local/bin/fence-peer";
>>
>> hmm ... what is that fence-peer script doing? If you want to use
>> resource-level fencing with the help of dopd, activate the
>> drbd-peer-outdater script in the line above ... and double check if the
>> path is correct
>>
>>>         split-brain "/usr/lib/drbd/notify-split-brain.sh m...@example.com";
>>>     }
>>>     net {
>>>         after-sb-0pri discard-zero-changes;
>>>         after-sb-1pri discard-secondary;
>>>         after-sb-2pri disconnect;
>>>         cram-hmac-alg md5;
>>>         shared-secret "xxxxx";
>>>     }
>>>     disk {
>>>         fencing resource-only;
>>>     }
>>>     on node1 {
>>>         device /dev/drbd0;
>>>         disk /dev/sdb1;
>>>         address 192.168.5.10:7787;
>>>         meta-disk internal;
>>>     }
>>>     on node2 {
>>>         device /dev/drbd0;
>>>         disk /dev/sdf1;
>>>         address 192.168.5.11:7787;
>>>         meta-disk internal;
>>>     }
>>> }
>>> # and similar for mount1 and mount2
>>>
>>> Also, here is my ha.cf. It uses both the direct link between the nodes
>>> (br1) and the shared LAN network on br0 for communicating:
>>>
>>> autojoin none
>>> mcast br0 239.0.0.43 694 1 0
>>> bcast br1
>>> warntime 5
>>> deadtime 15
>>> initdead 60
>>> keepalive 2
>>> node node1
>>> node node2
>>> node quorumnode
>>> crm respawn
>>> respawn hacluster /usr/lib/heartbeat/dopd
>>> apiauth dopd gid=haclient uid=hacluster
>>>
>>> I am thinking of making the following changes to the CIB (as per the
>>> official DRBD guide,
>>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html)
>>> in order to add the DRBD lsb service and require that it start before the
>>> ocf:linbit:drbd resources. Does this look correct?
>>
>> Where did you read that? No, deactivate the startup of DRBD on system
>> boot and let Pacemaker manage it completely.
>>
>>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>>> colocation c_drbd_together inf: p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master ms_drbd_mount2:Master
>>> order drbd_init_first inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start
>>>
>>> This doesn't seem to require that drbd be also running on the node where
>>> the ocf:linbit:drbd resources are slave (which it would need to do to be
>>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?
>>> (clone cl_drbd p_drbd-init ?)
>>
>> This is really not needed.
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>> Thanks,
>>>
>>> Andrew
>>>
>>> ------------------------------------------------------------------------
>>> *From: *"Andreas Kurz" <andr...@hastexo.com>
>>> *To: *pacemaker@oss.clusterlabs.org
>>> *Sent: *Monday, March 26, 2012 5:56:22 PM
>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover
>>>
>>> On 03/24/2012 08:15 PM, Andrew Martin wrote:
>>>> Hi Andreas,
>>>>
>>>> My complete cluster configuration is as follows:
>>>>
>>>> ============
>>>> Last updated: Sat Mar 24 13:51:55 2012
>>>> Last change: Sat Mar 24 13:41:55 2012
>>>> Stack: Heartbeat
>>>> Current DC: node2 (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition with quorum
>>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>>>> 3 Nodes configured, unknown expected votes
>>>> 19 Resources configured.
>>>> ============
>>>>
>>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE (standby)
>>>> Online: [ node2 node1 ]
>>>>
>>>>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>>>>      Masters: [ node2 ]
>>>>      Slaves: [ node1 ]
>>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>>>>      Masters: [ node2 ]
>>>>      Slaves: [ node1 ]
>>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>>>      Masters: [ node2 ]
>>>>      Slaves: [ node1 ]
>>>>  Resource Group: g_vm
>>>>      p_fs_vmstore (ocf::heartbeat:Filesystem): Started node2
>>>>      p_vm (ocf::heartbeat:VirtualDomain): Started node2
>>>>  Clone Set: cl_daemons [g_daemons]
>>>>      Started: [ node2 node1 ]
>>>>      Stopped: [ g_daemons:2 ]
>>>>  Clone Set: cl_sysadmin_notify [p_sysadmin_notify]
>>>>      Started: [ node2 node1 ]
>>>>      Stopped: [ p_sysadmin_notify:2 ]
>>>>  stonith-node1 (stonith:external/tripplitepdu): Started node2
>>>>  stonith-node2 (stonith:external/tripplitepdu): Started node1
>>>>  Clone Set: cl_ping [p_ping]
>>>>      Started: [ node2 node1 ]
>>>>      Stopped: [ p_ping:2 ]
>>>>
>>>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \
>>>>         attributes standby="off"
>>>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \
>>>>         attributes standby="off"
>>>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \
>>>>         attributes standby="on"
>>>> primitive p_drbd_mount2 ocf:linbit:drbd \
>>>>         params drbd_resource="mount2" \
>>>>         op monitor interval="15" role="Master" \
>>>>         op monitor interval="30" role="Slave"
>>>> primitive p_drbd_mount1 ocf:linbit:drbd \
>>>>         params drbd_resource="mount1" \
>>>>         op monitor interval="15" role="Master" \
>>>>         op monitor interval="30" role="Slave"
>>>> primitive p_drbd_vmstore ocf:linbit:drbd \
>>>>         params drbd_resource="vmstore" \
>>>>         op monitor interval="15" role="Master" \
>>>>         op monitor interval="30" role="Slave"
>>>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \
>>>>         params device="/dev/drbd0" directory="/vmstore" fstype="ext4" \
>>>>         op start interval="0" timeout="60s" \
>>>>         op stop interval="0" timeout="60s" \
>>>>         op monitor interval="20s" timeout="40s"
>>>> primitive p_libvirt-bin upstart:libvirt-bin \
>>>>         op monitor interval="30"
>>>> primitive p_ping ocf:pacemaker:ping \
>>>>         params name="p_ping" host_list="192.168.1.10 192.168.1.11" multiplier="1000" \
>>>>         op monitor interval="20s"
>>>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \
>>>>         params email="m...@example.com" \
>>>>         params subject="Pacemaker Change" \
>>>>         op start interval="0" timeout="10" \
>>>>         op stop interval="0" timeout="10" \
>>>>         op monitor interval="10" timeout="10"
>>>> primitive p_vm ocf:heartbeat:VirtualDomain \
>>>>         params config="/vmstore/config/vm.xml" \
>>>>         meta allow-migrate="false" \
>>>>         op start interval="0" timeout="120s" \
>>>>         op stop interval="0" timeout="120s" \
>>>>         op monitor interval="10" timeout="30"
>>>> primitive stonith-node1 stonith:external/tripplitepdu \
>>>>         params pdu_ipaddr="192.168.1.12" pdu_port="1" pdu_username="xxx" pdu_password="xxx" hostname_to_stonith="node1"
>>>> primitive stonith-node2 stonith:external/tripplitepdu \
>>>>         params pdu_ipaddr="192.168.1.12" pdu_port="2" pdu_username="xxx" pdu_password="xxx" hostname_to_stonith="node2"
>>>> group g_daemons p_libvirt-bin
>>>> group g_vm p_fs_vmstore p_vm
>>>> ms ms_drbd_mount2 p_drbd_mount2 \
>>>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>>> ms ms_drbd_mount1 p_drbd_mount1 \
>>>>         meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true" >>>> ms ms_drbd_vmstore p_drbd_vmstore \ >>>> meta master-max="1" master-node-max="1" clone-max="2" >>>> clone-node-max="1" notify="true" >>>> clone cl_daemons g_daemons >>>> clone cl_ping p_ping \ >>>> meta interleave="true" >>>> clone cl_sysadmin_notify p_sysadmin_notify >>>> location l-st-node1 stonith-node1 -inf: node1 >>>> location l-st-node2 stonith-node2 -inf: node2 >>>> location l_run_on_most_connected p_vm \ >>>> rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping >>>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master >>>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm >>> >>> As Emmanuel already said, g_vm has to be in the first place in this >>> collocation constraint .... g_vm must be colocated with the drbd masters. >>> >>>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote >>>> ms_drbd_mount2:promote cl_daemons:start g_vm:start >>>> property $id="cib-bootstrap-options" \ >>>> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ >>>> cluster-infrastructure="Heartbeat" \ >>>> stonith-enabled="false" \ >>>> no-quorum-policy="stop" \ >>>> last-lrm-refresh="1332539900" \ >>>> cluster-recheck-interval="5m" \ >>>> crmd-integration-timeout="3m" \ >>>> shutdown-escalation="5m" >>>> >>>> The STONITH plugin is a custom plugin I wrote for the Tripp-Lite >>>> PDUMH20ATNET that I'm using as the STONITH device: >>>> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf >>> >>> And why don't using it? .... stonith-enabled="false" >>> >>>> >>>> As you can see, I left the DRBD service to be started by the operating >>>> system (as an lsb script at boot time) however Pacemaker controls >>>> actually bringing up/taking down the individual DRBD devices. >>> >>> Don't start drbd on system boot, give Pacemaker the full control. >>> >>> The >>>> behavior I observe is as follows: I issue "crm resource migrate p_vm" on >>>> node1 and failover successfully to node2. During this time, node2 fences >>>> node1's DRBD devices (using dopd) and marks them as Outdated. Meanwhile >>>> node2's DRBD devices are UpToDate. I then shutdown both nodes and then >>>> bring them back up. They reconnect to the cluster (with quorum), and >>>> node1's DRBD devices are still Outdated as expected and node2's DRBD >>>> devices are still UpToDate, as expected. At this point, DRBD starts on >>>> both nodes, however node2 will not set DRBD as master: >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE > (standby) >>>> Online: [ node2 node1 ] >>>> >>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] >>>> Slaves: [ node1 node2 ] >>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] >>>> Slaves: [ node1 node 2 ] >>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] >>>> Slaves: [ node1 node2 ] >>> >>> There should really be no interruption of the drbd replication on vm >>> migration that activates the dopd ... drbd has its own direct network >>> connection? >>> >>> Please share your ha.cf file and your drbd configuration. Watch out for >>> drbd messages in your kernel log file, that should give you additional >>> information when/why the drbd connection was lost. >>> >>> Regards, >>> Andreas >>> >>> -- >>> Need help with Pacemaker? >>> http://www.hastexo.com/now >>> >>>> >>>> I am having trouble sorting through the logging information because >>>> there is so much of it in /var/log/daemon.log, but I can't find an >>>> error message printed about why it will not promote node2. 
>>>> At this point the DRBD devices are as follows:
>>>>
>>>> node2: cstate = WFConnection  dstate = UpToDate
>>>> node1: cstate = StandAlone    dstate = Outdated
>>>>
>>>> I don't see any reason why node2 can't become DRBD master, or am I
>>>> missing something? If I do "drbdadm connect all" on node1, then the
>>>> cstate on both nodes changes to "Connected" and node2 immediately
>>>> promotes the DRBD resources to master. Any ideas on why I'm observing
>>>> this incorrect behavior?
>>>>
>>>> Any tips on how I can better filter through the pacemaker/heartbeat logs
>>>> or how to get additional useful debug information?
>>>>
>>>> Thanks,
>>>>
>>>> Andrew
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From: *"Andreas Kurz" <andr...@hastexo.com>
>>>> *To: *pacemaker@oss.clusterlabs.org
>>>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM
>>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover
>>>>
>>>> On 01/25/2012 08:58 PM, Andrew Martin wrote:
>>>>> Hello,
>>>>>
>>>>> Recently I finished configuring a two-node cluster with pacemaker 1.1.6
>>>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. This cluster includes
>>>>> the following resources:
>>>>> - primitives for DRBD storage devices
>>>>> - primitives for mounting the filesystem on the DRBD storage
>>>>> - primitives for some mount binds
>>>>> - primitive for starting apache
>>>>> - primitives for starting samba and nfs servers (following the
>>>>>   instructions here: http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf)
>>>>> - primitives for exporting nfs shares (ocf:heartbeat:exportfs)
>>>>
>>>> not enough information ... please share at least your complete cluster
>>>> configuration
>>>>
>>>> Regards,
>>>> Andreas
>>>>
>>>> --
>>>> Need help with Pacemaker?
>>>> http://www.hastexo.com/now
>>>>
>>>>> Perhaps this is best described through the output of crm_mon:
>>>>>
>>>>> Online: [ node1 node2 ]
>>>>>
>>>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged)
>>>>>      p_drbd_mount1:0 (ocf::linbit:drbd): Started node2 (unmanaged)
>>>>>      p_drbd_mount1:1 (ocf::linbit:drbd): Started node1 (unmanaged) FAILED
>>>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>>>>      p_drbd_mount2:0 (ocf::linbit:drbd): Master node1 (unmanaged) FAILED
>>>>>      Slaves: [ node2 ]
>>>>>  Resource Group: g_core
>>>>>      p_fs_mount1 (ocf::heartbeat:Filesystem): Started node1
>>>>>      p_fs_mount2 (ocf::heartbeat:Filesystem): Started node1
>>>>>      p_ip_nfs (ocf::heartbeat:IPaddr2): Started node1
>>>>>  Resource Group: g_apache
>>>>>      p_fs_mountbind1 (ocf::heartbeat:Filesystem): Started node1
>>>>>      p_fs_mountbind2 (ocf::heartbeat:Filesystem): Started node1
>>>>>      p_fs_mountbind3 (ocf::heartbeat:Filesystem): Started node1
>>>>>      p_fs_varwww (ocf::heartbeat:Filesystem): Started node1
>>>>>      p_apache (ocf::heartbeat:apache): Started node1
>>>>>  Resource Group: g_fileservers
>>>>>      p_lsb_smb (lsb:smbd): Started node1
>>>>>      p_lsb_nmb (lsb:nmbd): Started node1
>>>>>      p_lsb_nfsserver (lsb:nfs-kernel-server): Started node1
>>>>>      p_exportfs_mount1 (ocf::heartbeat:exportfs): Started node1
>>>>>      p_exportfs_mount2 (ocf::heartbeat:exportfs): Started node1
>>>>>
>>>>> I have read through the Pacemaker Explained documentation
>>>>> (http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained),
>>>>> however I could not find a way to further debug these problems.
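On the log-filtering question above: the promotion decisions come from the pengine/crmd on the DC, and DRBD state changes land in the kernel log, so something along these lines usually narrows it down (a sketch, log paths as on Ubuntu):

    # scheduler, crmd and resource-agent messages only
    grep -E 'pengine|crmd|lrmd|drbd' /var/log/daemon.log | less
    # DRBD connect/outdate events
    grep -i drbd /var/log/kern.log
    # one-shot status including node attributes such as the master-* scores
    crm_mon -1 -A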
>>>>> First, I put node1 into standby mode to attempt failover to the other
>>>>> node (node2). Node2 appeared to start the transition to master, however
>>>>> it failed to promote the DRBD resources to master (the first step). I
>>>>> have attached a copy of this session in commands.log and additional
>>>>> excerpts from /var/log/syslog during important steps. I have attempted
>>>>> everything I can think of to try and start the DRBD resource (e.g.
>>>>> start/stop/promote/manage/cleanup under crm resource, restarting
>>>>> heartbeat) but cannot bring it out of the slave state. However, if I set
>>>>> it to unmanaged and then run drbdadm primary all in the terminal,
>>>>> pacemaker is satisfied and continues starting the rest of the resources.
>>>>> It then failed when attempting to mount the filesystem for mount2, the
>>>>> p_fs_mount2 resource. I attempted to mount the filesystem myself and was
>>>>> successful. I then unmounted it and ran cleanup on p_fs_mount2 and then
>>>>> it mounted. The rest of the resources started as expected until the
>>>>> p_exportfs_mount2 resource, which failed as follows:
>>>>>
>>>>>  p_exportfs_mount2 (ocf::heartbeat:exportfs): started node2 (unmanaged) FAILED
>>>>>
>>>>> I ran cleanup on this and it started, however when running this test
>>>>> earlier today no command could successfully start this exportfs
>>>>> resource.
>>>>>
>>>>> How can I configure pacemaker to better resolve these problems and be
>>>>> able to bring the node up successfully on its own? What can I check to
>>>>> determine why these failures are occurring? /var/log/syslog did not seem
>>>>> to contain very much useful information regarding why the failures
>>>>> occurred.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andrew
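For the cleanup/failcount handling described above, the usual sequence is roughly this (a sketch, resource and node names as used in this thread):

    # one-shot status including fail counts and failed actions
    crm_mon -1 -f
    # inspect the fail count for the resource that refused to start
    crm resource failcount p_exportfs_mount2 show node2
    # clear the failure once the underlying problem is fixed so it is retried
    crm resource cleanup p_exportfs_mount2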
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org