Hello, I'm attempting to set up a simple NFS failover test using Pacemaker and DRBD on 2 nodes. The goal is to have one host be the DRBD master, and have the volume mounted, the NFS server running, and a virtual IP address up. The other node is the DRBD slave with no NFS services or virtual IP running. The DRBD resource is configured in master-slave (not dual-master) mode and seems to work fine when it's not being controlled by Pacemaker.
The problem is that both nodes start out as DRBD slaves, and neither node gets promoted: # crm_mon -1 ============ Last updated: Mon Sep 23 14:39:12 2013 Last change: Mon Sep 23 14:26:15 2013 via cibadmin on test-vm-1 Stack: openais Current DC: test-vm-1 - partition with quorum Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff 2 Nodes configured, 2 expected votes 5 Resources configured. ============ Online: [ test-vm-1 test-vm-2 ] Master/Slave Set: ms-drbd_r0 [drbd_r0] Slaves: [ test-vm-1 test-vm-2 ] If I try to force a promotion with "crm resource promote ms-drbd_r0" I get no output, and I see this line in the log: cib: [27320]: info: cib_process_request: Operation complete: op cib_modify for section resources (origin=local/crm_resource/4, version=0.65.43): ok (rc=0) However, "crm_mon -1" still shows that both nodes are slaves. I have a constraint such that the NFS resources will only run on the DRBD master, and a node will only get promoted to master once the virtual IP is started on it. I suspect that the IP is not starting and that's holding up the promotion, but I can't figure out why the IP wouldn't start. Looking in the log, I see a bunch of pending actions to start the IP, but they're not actually firing: # grep 'nfs_ip' /var/log/cluster/corosync.log Sep 23 14:28:24 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip (test-vm-1 - blocked) Sep 23 14:28:24 test-vm-1 crmd: [27325]: info: te_rsc_command: Initiating action 6: monitor nfs_ip_monitor_0 on test-vm-1 (local) Sep 23 14:28:24 test-vm-1 lrmd: [27322]: info: rsc:nfs_ip probe[4] (pid 27398) Sep 23 14:28:25 test-vm-1 lrmd: [27322]: info: operation monitor[4] on nfs_ip for client 27325: pid 27398 exited with return code 7 Sep 23 14:28:25 test-vm-1 crmd: [27325]: info: process_lrm_event: LRM operation nfs_ip_monitor_0 (call=4, rc=7, cib-update=28, confirmed=true) not running Sep 23 14:28:27 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip (test-vm-1) Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem: * [Input 7]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem: [Action 8]: Pending (id: nfs_ip_monitor_10000, loc: test-vm-1, priority: 0) Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem: * [Input 7]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem: [Action 7]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:28:27 test-vm-1 crmd: [27325]: WARN: print_elem: * [Input 7]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:28:33 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip (test-vm-1) Sep 23 14:28:33 test-vm-1 crmd: [27325]: info: te_rsc_command: Initiating action 7: monitor nfs_ip_monitor_0 on test-vm-2 Sep 23 14:28:36 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip (test-vm-1) Sep 23 14:28:37 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip (test-vm-1) Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem: * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem: [Action 9]: Pending (id: nfs_ip_monitor_10000, loc: test-vm-1, priority: 0) Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem: * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem: [Action 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:28:37 test-vm-1 crmd: [27325]: WARN: print_elem: * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:43:37 test-vm-1 pengine: [27324]: notice: LogActions: Start nfs_ip (test-vm-1) Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem: * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem: [Action 9]: Pending (id: nfs_ip_monitor_10000, loc: test-vm-1, priority: 0) Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem: * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem: [Action 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Sep 23 14:43:37 test-vm-1 crmd: [27325]: WARN: print_elem: * [Input 8]: Pending (id: nfs_ip_start_0, loc: test-vm-1, priority: 0) Any help will be greatly appreciated. The relevant portion of my CIB is below: <configuration> <crm_config> <cluster_property_set id="cib-bootstrap-options"> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff"/> <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/> <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/> <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/> <nvpair id="cib-bootstrap-options-maintenance-mode" name="maintenance-mode" value="false"/> <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/> </cluster_property_set> </crm_config> <nodes> <node id="test-vm-1" type="normal" uname="test-vm-1"/> <node id="test-vm-2" type="normal" uname="test-vm-2"/> </nodes> <resources> <group id="nfs_resources"> <meta_attributes id="nfs_resources-meta_attributes"> <nvpair id="nfs_resources-meta_attributes-target-role" name="target-role" value="Started"/> </meta_attributes> <primitive class="ocf" id="nfs_fs" provider="heartbeat" type="Filesystem"> <instance_attributes id="nfs_fs-instance_attributes"> <nvpair id="nfs_fs-instance_attributes-device" name="device" value="/dev/drbd1"/> <nvpair id="nfs_fs-instance_attributes-directory" name="directory" value="/mnt/data/"/> <nvpair id="nfs_fs-instance_attributes-fstype" name="fstype" value="ext3"/> <nvpair id="nfs_fs-instance_attributes-options" name="options" value="noatime,nodiratime"/> </instance_attributes> <operations> <op id="nfs_fs-start-0" interval="0" name="start" timeout="60"/> <op id="nfs_fs-stop-0" interval="0" name="stop" timeout="120"/> </operations> </primitive> <primitive class="lsb" id="nfs" type="nfs-kernel-server"> <operations> <op id="nfs-monitor-5s" interval="5s" name="monitor"/> </operations> </primitive> <primitive class="ocf" id="nfs_ip" provider="heartbeat" type="IPaddr2"> <instance_attributes id="nfs_ip-instance_attributes"> <nvpair id="nfs_ip-instance_attributes-ip" name="ip" value="192.168.25.205"/> <nvpair id="nfs_ip-instance_attributes-cidr_netmask" name="cidr_netmask" value="32"/> </instance_attributes> <operations> <op id="nfs_ip-monitor-10s" interval="10s" name="monitor"/> </operations> <meta_attributes id="nfs_ip-meta_attributes"> <nvpair id="nfs_ip-meta_attributes-is-managed" name="is-managed" value="true"/> </meta_attributes> </primitive> </group> <master id="ms-drbd_r0"> <meta_attributes id="ms-drbd_r0-meta_attributes"> <nvpair id="ms-drbd_r0-meta_attributes-clone-max" name="clone-max" value="2"/> <nvpair id="ms-drbd_r0-meta_attributes-notify" name="notify" value="true"/> <nvpair id="ms-drbd_r0-meta_attributes-globally-unique" name="globally-unique" value="false"/> <nvpair id="ms-drbd_r0-meta_attributes-target-role" name="target-role" value="Master"/> </meta_attributes> <primitive class="ocf" id="drbd_r0" provider="heartbeat" type="drbd"> <instance_attributes id="drbd_r0-instance_attributes"> <nvpair id="drbd_r0-instance_attributes-drbd_resource" name="drbd_resource" value="r0"/> </instance_attributes> <operations> <op id="drbd_r0-monitor-59s" interval="59s" name="monitor" role="Master" timeout="30s"/> <op id="drbd_r0-monitor-60s" interval="60s" name="monitor" role="Slave" timeout="30s"/> </operations> </primitive> </master> </resources> <constraints> <rsc_colocation id="drbd-nfs-ha" rsc="ms-drbd_r0" rsc-role="Master" score="INFINITY" with-rsc="nfs_resources"/> <rsc_order first="nfs_ip" first-action="start" id="ip-before-drbd" score="INFINITY" then="ms-drbd_r0" then-action="promote"/> <rsc_order first="ms-drbd_r0" first-action="promote" id="drbd-before-nfs" score="INFINITY" then="nfs_fs" then-action="start"/> <rsc_order first="nfs_fs" first-action="start" id="fs-before-nfs" score="INFINITY" then="nfs" then-action="start"/> </constraints> <rsc_defaults> <meta_attributes id="rsc-options"> <nvpair id="rsc-options-resource-stickiness" name="resource-stickiness" value="100"/> </meta_attributes> </rsc_defaults> <op_defaults/> </configuration> -- Dave Parker Systems Administrator Utica College Integrated Information Technology Services (315) 792-3229 Registered Linux User #408177
_______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org