Hi,

I'm testing Pacemaker resource failover in a very simple test environment with two virtual machines. There are three cloned resources: DRBD (dual-primary), controld, and clvm. Fencing is done with external/ssh, and that's it. I'm having trouble understanding why my clvm resource gets restarted when a failed node comes back online.

When one node is powered off (fail test), the remaining node fences the "failing" node and the clvm resource stays online. But when the failed node comes back online, the clvm resource clone on the previously "remaining" node gets restarted for no visible reason (see logs). I guess I'm doing something wrong, but what? Can anyone point me in the right direction?

Thank you!

Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke: Query 228: Requesting the current CIB: S_POLICY_ENGINE
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation res_drbd_1:1_monitor_0 found resource res_drbd_1:1 active on tnode1
Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke_callback: Invoking the PE: query=228, ref=pe_calc-dc-1316517521-176, seq=1268, quorate=1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation res_drbd_1:0_monitor_0 found resource res_drbd_1:0 active on tnode2
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print: Master/Slave Set: ms_drbd_1 [res_drbd_1]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Masters: [ tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Slaves: [ tnode1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print: Clone Set: cl_controld_1 [res_controld_dlm]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Started: [ tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Stopped: [ res_controld_dlm:1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print: stonith_external_ssh_1#011(stonith:external/ssh):#011Started tnode1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print: stonith_external_ssh_2#011(stonith:external/ssh):#011Started tnode2
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print: Clone Set: cl_clvmd_1 [res_clvmd_clustervg]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Started: [ tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Stopped: [ res_clvmd_clustervg:1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: RecurringOp: Start recurring monitor (60s) for res_controld_dlm:1 on tnode1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave res_drbd_1:0#011(Master tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Promote res_drbd_1:1#011(Slave -> Master tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave res_controld_dlm:0#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start res_controld_dlm:1#011(tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave stonith_external_ssh_1#011(Started tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave stonith_external_ssh_2#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Restart res_clvmd_clustervg:0#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start res_clvmd_clustervg:1#011(tnode1)

CONFIG

node tnode1 \
        attributes standby="off"
node tnode2 \
        attributes standby="off"
primitive res_clvmd_clustervg ocf:lvm2:clvmd \
        params daemon_timeout="30" \
        operations $id="res_clvmd_clustervg-operations" \
        op monitor interval="0" timeout="4min" start-delay="5"
primitive res_controld_dlm ocf:pacemaker:controld \
        operations $id="res_controld_dlm-operations" \
        op monitor interval="60" timeout="60" start-delay="0" \
        meta target-role="started"
primitive res_drbd_1 ocf:linbit:drbd \
        params drbd_resource="r0" \
        operations $id="res_drbd_1-operations" \
        op start interval="0" timeout="240" \
        op promote interval="0" timeout="90" \
        op demote interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="20" start-delay="1min" \
        op notify interval="0" timeout="90" \
        meta target-role="started" is-managed="true"
primitive stonith_external_ssh_1 stonith:external/ssh \
        params hostlist="tnode2" \
        operations $id="stonith_external_ssh_1-operations" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="60" start-delay="0" \
        meta failure-timeout="3"
primitive stonith_external_ssh_2 stonith:external/ssh \
        params hostlist="tnode1" \
        operations $id="stonith_external_ssh_2-operations" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="60" start-delay="0" \
        meta target-role="started" failure-timeout="3"
ms ms_drbd_1 res_drbd_1 \
        meta master-max="2" clone-max="2" notify="true" ordered="true" interleave="true"
clone cl_clvmd_1 res_clvmd_clustervg \
        meta clone-max="2" notify="true"
clone cl_controld_1 res_controld_dlm \
        meta clone-max="2" notify="true" ordered="true" interleave="true"
location loc_ms_drbd_1-ping-prefer ms_drbd_1 \
        rule $id="loc_ms_drbd_1-ping-prefer-rule" pingd: defined pingd
location loc_stonith_external_ssh_1_tnode2 stonith_external_ssh_1 -inf: tnode2
location loc_stonith_external_ssh_2_tnode1 stonith_external_ssh_2 -inf: tnode1
colocation col_cl_controld_1_cl_clvmd_1 inf: cl_clvmd_1 cl_controld_1
colocation col_ms_drbd_1_cl_controld_1 inf: cl_controld_1 ms_drbd_1:Master
order ord_cl_controld_1_cl_clvmd_1 inf: cl_controld_1 cl_clvmd_1
order ord_ms_drbd_1_cl_controld_1 inf: ms_drbd_1:promote cl_controld_1:start
property $id="cib-bootstrap-options" \
        expected-quorum-votes="2" \
        stonith-timeout="30" \
        dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
        no-quorum-policy="ignore" \
        cluster-infrastructure="openais" \