On 25 Apr 2014, at 5:39 am, Brian J. Murrell <br...@interlinx.bc.ca> wrote:
> Hi,
>
> As was previously discussed there is a bug in the handling of a STONITH if a node reboots too quickly. I had a different kind of failure that I suspect is the same kind of problem, just a different symptom.
>
> The situation is a two node cluster with two resources plus a fencing resource. Each node is running one of the two resources and one is running the fencing resource. One of the nodes is killed. The remaining node should notice this, STONITH it and take over the resource, but what happens is all of the above except that the resource is restarted on the node that was killed.
>
> Here's the log from the surviving node -- the one that issues the STONITH and *should* take over the resource:
>
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 20: memb=1, new=0, lost=1
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: memb: node2 2085752330
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: lost: node1 2068975114
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0, lost=0
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node2 2085752330
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: update_member: Node 2068975114/node1 is now: lost
> Apr 22 19:47:26 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 20 to 2 children
> Apr 22 19:47:26 node2 corosync[3204]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>
> node1 is gone
>
> Apr 22 19:47:26 node2 corosync[3204]: [CPG ] chosen downlist: sender r(0) ip(10.14.82.124) r(1) ip(10.128.2.124) ; members(old:2 left:1)
> Apr 22 19:47:26 node2 corosync[3204]: [MAIN ] Completed service synchronization, ready to provide service.
> Apr 22 19:47:26 node2 crmd[3255]: notice: plugin_handle_membership: Membership 20: quorum lost
> Apr 22 19:47:26 node2 crmd[3255]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now lost (was member)
> Apr 22 19:47:26 node2 crmd[3255]: warning: match_down_event: No match for shutdown action on node1
> Apr 22 19:47:26 node2 crmd[3255]: notice: peer_update_callback: Stonith/shutdown of node1 not matched
>
> Is this "Stonith/shutdown of node1 not matched" because the fencing resource was running on node1 but now it's gone?
No, that's pacemaker's obtuse way of saying "node1 went away when we weren't expecting it to".

>
> Apr 22 19:47:26 node2 cib[3250]: notice: plugin_handle_membership: Membership 20: quorum lost
> Apr 22 19:47:26 node2 cib[3250]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now lost (was member)
> Apr 22 19:47:26 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
> Apr 22 19:47:26 node2 attrd[3253]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Apr 22 19:47:26 node2 attrd[3253]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Apr 22 19:47:26 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:47:26 node2 pengine[3254]: warning: pe_fence_node: Node node1 will be fenced because the node is no longer part of the cluster
> Apr 22 19:47:26 node2 pengine[3254]: warning: determine_online_status: Node node1 is unclean
> Apr 22 19:47:26 node2 pengine[3254]: warning: custom_action: Action st-fencing_stop_0 on node1 is unrunnable (offline)
> Apr 22 19:47:26 node2 pengine[3254]: warning: custom_action: Action resource1_stop_0 on node1 is unrunnable (offline)
>
> Resources from node1 unrunnable because it's AWOL!

Right, probably shouldn't be a warning.

>
> Apr 22 19:47:26 node2 pengine[3254]: warning: stage6: Scheduling Node node1 for STONITH
>
> So kill it!
>
> Apr 22 19:47:26 node2 pengine[3254]: notice: LogActions: Move st-fencing#011(Started node1 -> node2)
> Apr 22 19:47:26 node2 pengine[3254]: notice: LogActions: Move resource1#011(Started node1 -> node2)
>
> So here are the resources that were on node1 being scheduled to move to node2.
>
> Apr 22 19:47:26 node2 crmd[3255]: notice: te_fence_node: Executing reboot fencing operation (13) on node1 (timeout=60000)
> Apr 22 19:47:26 node2 stonith-ng[3251]: notice: handle_request: Client crmd.3255.0a3d203a wants to fence (reboot) 'node1' with device '(any)'
> Apr 22 19:47:26 node2 stonith-ng[3251]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 7660cf0e-10da-4e96-b3bd-4b415c72201c (0)
>
> The actual killing of node1.
>
> Apr 22 19:47:26 node2 pengine[3254]: warning: process_pe_message: Calculated Transition 23: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0, lost=0
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: memb: node2 2085752330
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: update_member: Node 2068975114/node1 is now: member

Ah! You're running the custom plugin that I don't test anymore. Please don't do that.

>
> And this is where node1 comes back.
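(About the plugin comment above: the "[pcmk ]" entries in these logs come from the old pacemaker plugin for corosync 1.x. For reference, the usual plugin stanza looks roughly like the sketch below; the file path and values are assumptions based on a typical corosync 1.x install, not taken from this cluster. Switching to ver: 1 only changes who starts the pacemaker daemons; getting off the plugin entirely means moving to corosync 2.x, or CMAN on EL6.)

    # /etc/corosync/service.d/pcmk  (assumed path; may also be a service{} block in corosync.conf)
    service {
        name: pacemaker
        # ver: 0 = the plugin also spawns and monitors the pacemaker daemons
        # ver: 1 = corosync only provides membership/quorum; pacemakerd is
        #         started separately (e.g. "service pacemaker start") after corosync
        ver:  1
    }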
>
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: NEW: node1 2068975114
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node1 2068975114
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node2 2085752330
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 24 to 2 children
> Apr 22 19:48:22 node2 corosync[3204]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: update_member: 0x170c920 Node 2068975114 (node1) born on: 24
> Apr 22 19:48:22 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 24 to 2 children
> Apr 22 19:48:22 node2 cib[3250]: notice: plugin_handle_membership: Membership 24: quorum acquired
> Apr 22 19:48:22 node2 cib[3250]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now member (was lost)
>
> And the CIB sees it.
>
> Apr 22 19:48:22 node2 crmd[3255]: notice: plugin_handle_membership: Membership 24: quorum acquired
> Apr 22 19:48:22 node2 crmd[3255]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now member (was lost)
> Apr 22 19:48:22 node2 corosync[3204]: [CPG ] chosen downlist: sender r(0) ip(10.14.82.124) r(1) ip(10.128.2.124) ; members(old:1 left:0)
> Apr 22 19:48:22 node2 corosync[3204]: [MAIN ] Completed service synchronization, ready to provide service.
> Apr 22 19:48:43 node2 corosync[3204]: [TOTEM ] A processor failed, forming new configuration.
>
> And it goes AWOL again?

It shouldn't have. That's corosync failing, nothing to do with pacemaker.

>
> Apr 22 19:48:44 node2 stonith-ng[3251]: notice: log_operation: Operation 'reboot' [4110] (call 2 from crmd.3255) for host 'node1' with device 'st-fencing' returned: 0 (OK)
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 28: memb=1, new=0, lost=1
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: memb: node2 2085752330
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: lost: node1 2068975114
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 28: memb=1, new=0, lost=0
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node2 2085752330
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: update_member: Node 2068975114/node1 is now: lost
> Apr 22 19:48:49 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 28 to 2 children
> Apr 22 19:48:49 node2 corosync[3204]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Apr 22 19:48:49 node2 crmd[3255]: notice: plugin_handle_membership: Membership 28: quorum lost
> Apr 22 19:48:49 node2 crmd[3255]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now lost (was member)
> Apr 22 19:48:49 node2 cib[3250]: notice: plugin_handle_membership: Membership 28: quorum lost
> Apr 22 19:48:49 node2 cib[3250]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now lost (was member)
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 32: memb=1, new=0, lost=0
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: memb: node2 2085752330
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 32: memb=2, new=1, lost=0
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: update_member: Node 2068975114/node1 is now: member
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: NEW: node1 2068975114
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node1 2068975114
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: pcmk_peer_update: MEMB: node2 2085752330
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 32 to 2 children
> Apr 22 19:49:26 node2 corosync[3204]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: update_member: 0x170c920 Node 2068975114 (node1) born on: 32
> Apr 22 19:49:26 node2 corosync[3204]: [pcmk ] info: send_member_notification: Sending membership update 32 to 2 children
> Apr 22 19:49:26 node2 crmd[3255]: notice: plugin_handle_membership: Membership 32: quorum acquired
> Apr 22 19:49:26 node2 crmd[3255]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now member (was lost)
> Apr 22 19:49:26 node2 cib[3250]: notice: plugin_handle_membership: Membership 32: quorum acquired
> Apr 22 19:49:26 node2 cib[3250]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[2068975114] - state is now member (was lost)
>
> And it's back again?
>
> Apr 22 19:49:26 node2 corosync[3204]: [CPG ] chosen downlist: sender r(0) ip(10.14.82.124) r(1) ip(10.128.2.124) ; members(old:2 left:1)
> Apr 22 19:49:26 node2 corosync[3204]: [MAIN ] Completed service synchronization, ready to provide service.
> Apr 22 19:49:26 node2 stonith-ng[3251]: notice: remote_op_done: Operation reboot of node1 by node2 for crmd.3255@node2.7660cf0e: OK
> Apr 22 19:49:26 node2 crmd[3255]: notice: tengine_stonith_callback: Stonith operation 2/13:23:0:e63ab1bb-b839-48d5-9a4d-b27677d3787f: OK (0)
> Apr 22 19:49:26 node2 crmd[3255]: notice: tengine_stonith_notify: Peer node1 was terminated (reboot) by node2 for node2: OK (ref=7660cf0e-10da-4e96-b3bd-4b415c72201c) by client crmd.3255
> Apr 22 19:49:26 node2 crmd[3255]: notice: run_graph: Transition 23 (Complete=1, Pending=0, Fired=0, Skipped=7, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-0.bz2): Stopped
> Apr 22 19:49:33 node2 attrd[3253]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Apr 22 19:49:33 node2 attrd[3253]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Apr 22 19:49:34 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:49:34 node2 pengine[3254]: notice: LogActions: Start st-fencing#011(node1)
> Apr 22 19:49:34 node2 pengine[3254]: notice: LogActions: Start resource1#011(node1)
>
> In any case, here we can see that the resource1 and the fencing resource were started up on node1 again.
>
> Why would these start back up on a node that has clearly failed and not started on this node, node2?
>
> Apr 22 19:49:34 node2 crmd[3255]: notice: te_rsc_command: Initiating action 5: monitor st-fencing_monitor_0 on node1
> Apr 22 19:49:34 node2 crmd[3255]: notice: te_rsc_command: Initiating action 6: monitor resource1_monitor_0 on node1
> Apr 22 19:49:34 node2 crmd[3255]: notice: te_rsc_command: Initiating action 7: monitor resource2_monitor_0 on node1
> Apr 22 19:49:34 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 24: /var/lib/pacemaker/pengine/pe-input-95.bz2
> Apr 22 19:49:40 node2 crmd[3255]: notice: te_rsc_command: Initiating action 4: probe_complete probe_complete on node1 - no waiting
> Apr 22 19:49:40 node2 crmd[3255]: notice: te_rsc_command: Initiating action 9: start st-fencing_start_0 on node1
> Apr 22 19:49:40 node2 crmd[3255]: notice: te_rsc_command: Initiating action 10: start resource1_start_0 on node1
> Apr 22 19:50:24 node2 crmd[3255]: notice: te_rsc_command: Initiating action 11: monitor resource1_monitor_5000 on node1
>
> And here are the cluster configuration messages so that you can see how resource1 and resource2 were added and what constraints were set on them, etc.:
>
> Apr 22 19:45:46 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.23.1
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="22" num_updates="1"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <primitive class="ocf" provider="foo" type="Target" id="resource1">
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <meta_attributes id="resource1-meta_attributes">
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <nvpair name="target-role" id="resource1-meta_attributes-target-role" value="Stopped"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ </meta_attributes>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <operations id="resource1-operations">
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <op id="resource1-monitor-5" interval="5" name="monitor" timeout="60"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <op id="resource1-start-0" interval="0" name="start" timeout="300"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <op id="resource1-stop-0" interval="0" name="stop" timeout="300"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ </operations>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <instance_attributes id="resource1-instance_attributes">
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <nvpair id="resource1-instance_attributes-target" name="target" value="bddf8d7c-5b5c-4c24-9e55-4704dd336a8b"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ </instance_attributes>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ </primitive>
> Apr 22 19:45:46 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:46 node2 crmd[3255]: notice: te_rsc_command: Initiating action 6: monitor resource1_monitor_0 on node2 (local)
> Apr 22 19:45:46 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 15: /var/lib/pacemaker/pengine/pe-input-87.bz2
> Apr 22 19:45:46 node2 crmd[3255]: notice: te_rsc_command: Initiating action 4: monitor resource1_monitor_0 on node1
> Apr 22 19:45:46 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.24.1
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="23" num_updates="1"/>
> Apr 22 19:45:46 node2 cib[3250]: notice: cib:diff: ++ <rsc_location id="resource1-primary" node="node1" rsc="resource1" score="20"/>
> Apr 22 19:45:46 node2 crmd[3255]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on node1 - no waiting
> Apr 22 19:45:46 node2 crmd[3255]: notice: process_lrm_event: LRM operation resource1_monitor_0 (call=10, rc=7, cib-update=58, confirmed=true) not running
> Apr 22 19:45:46 node2 crmd[3255]: notice: te_rsc_command: Initiating action 5: probe_complete probe_complete on node2 (local) - no waiting
> Apr 22 19:45:46 node2 crmd[3255]: notice: run_graph: Transition 15 (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Stopped
> Apr 22 19:45:46 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:46 node2 crmd[3255]: notice: run_graph: Transition 16 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-88.bz2): Complete
> Apr 22 19:45:46 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:46 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 16: /var/lib/pacemaker/pengine/pe-input-88.bz2
> Apr 22 19:45:47 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:47 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:47 node2 cibadmin[3520]: notice: crm_log_args: Invoked: cibadmin -o constraints -C -X <rsc_location id="resource1-secondary" node="node2" rsc="resource1" score="10"/>
> Apr 22 19:45:47 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:47 node2 cib[3250]: notice: cib:diff: Diff: --- 0.24.3
> Apr 22 19:45:47 node2 cib[3250]: notice: cib:diff: Diff: +++ 0.25.1 1e5f079657dde6f6085a149c0cc5398b
> Apr 22 19:45:47 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="24" num_updates="3"/>
> Apr 22 19:45:47 node2 cib[3250]: notice: cib:diff: ++ <rsc_location id="resource1-secondary" node="node2" rsc="resource1" score="10"/>
> Apr 22 19:45:47 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:47 node2 crmd[3255]: notice: run_graph: Transition 17 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-89.bz2): Complete
> Apr 22 19:45:47 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:47 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 17: /var/lib/pacemaker/pengine/pe-input-89.bz2
> Apr 22 19:45:48 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:48 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:48 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:48 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.26.1
> Apr 22 19:45:48 node2 cib[3250]: notice: cib:diff: -- <nvpair value="Stopped" id="resource1-meta_attributes-target-role"/>
> Apr 22 19:45:48 node2 cib[3250]: notice: cib:diff: ++ <nvpair name="target-role" id="resource1-meta_attributes-target-role" value="Started"/>
> Apr 22 19:45:48 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:48 node2 pengine[3254]: notice: LogActions: Start resource1#011(node1)
> Apr 22 19:45:48 node2 crmd[3255]: notice: te_rsc_command: Initiating action 7: start resource1_start_0 on node1
> Apr 22 19:45:48 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 18: /var/lib/pacemaker/pengine/pe-input-90.bz2
> Apr 22 19:45:49 node2 crmd[3255]: notice: te_rsc_command: Initiating action 8: monitor resource1_monitor_5000 on node1
> Apr 22 19:45:49 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:49 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:49 node2 crmd[3255]: notice: run_graph: Transition 18 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-90.bz2): Complete
> Apr 22 19:45:49 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:50 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:53 node2 cibadmin[3597]: notice: crm_log_args: Invoked: cibadmin -o resources -C -x /tmp/tmpoSwAoH
> Apr 22 19:45:53 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:53 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: Diff: --- 0.26.3
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: Diff: +++ 0.27.1 c0a07b74061c440082677bacfd9eff2a
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="26" num_updates="3"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <primitive class="ocf" provider="foo" type="Target" id="resource2">
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <meta_attributes id="resource2-meta_attributes">
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <nvpair name="target-role" id="resource2-meta_attributes-target-role" value="Stopped"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ </meta_attributes>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <operations id="resource2-operations">
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <op id="resource2-monitor-5" interval="5" name="monitor" timeout="60"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <op id="resource2-start-0" interval="0" name="start" timeout="300"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <op id="resource2-stop-0" interval="0" name="stop" timeout="300"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ </operations>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <instance_attributes id="resource2-instance_attributes">
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <nvpair id="resource2-instance_attributes-target" name="target" value="38716497-b4a2-4ad7-8307-d9829857c7fc"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ </instance_attributes>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ </primitive>
> Apr 22 19:45:53 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:53 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 19: /var/lib/pacemaker/pengine/pe-input-91.bz2
> Apr 22 19:45:53 node2 crmd[3255]: notice: te_rsc_command: Initiating action 7: monitor resource2_monitor_0 on node2 (local)
> Apr 22 19:45:53 node2 crmd[3255]: notice: te_rsc_command: Initiating action 5: monitor resource2_monitor_0 on node1
> Apr 22 19:45:53 node2 cibadmin[3601]: notice: crm_log_args: Invoked: cibadmin -o constraints -C -X <rsc_location id="resource2-primary" node="node2" rsc="resource2" score="20"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.28.1
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="27" num_updates="1"/>
> Apr 22 19:45:53 node2 cib[3250]: notice: cib:diff: ++ <rsc_location id="resource2-primary" node="node2" rsc="resource2" score="20"/>
> Apr 22 19:45:54 node2 crmd[3255]: notice: process_lrm_event: LRM operation resource2_monitor_0 (call=15, rc=7, cib-update=63, confirmed=true) not running
> Apr 22 19:45:54 node2 crmd[3255]: notice: te_rsc_command: Initiating action 6: probe_complete probe_complete on node2 (local) - no waiting
> Apr 22 19:45:54 node2 crmd[3255]: notice: te_rsc_command: Initiating action 4: probe_complete probe_complete on node1 - no waiting
> Apr 22 19:45:54 node2 crmd[3255]: notice: run_graph: Transition 19 (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-91.bz2): Stopped
> Apr 22 19:45:54 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:54 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 20: /var/lib/pacemaker/pengine/pe-input-92.bz2
> Apr 22 19:45:54 node2 crmd[3255]: notice: run_graph: Transition 20 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-92.bz2): Complete
> Apr 22 19:45:54 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:54 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:54 node2 cib[3250]: notice: cib:diff: Diff: --- 0.28.3
> Apr 22 19:45:54 node2 cib[3250]: notice: cib:diff: Diff: +++ 0.29.1 a5aa96d3ac5932d62dd542c74956adaa
> Apr 22 19:45:54 node2 cib[3250]: notice: cib:diff: -- <cib admin_epoch="0" epoch="28" num_updates="3"/>
> Apr 22 19:45:54 node2 cib[3250]: notice: cib:diff: ++ <rsc_location id="resource2-secondary" node="node1" rsc="resource2" score="10"/>
> Apr 22 19:45:54 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:54 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 21: /var/lib/pacemaker/pengine/pe-input-93.bz2
> Apr 22 19:45:54 node2 crmd[3255]: notice: run_graph: Transition 21 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-93.bz2): Complete
> Apr 22 19:45:54 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:54 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:54 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:55 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:55 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:55 node2 crmd[3255]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Apr 22 19:45:55 node2 cib[3250]: notice: log_cib_diff: cib:diff: Local-only Change: 0.30.1
> Apr 22 19:45:55 node2 cib[3250]: notice: cib:diff: -- <nvpair value="Stopped" id="resource2-meta_attributes-target-role"/>
> Apr 22 19:45:55 node2 cib[3250]: notice: cib:diff: ++ <nvpair name="target-role" id="resource2-meta_attributes-target-role" value="Started"/>
> Apr 22 19:45:55 node2 pengine[3254]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:55 node2 pengine[3254]: notice: LogActions: Start resource2#011(node2)
> Apr 22 19:45:55 node2 crmd[3255]: notice: te_rsc_command: Initiating action 10: start resource2_start_0 on node2 (local)
> Apr 22 19:45:55 node2 pengine[3254]: notice: process_pe_message: Calculated Transition 22: /var/lib/pacemaker/pengine/pe-input-94.bz2
> Apr 22 19:45:56 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
> Apr 22 19:45:56 node2 stonith-ng[3251]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Apr 22 19:45:56 node2 crmd[3255]: notice: process_lrm_event: LRM operation resource2_start_0 (call=18, rc=0, cib-update=67, confirmed=true) ok
> Apr 22 19:45:56 node2 crmd[3255]: notice: te_rsc_command: Initiating action 11: monitor resource2_monitor_5000 on node2 (local)
> Apr 22 19:45:57 node2 crmd[3255]: notice: process_lrm_event: LRM operation resource2_monitor_5000 (call=21, rc=0, cib-update=68, confirmed=false) ok
> Apr 22 19:45:57 node2 crmd[3255]: notice: run_graph: Transition 22 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-94.bz2): Complete
> Apr 22 19:45:57 node2 crmd[3255]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Apr 22 19:45:57 node2 stonith-ng[3251]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
>
> Cheers,
> b.
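On the question of why the resources started back up on node1: the constraint diffs above give resource1 a location score of 20 on node1 and only 10 on node2, and no resource-stickiness appears in what was posted, so once node1 rejoins the policy engine will prefer it again. That interpretation is an assumption on my part, but you can check the scores the policy engine is actually working from with crm_simulate (a standard pacemaker tool):

    # Replay the live CIB through the policy engine and print the allocation scores
    crm_simulate --live-check --show-scores    # short form: crm_simulate -sL

If the scores really do come out 20 vs 10 with no stickiness, setting resource-stickiness (through crmsh, pcs, or a cibadmin update to rsc_defaults, whichever you normally use) is the usual way to keep a resource on the node it failed over to instead of having it pulled back.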
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org