One of my nodes died in a 2-node cluster. I gather something went wrong and it fenced/killed itself, but I am not sure what happened. I think the VM backups may have run around that time, and a snapshot of the VM could have been taken, but there is nothing concrete for me to put my finger on.
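If the snapshot theory holds, my working guess is that the I/O freeze during the snapshot stalled corosync long enough for the totem token to time out, so the peer declared the node dead and fenced it. Below is a rough sketch of what I was planning to check and change on this CMAN/corosync 1.x cluster; the commands and the 20000 ms value are assumptions on my part, not something I have verified:

    # Show the token timeout corosync is currently running with (milliseconds).
    corosync-objctl | grep totem.token

    # In /etc/cluster/cluster.conf, bump config_version and add something like
    #   <totem token="20000"/>
    # inside <cluster>, then validate the file and push it to both nodes.
    ccs_config_validate
    cman_tool version -r

Happy to be corrected if raising the token timeout is the wrong knob here.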
Output from /var/log/messages around that time is below.

This is on devrp1:

Oct 8 23:31:38 devrp1 corosync[1670]: [TOTEM ] A processor failed, forming new configuration.
Oct 8 23:31:40 devrp1 corosync[1670]: [CMAN ] quorum lost, blocking activity
Oct 8 23:31:40 devrp1 corosync[1670]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct 8 23:31:40 devrp1 corosync[1670]: [QUORUM] Members[1]: 1
Oct 8 23:31:40 devrp1 corosync[1670]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 8 23:31:40 devrp1 corosync[1670]: [CPG ] chosen downlist: sender r(0) ip(10.172.214.51) ; members(old:2 left:1)
Oct 8 23:31:40 devrp1 corosync[1670]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 8 23:31:41 devrp1 kernel: dlm: closing connection to node 2
Oct 8 23:31:42 devrp1 crmd[2350]: notice: cman_event_callback: Membership 424: quorum lost
Oct 8 23:31:42 devrp1 corosync[1670]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 8 23:31:42 devrp1 corosync[1670]: [CMAN ] quorum regained, resuming activity
Oct 8 23:31:42 devrp1 corosync[1670]: [QUORUM] This node is within the primary component and will provide service.
Oct 8 23:31:42 devrp1 corosync[1670]: [QUORUM] Members[2]: 1 2
Oct 8 23:31:42 devrp1 corosync[1670]: [QUORUM] Members[2]: 1 2
Oct 8 23:31:42 devrp1 corosync[1670]: [CPG ] chosen downlist: sender r(0) ip(10.172.214.51) ; members(old:1 left:0)
Oct 8 23:31:42 devrp1 corosync[1670]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 8 23:31:42 devrp1 crmd[2350]: notice: crm_update_peer_state: cman_event_callback: Node devrp2[2] - state is now lost (was member)
Oct 8 23:31:42 devrp1 crmd[2350]: warning: reap_dead_nodes: Our DC node (devrp2) left the cluster
Oct 8 23:31:42 devrp1 crmd[2350]: notice: cman_event_callback: Membership 428: quorum acquired
Oct 8 23:31:42 devrp1 crmd[2350]: notice: crm_update_peer_state: cman_event_callback: Node devrp2[2] - state is now member (was lost)
Oct 8 23:31:42 devrp1 crmd[2350]: notice: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=reap_dead_nodes ]
Oct 8 23:31:42 devrp1 corosync[1670]: cman killed by node 2 because we were killed by cman_tool or other application
Oct 8 23:31:42 devrp1 pacemakerd[2339]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:42 devrp1 stonith-ng[2346]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:42 devrp1 crmd[2350]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:42 devrp1 crmd[2350]: error: crmd_cs_destroy: connection terminated
Oct 8 23:31:43 devrp1 fenced[1726]: cluster is down, exiting
Oct 8 23:31:43 devrp1 fenced[1726]: daemon cpg_dispatch error 2
Oct 8 23:31:43 devrp1 attrd[2348]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:43 devrp1 attrd[2348]: crit: attrd_cs_destroy: Lost connection to Corosync service!
Oct 8 23:31:43 devrp1 attrd[2348]: notice: main: Exiting...
Oct 8 23:31:43 devrp1 attrd[2348]: notice: main: Disconnecting client 0x18cf240, pid=2350...
Oct 8 23:31:43 devrp1 pacemakerd[2339]: error: mcp_cpg_destroy: Connection destroyed
Oct 8 23:31:43 devrp1 cib[2345]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:43 devrp1 cib[2345]: error: cib_cs_destroy: Corosync connection lost! Exiting.
Oct 8 23:31:43 devrp1 stonith-ng[2346]: error: stonith_peer_cs_destroy: Corosync connection terminated
Oct 8 23:31:43 devrp1 dlm_controld[1752]: cluster is down, exiting
Oct 8 23:31:43 devrp1 dlm_controld[1752]: daemon cpg_dispatch error 2
Oct 8 23:31:43 devrp1 gfs_controld[1801]: cluster is down, exiting
Oct 8 23:31:43 devrp1 crmd[2350]: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Oct 8 23:31:43 devrp1 attrd[2348]: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Oct 8 23:31:43 devrp1 lrmd[2347]: warning: qb_ipcs_event_sendv: new_event_notification (2347-2350-6): Bad file descriptor (9)
Oct 8 23:31:43 devrp1 lrmd[2347]: warning: send_client_notify: Notification of client crmd/94e94935-2221-434d-8a6f-90eba4ede55b failed
Oct 8 23:31:43 devrp1 lrmd[2347]: warning: send_client_notify: Notification of client crmd/94e94935-2221-434d-8a6f-90eba4ede55b failed

This is on devrp2:

Oct 8 23:31:26 devrp2 kernel: IN=eth0 OUT= MAC=00:50:56:a6:3a:5d:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2
Oct 8 23:31:42 devrp2 crmd[2361]: notice: cman_event_callback: Membership 428: quorum lost
Oct 8 23:31:43 devrp2 fenced[1747]: telling cman to remove nodeid 1 from cluster
Oct 8 23:31:45 devrp2 corosync[1685]: [TOTEM ] A processor failed, forming new configuration.
Oct 8 23:31:45 devrp2 corosync[1685]: [CMAN ] quorum lost, blocking activity
Oct 8 23:31:45 devrp2 corosync[1685]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct 8 23:31:45 devrp2 corosync[1685]: [QUORUM] Members[1]: 2
Oct 8 23:31:45 devrp2 corosync[1685]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 8 23:31:45 devrp2 corosync[1685]: [CMAN ] quorum regained, resuming activity
Oct 8 23:31:45 devrp2 corosync[1685]: [QUORUM] This node is within the primary component and will provide service.
Oct 8 23:31:45 devrp2 corosync[1685]: [QUORUM] Members[2]: 1 2
Oct 8 23:31:45 devrp2 corosync[1685]: [QUORUM] Members[2]: 1 2
Oct 8 23:31:45 devrp2 corosync[1685]: [CPG ] chosen downlist: sender r(0) ip(10.172.214.52) ; members(old:2 left:1)
Oct 8 23:31:45 devrp2 corosync[1685]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 8 23:31:45 devrp2 fenced[1747]: receive_start 1:3 add node with started_count 1
Oct 8 23:31:45 devrp2 crmd[2361]: notice: cman_event_callback: Membership 428: quorum acquired
Oct 8 23:31:45 devrp2 crmd[2361]: notice: crm_update_peer_state: cman_event_callback: Node devrp1[1] - state is now lost (was member)
Oct 8 23:31:45 devrp2 crmd[2361]: warning: match_down_event: No match for shutdown action on devrp1
Oct 8 23:31:45 devrp2 crmd[2361]: notice: peer_update_callback: Stonith/shutdown of devrp1 not matched
Oct 8 23:31:49 devrp2 crmd[2361]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Oct 8 23:31:49 devrp2 crmd[2361]: warning: match_down_event: No match for shutdown action on devrp1
Oct 8 23:31:49 devrp2 crmd[2361]: notice: peer_update_callback: Stonith/shutdown of devrp1 not matched
Oct 8 23:31:49 devrp2 crmd[2361]: notice: do_election_count_vote: Election 2 (current: 2, owner: devrp1): Processed vote from devrp1 (Peer is not part of our cluster)
Oct 8 23:31:49 devrp2 pengine[2360]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 8 23:31:49 devrp2 kernel: dlm: closing connection to node 1
Oct 8 23:31:49 devrp2 attrd[2359]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Oct 8 23:31:49 devrp2 attrd[2359]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1412658023)
Oct 8 23:31:49 devrp2 dlm_controld[1773]: /sys/kernel/config/dlm/cluster/comms/1: rmdir failed: 2
Oct 8 23:31:49 devrp2 attrd[2359]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Oct 8 23:31:49 devrp2 pengine[2360]: notice: LogActions: Start ybrpip#011(devrp2)
Oct 8 23:31:49 devrp2 pengine[2360]: notice: process_pe_message: Calculated Transition 1061: /var/lib/pacemaker/pengine/pe-input-2282.bz2
Oct 8 23:31:50 devrp2 pengine[2360]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 8 23:31:50 devrp2 pengine[2360]: notice: LogActions: Start ybrpip#011(devrp2)
Oct 8 23:31:50 devrp2 pengine[2360]: notice: process_pe_message: Calculated Transition 1062: /var/lib/pacemaker/pengine/pe-input-2283.bz2
Oct 8 23:31:50 devrp2 crmd[2361]: notice: te_rsc_command: Initiating action 5: start ybrpip_start_0 on devrp2 (local)
Oct 8 23:31:51 devrp2 IPaddr2(ybrpip)[30442]: INFO: Adding inet address 10.172.214.50/24 with broadcast address 10.172.214.255 to device eth0
Oct 8 23:31:51 devrp2 IPaddr2(ybrpip)[30442]: INFO: Bringing device eth0 up
Oct 8 23:31:51 devrp2 IPaddr2(ybrpip)[30442]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.172.214.50 eth0 10.172.214.50 auto not_used not_used
Oct 8 23:31:51 devrp2 crmd[2361]: notice: process_lrm_event: LRM operation ybrpip_start_0 (call=995, rc=0, cib-update=1494, confirmed=true) ok
Oct 8 23:31:51 devrp2 crmd[2361]: notice: te_rsc_command: Initiating action 6: monitor ybrpip_monitor_5000 on devrp2 (local)
Oct 8 23:31:51 devrp2 crmd[2361]: notice: process_lrm_event: LRM operation ybrpip_monitor_5000 (call=998, rc=0, cib-update=1495, confirmed=false) ok
Oct 8 23:31:51 devrp2 crmd[2361]: notice: run_graph: Transition 1062 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2283.bz2): Complete
Oct 8 23:31:51 devrp2 crmd[2361]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 8 23:31:52 devrp2 ntpd[2099]: Listen normally on 6 eth0 10.172.214.50 UDP 123
Oct 8 23:31:52 devrp2 ntpd[2099]: peers refreshed
Oct 8 23:31:52 devrp2 corosync[1685]: [TOTEM ] A processor failed, forming new configuration.
Oct 8 23:31:54 devrp2 corosync[1685]: [CMAN ] quorum lost, blocking activity
Oct 8 23:31:54 devrp2 corosync[1685]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct 8 23:31:54 devrp2 corosync[1685]: [QUORUM] Members[1]: 2
Oct 8 23:31:54 devrp2 corosync[1685]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 8 23:31:54 devrp2 crmd[2361]: notice: cman_event_callback: Membership 432: quorum lost
Oct 8 23:31:54 devrp2 corosync[1685]: [CPG ] chosen downlist: sender r(0) ip(10.172.214.52) ; members(old:2 left:1)
Oct 8 23:31:54 devrp2 corosync[1685]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 8 23:31:54 devrp2 crmd[2361]: warning: match_down_event: No match for shutdown action on devrp1
Oct 8 23:31:54 devrp2 crmd[2361]: notice: peer_update_callback: Stonith/shutdown of devrp1 not matched
Oct 8 23:31:54 devrp2 crmd[2361]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Oct 8 23:31:54 devrp2 attrd[2359]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Oct 8 23:31:54 devrp2 attrd[2359]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1412658023)
Oct 8 23:31:55 devrp2 attrd[2359]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Oct 8 23:31:55 devrp2 pengine[2360]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 8 23:31:55 devrp2 pengine[2360]: notice: process_pe_message: Calculated Transition 1063: /var/lib/pacemaker/pengine/pe-input-2284.bz2
Oct 8 23:31:55 devrp2 crmd[2361]: notice: run_graph: Transition 1063 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2284.bz2): Complete
Oct 8 23:31:55 devrp2 crmd[2361]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 8 23:31:59 devrp2 kernel: IN=eth0 OUT= MAC=00:50:56:a6:3a:5d:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2
Oct 8 23:36:52 devrp2 kernel: IN=eth0 OUT= MAC=00:50:56:a6:3a:5d:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2
Oct 8 23:46:55 devrp2 crmd[2361]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Oct 8 23:46:55 devrp2 pengine[2360]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 8 23:46:55 devrp2 pengine[2360]: notice: process_pe_message: Calculated Transition 1064: /var/lib/pacemaker/pengine/pe-input-2284.bz2
Oct 8 23:46:55 devrp2 crmd[2361]: notice: run_graph: Transition 1064 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2284.bz2): Complete
Oct 8 23:46:55 devrp2 crmd[2361]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]