One of my nodes died in a 2-node cluster

I gather something went wrong and it fenced/killed itself, but I am not sure 
what happened.

I think the VM backups may have run around that time, and a snapshot of the VM 
could have been taken.

But there is nothing concrete I can put my finger on.
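One thing worth noting about two-node CMAN clusters: with `two_node="1"`, either node counts as quorate on its own, so a brief token loss can leave both sides "quorate" and racing to fence each other — which would fit devrp2's `telling cman to remove nodeid 1` followed by devrp1's `cman killed by node 2` in the logs below. A minimal, illustrative cluster.conf fragment (the node names match this cluster; everything else is an assumption, not my real config):

```xml
<?xml version="1.0"?>
<!-- Illustrative fragment only; values are assumptions. -->
<cluster name="devrp" config_version="1">
  <!-- two_node mode: either node alone counts as quorate, which is what
       lets both halves of a split race to fence each other. -->
  <cman two_node="1" expected_votes="1"/>
  <!-- post_fail_delay gives a briefly-stalled node time to rejoin before
       fenced acts on it; the default is 0 seconds. -->
  <fence_daemon post_fail_delay="30"/>
  <clusternodes>
    <clusternode name="devrp1" nodeid="1"/>
    <clusternode name="devrp2" nodeid="2"/>
  </clusternodes>
</cluster>
```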

Output from the messages log around that time:

On devrp1:
Oct  8 23:31:38 devrp1 corosync[1670]:   [TOTEM ] A processor failed, forming 
new configuration.
Oct  8 23:31:40 devrp1 corosync[1670]:   [CMAN  ] quorum lost, blocking activity
Oct  8 23:31:40 devrp1 corosync[1670]:   [QUORUM] This node is within the 
non-primary component and will NOT provide any services.
Oct  8 23:31:40 devrp1 corosync[1670]:   [QUORUM] Members[1]: 1
Oct  8 23:31:40 devrp1 corosync[1670]:   [TOTEM ] A processor joined or left 
the membership and a new membership was formed.
Oct  8 23:31:40 devrp1 corosync[1670]:   [CPG   ] chosen downlist: sender r(0) 
ip(10.172.214.51) ; members(old:2 left:1)
Oct  8 23:31:40 devrp1 corosync[1670]:   [MAIN  ] Completed service 
synchronization, ready to provide service.
Oct  8 23:31:41 devrp1 kernel: dlm: closing connection to node 2
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: cman_event_callback: Membership 
424: quorum lost
Oct  8 23:31:42 devrp1 corosync[1670]:   [TOTEM ] A processor joined or left 
the membership and a new membership was formed.
Oct  8 23:31:42 devrp1 corosync[1670]:   [CMAN  ] quorum regained, resuming 
activity
Oct  8 23:31:42 devrp1 corosync[1670]:   [QUORUM] This node is within the 
primary component and will provide service.
Oct  8 23:31:42 devrp1 corosync[1670]:   [QUORUM] Members[2]: 1 2
Oct  8 23:31:42 devrp1 corosync[1670]:   [QUORUM] Members[2]: 1 2
Oct  8 23:31:42 devrp1 corosync[1670]:   [CPG   ] chosen downlist: sender r(0) 
ip(10.172.214.51) ; members(old:1 left:0)
Oct  8 23:31:42 devrp1 corosync[1670]:   [MAIN  ] Completed service 
synchronization, ready to provide service.
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: crm_update_peer_state: 
cman_event_callback: Node devrp2[2] - state is now lost (was member)
Oct  8 23:31:42 devrp1 crmd[2350]:  warning: reap_dead_nodes: Our DC node 
(devrp2) left the cluster
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: cman_event_callback: Membership 
428: quorum acquired
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: crm_update_peer_state: 
cman_event_callback: Node devrp2[2] - state is now member (was lost)
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: do_state_transition: State 
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL 
origin=reap_dead_nodes ]
Oct  8 23:31:42 devrp1 corosync[1670]: cman killed by node 2 because we were 
killed by cman_tool or other application
Oct  8 23:31:42 devrp1 pacemakerd[2339]:    error: pcmk_cpg_dispatch: 
Connection to the CPG API failed: Library error (2)
Oct  8 23:31:42 devrp1 stonith-ng[2346]:    error: pcmk_cpg_dispatch: 
Connection to the CPG API failed: Library error (2)
Oct  8 23:31:42 devrp1 crmd[2350]:    error: pcmk_cpg_dispatch: Connection to 
the CPG API failed: Library error (2)
Oct  8 23:31:42 devrp1 crmd[2350]:    error: crmd_cs_destroy: connection 
terminated
Oct  8 23:31:43 devrp1 fenced[1726]: cluster is down, exiting
Oct  8 23:31:43 devrp1 fenced[1726]: daemon cpg_dispatch error 2
Oct  8 23:31:43 devrp1 attrd[2348]:    error: pcmk_cpg_dispatch: Connection to 
the CPG API failed: Library error (2)
Oct  8 23:31:43 devrp1 attrd[2348]:     crit: attrd_cs_destroy: Lost connection 
to Corosync service!
Oct  8 23:31:43 devrp1 attrd[2348]:   notice: main: Exiting...
Oct  8 23:31:43 devrp1 attrd[2348]:   notice: main: Disconnecting client 
0x18cf240, pid=2350...
Oct  8 23:31:43 devrp1 pacemakerd[2339]:    error: mcp_cpg_destroy: Connection 
destroyed
Oct  8 23:31:43 devrp1 cib[2345]:    error: pcmk_cpg_dispatch: Connection to 
the CPG API failed: Library error (2)
Oct  8 23:31:43 devrp1 cib[2345]:    error: cib_cs_destroy: Corosync connection 
lost!  Exiting.
Oct  8 23:31:43 devrp1 stonith-ng[2346]:    error: stonith_peer_cs_destroy: 
Corosync connection terminated
Oct  8 23:31:43 devrp1 dlm_controld[1752]: cluster is down, exiting
Oct  8 23:31:43 devrp1 dlm_controld[1752]: daemon cpg_dispatch error 2
Oct  8 23:31:43 devrp1 gfs_controld[1801]: cluster is down, exiting
Oct  8 23:31:43 devrp1 crmd[2350]:   notice: crmd_exit: Forcing immediate exit: 
Link has been severed (67)
Oct  8 23:31:43 devrp1 attrd[2348]:    error: attrd_cib_connection_destroy: 
Connection to the CIB terminated...
Oct  8 23:31:43 devrp1 lrmd[2347]:  warning: qb_ipcs_event_sendv: 
new_event_notification (2347-2350-6): Bad file descriptor (9)
Oct  8 23:31:43 devrp1 lrmd[2347]:  warning: send_client_notify: Notification 
of client crmd/94e94935-2221-434d-8a6f-90eba4ede55b failed
Oct  8 23:31:43 devrp1 lrmd[2347]:  warning: send_client_notify: Notification 
of client crmd/94e94935-2221-434d-8a6f-90eba4ede55b failed
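To line up the two nodes' views I have been slicing the logs by timestamp. A rough sketch of the approach (two sample lines are inlined here so it is self-contained; normally it would read /var/log/messages directly):

```shell
#!/bin/sh
# Pull a time window out of a syslog-style messages file and show which
# daemon logged what. Sample lines inlined for illustration.
cat > /tmp/messages.sample <<'EOF'
Oct  8 23:31:42 devrp1 corosync[1670]: cman killed by node 2 because we were killed by cman_tool or other application
Oct  8 23:31:43 devrp1 fenced[1726]: cluster is down, exiting
EOF

# Field 3 is the HH:MM:SS timestamp, field 5 the daemon[pid]: tag; the
# lexicographic comparison works because the timestamp format is fixed-width.
awk '$3 >= "23:31:40" && $3 <= "23:31:45" { print $3, $5 }' /tmp/messages.sample
```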


On devrp2:
Oct  8 23:31:26 devrp2 kernel: IN=eth0 OUT= 
MAC=00:50:56:a6:3a:5d:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 
TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2 
Oct  8 23:31:42 devrp2 crmd[2361]:   notice: cman_event_callback: Membership 
428: quorum lost
Oct  8 23:31:43 devrp2 fenced[1747]: telling cman to remove nodeid 1 from 
cluster
Oct  8 23:31:45 devrp2 corosync[1685]:   [TOTEM ] A processor failed, forming 
new configuration.
Oct  8 23:31:45 devrp2 corosync[1685]:   [CMAN  ] quorum lost, blocking activity
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] This node is within the 
non-primary component and will NOT provide any services.
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] Members[1]: 2
Oct  8 23:31:45 devrp2 corosync[1685]:   [TOTEM ] A processor joined or left 
the membership and a new membership was formed.
Oct  8 23:31:45 devrp2 corosync[1685]:   [CMAN  ] quorum regained, resuming 
activity
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] This node is within the 
primary component and will provide service.
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] Members[2]: 1 2
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] Members[2]: 1 2
Oct  8 23:31:45 devrp2 corosync[1685]:   [CPG   ] chosen downlist: sender r(0) 
ip(10.172.214.52) ; members(old:2 left:1)
Oct  8 23:31:45 devrp2 corosync[1685]:   [MAIN  ] Completed service 
synchronization, ready to provide service.
Oct  8 23:31:45 devrp2 fenced[1747]: receive_start 1:3 add node with 
started_count 1
Oct  8 23:31:45 devrp2 crmd[2361]:   notice: cman_event_callback: Membership 
428: quorum acquired
Oct  8 23:31:45 devrp2 crmd[2361]:   notice: crm_update_peer_state: 
cman_event_callback: Node devrp1[1] - state is now lost (was member)
Oct  8 23:31:45 devrp2 crmd[2361]:  warning: match_down_event: No match for 
shutdown action on devrp1
Oct  8 23:31:45 devrp2 crmd[2361]:   notice: peer_update_callback: 
Stonith/shutdown of devrp1 not matched
Oct  8 23:31:49 devrp2 crmd[2361]:   notice: do_state_transition: State 
transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL 
origin=check_join_state ]
Oct  8 23:31:49 devrp2 crmd[2361]:  warning: match_down_event: No match for 
shutdown action on devrp1
Oct  8 23:31:49 devrp2 crmd[2361]:   notice: peer_update_callback: 
Stonith/shutdown of devrp1 not matched
Oct  8 23:31:49 devrp2 crmd[2361]:   notice: do_election_count_vote: Election 2 
(current: 2, owner: devrp1): Processed vote from devrp1 (Peer is not part of 
our cluster)
Oct  8 23:31:49 devrp2 pengine[2360]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
Oct  8 23:31:49 devrp2 kernel: dlm: closing connection to node 1
Oct  8 23:31:49 devrp2 attrd[2359]:   notice: attrd_local_callback: Sending 
full refresh (origin=crmd)
Oct  8 23:31:49 devrp2 attrd[2359]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: last-failure-ybrpstat (1412658023)
Oct  8 23:31:49 devrp2 dlm_controld[1773]: 
/sys/kernel/config/dlm/cluster/comms/1: rmdir failed: 2
Oct  8 23:31:49 devrp2 attrd[2359]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: probe_complete (true)
Oct  8 23:31:49 devrp2 pengine[2360]:   notice: LogActions: Start   
ybrpip#011(devrp2)
Oct  8 23:31:49 devrp2 pengine[2360]:   notice: process_pe_message: Calculated 
Transition 1061: /var/lib/pacemaker/pengine/pe-input-2282.bz2
Oct  8 23:31:50 devrp2 pengine[2360]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
Oct  8 23:31:50 devrp2 pengine[2360]:   notice: LogActions: Start   
ybrpip#011(devrp2)
Oct  8 23:31:50 devrp2 pengine[2360]:   notice: process_pe_message: Calculated 
Transition 1062: /var/lib/pacemaker/pengine/pe-input-2283.bz2
Oct  8 23:31:50 devrp2 crmd[2361]:   notice: te_rsc_command: Initiating action 
5: start ybrpip_start_0 on devrp2 (local)
Oct  8 23:31:51 devrp2 IPaddr2(ybrpip)[30442]: INFO: Adding inet address 
10.172.214.50/24 with broadcast address 10.172.214.255 to device eth0
Oct  8 23:31:51 devrp2 IPaddr2(ybrpip)[30442]: INFO: Bringing device eth0 up
Oct  8 23:31:51 devrp2 IPaddr2(ybrpip)[30442]: INFO: 
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/resource-agents/send_arp-10.172.214.50 eth0 10.172.214.50 auto 
not_used not_used
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: process_lrm_event: LRM operation 
ybrpip_start_0 (call=995, rc=0, cib-update=1494, confirmed=true) ok
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: te_rsc_command: Initiating action 
6: monitor ybrpip_monitor_5000 on devrp2 (local)
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: process_lrm_event: LRM operation 
ybrpip_monitor_5000 (call=998, rc=0, cib-update=1495, confirmed=false) ok
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: run_graph: Transition 1062 
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-2283.bz2): Complete
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct  8 23:31:52 devrp2 ntpd[2099]: Listen normally on 6 eth0 10.172.214.50 UDP 
123
Oct  8 23:31:52 devrp2 ntpd[2099]: peers refreshed
Oct  8 23:31:52 devrp2 corosync[1685]:   [TOTEM ] A processor failed, forming 
new configuration.
Oct  8 23:31:54 devrp2 corosync[1685]:   [CMAN  ] quorum lost, blocking activity
Oct  8 23:31:54 devrp2 corosync[1685]:   [QUORUM] This node is within the 
non-primary component and will NOT provide any services.
Oct  8 23:31:54 devrp2 corosync[1685]:   [QUORUM] Members[1]: 2
Oct  8 23:31:54 devrp2 corosync[1685]:   [TOTEM ] A processor joined or left 
the membership and a new membership was formed.
Oct  8 23:31:54 devrp2 crmd[2361]:   notice: cman_event_callback: Membership 
432: quorum lost
Oct  8 23:31:54 devrp2 corosync[1685]:   [CPG   ] chosen downlist: sender r(0) 
ip(10.172.214.52) ; members(old:2 left:1)
Oct  8 23:31:54 devrp2 corosync[1685]:   [MAIN  ] Completed service 
synchronization, ready to provide service.
Oct  8 23:31:54 devrp2 crmd[2361]:  warning: match_down_event: No match for 
shutdown action on devrp1
Oct  8 23:31:54 devrp2 crmd[2361]:   notice: peer_update_callback: 
Stonith/shutdown of devrp1 not matched
Oct  8 23:31:54 devrp2 crmd[2361]:   notice: do_state_transition: State 
transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL 
origin=check_join_state ]
Oct  8 23:31:54 devrp2 attrd[2359]:   notice: attrd_local_callback: Sending 
full refresh (origin=crmd)
Oct  8 23:31:54 devrp2 attrd[2359]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: last-failure-ybrpstat (1412658023)
Oct  8 23:31:55 devrp2 attrd[2359]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: probe_complete (true)
Oct  8 23:31:55 devrp2 pengine[2360]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
Oct  8 23:31:55 devrp2 pengine[2360]:   notice: process_pe_message: Calculated 
Transition 1063: /var/lib/pacemaker/pengine/pe-input-2284.bz2
Oct  8 23:31:55 devrp2 crmd[2361]:   notice: run_graph: Transition 1063 
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-2284.bz2): Complete
Oct  8 23:31:55 devrp2 crmd[2361]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct  8 23:31:59 devrp2 kernel: IN=eth0 OUT= 
MAC=00:50:56:a6:3a:5d:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 
TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2 
Oct  8 23:36:52 devrp2 kernel: IN=eth0 OUT= 
MAC=00:50:56:a6:3a:5d:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 
TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2 
Oct  8 23:46:55 devrp2 crmd[2361]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
origin=crm_timer_popped ]
Oct  8 23:46:55 devrp2 pengine[2360]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
Oct  8 23:46:55 devrp2 pengine[2360]:   notice: process_pe_message: Calculated 
Transition 1064: /var/lib/pacemaker/pengine/pe-input-2284.bz2
Oct  8 23:46:55 devrp2 crmd[2361]:   notice: run_graph: Transition 1064 
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-2284.bz2): Complete
Oct  8 23:46:55 devrp2 crmd[2361]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
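If the backup snapshot did stun the guest long enough for the totem token to expire, the usual mitigation I know of is raising the token timeout so a short pause no longer looks like a dead node. Illustrative fragment only — the value is a guess, not something from my config:

```xml
<!-- cluster.conf fragment (inside <cluster>): totem token timeout in
     milliseconds; it needs to exceed the worst-case snapshot stun. -->
<totem token="30000"/>
```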

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
