This is giving me a lot to go on, thanks for the responses from everyone.

Gene
On 4/9/14, 10:32 AM, "Digimer" <li...@alteeve.ca> wrote:

>When a node enters an unknown state (from the perspective of the rest of
>the cluster), it is extremely unsafe to assume what state it is in. The
>only safe option is to block and call a fence to put the lost node into
>a known state. Only when the fence action confirms that the lost node
>was successfully isolated (rebooted, usually) is it safe for the cluster
>to proceed with recovery.
>
>A properly configured cluster will react to a failed fence by blocking.
>An improperly configured cluster will make assumptions and enter an
>undefined state where it's hard to predict what will happen next, but
>often it's "not good".
>
>Take a minute to read this please:
>
>https://alteeve.ca/w/AN!Cluster_Tutorial_2#Concept.3B_Fencing
>
>It's about cman + rgmanager, but the concepts port 1:1 to pacemaker.
>
>The best analogy I can think of for fencing is to compare it to
>seatbelts in cars. You don't appreciate their importance when you've
>never had an accident, so often people leave them unbuckled. When you
>crash though, the seatbelt can make all the difference in the world.
>Fencing is like that. I often hear people say "I've been in production
>for over a year without fencing and it was fine!". Of course, they
>didn't crash in that time, so they didn't need fencing before then.
>
>digimer
>
>On 09/04/14 12:10 PM, Campbell, Gene wrote:
>> Thanks for the response. I hope you don't mind a couple of questions along
>> the way to understanding this issue.
>>
>> We have storage attached to vm5
>> Power is cut to vm5
>> Failover to vm6 happens and storage is made available there
>> vm5 reboots
>>
>> Can you tell where fencing is happening in this picture? Will keep
>> reading docs, and looking at logs, but anything you can think of to help
>> would be much appreciated.
>>
>> Thanks
>> Gene
>>
>> On 4/8/14, 2:29 PM, "Digimer" <li...@alteeve.ca> wrote:
>>
>>> Looks like your fencing (stonith) failed.
>>>
>>> On 08/04/14 05:25 PM, Campbell, Gene wrote:
>>>> Hello fine folks in Pacemaker land. Hopefully you could share your
>>>> insight into this little problem for us.
>>>>
>>>> We have an intermittent problem with failover.
>>>>
>>>> two node cluster
>>>> first node power is cut
>>>> failover begins to second node
>>>> first node reboots
>>>> crm_mon -1 on the rebooted node is PENDING (never goes to ONLINE)
>>>>
>>>> Example output from vm5
>>>> Node lotus-4vm5: pending
>>>> Online: [ lotus-4vm6 ]
>>>>
>>>> Example output from vm6
>>>> Online: [ lotus-4vm5 lotus-4vm6 ]
>>>>
>>>> Environment
>>>> CentOS 6.5 on KVM VMs
>>>> Pacemaker 1.1.10
>>>> Corosync 1.4.1
>>>>
>>>> vm5 /var/log/messages
>>>> Apr 8 09:54:07 lotus-4vm5 pacemaker: Starting Pacemaker Cluster Manager
>>>> Apr 8 09:54:07 lotus-4vm5 pacemakerd[1783]: notice: main: Starting Pacemaker 1.1.10-14.el6_5.2 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman
>>>> Apr 8 09:54:07 lotus-4vm5 pacemakerd[1783]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 attrd[1792]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>>>> Apr 8 09:54:07 lotus-4vm5 crmd[1794]: notice: main: CRM Git Version: 368c726
>>>> Apr 8 09:54:07 lotus-4vm5 attrd[1792]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20b6280 for attrd/0
>>>> Apr 8 09:54:07 lotus-4vm5 attrd[1792]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:07 lotus-4vm5 stonith-ng[1790]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:08 lotus-4vm5 attrd[1792]: notice: main: Starting mainloop...
>>>> Apr 8 09:54:08 lotus-4vm5 stonith-ng[1790]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20ba600 for stonith-ng/0
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20be980 for cib/0
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Sending membership update 24 to cib
>>>> Apr 8 09:54:08 lotus-4vm5 stonith-ng[1790]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: plugin_handle_membership: Membership 24: quorum acquired
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was (null))
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now member (was (null))
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20c2d00 for crmd/0
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Sending membership update 24 to crmd
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: plugin_handle_membership: Membership 24: quorum acquired
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was (null))
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now member (was (null))
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: do_started: The local CRM is operational
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
>>>> Apr 8 09:54:09 lotus-4vm5 stonith-ng[1790]: notice: setup_cib: Watching for stonith topology changes
>>>> Apr 8 09:54:09 lotus-4vm5 stonith-ng[1790]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>>> Apr 8 09:54:10 lotus-4vm5 stonith-ng[1790]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
>>>> Apr 8 09:54:10 lotus-4vm5 cib[1789]: notice: cib_server_process_diff: Not applying diff 0.31.21 -> 0.31.22 (sync in progress)
>>>> Apr 8 09:54:29 lotus-4vm5 crmd[1794]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
>>>> Apr 8 09:56:29 lotus-4vm5 crmd[1794]: error: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_ELECTION! (120000ms)
>>>> Apr 8 09:56:29 lotus-4vm5 crmd[1794]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=crm_timer_popped ]
>>>> Apr 8 09:56:29 lotus-4vm5 crmd[1794]: warning: do_log: FSA: Input I_RELEASE_DC from do_election_count_vote() received in state S_INTEGRATION
>>>> Apr 8 09:56:29 lotus-4vm5 crmd[1794]: warning: join_query_callback: No DC for join-1
>>>>
>>>>
>>>> vm6 /var/log/messages
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 16: memb=1, new=0, lost=0
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: memb: lotus-4vm6 3192917514
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1, lost=0
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: Node 3176140298/lotus-4vm5 is now: member
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: NEW: lotus-4vm5 3176140298
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm5 3176140298
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm6 3192917514
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 16 to 2 children
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>>> Apr 8 09:52:51 lotus-4vm6 crmd[2496]: notice: plugin_handle_membership: Membership 16: quorum acquired
>>>> Apr 8 09:52:51 lotus-4vm6 crmd[2496]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was lost)
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: 0x1284140 Node 3176140298 (lotus-4vm5) born on: 16
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 16 to 2 children
>>>> Apr 8 09:52:51 lotus-4vm6 cib[2491]: notice: plugin_handle_membership: Membership 16: quorum acquired
>>>> Apr 8 09:52:51 lotus-4vm6 cib[2491]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was lost)
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [CPG ] chosen downlist: sender r(0) ip(10.14.80.189) r(1) ip(10.128.0.189) ; members(old:1 left:0)
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [MAIN ] Completed service synchronization, ready to provide service.
>>>> Apr 8 09:52:57 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:53:14 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:53:15 lotus-4vm6 stonith-ng[2492]: warning: parse_host_line: Could not parse (38 47): "console"
>>>> Apr 8 09:53:20 lotus-4vm6 corosync[2442]: [TOTEM ] A processor failed, forming new configuration.
>>>> Apr 8 09:53:21 lotus-4vm6 stonith-ng[2492]: notice: log_operation: Operation 'reboot' [3306] (call 2 from crmd.2496) for host 'lotus-4vm5' with device 'st-fencing' returned: 0 (OK)
>>>> Apr 8 09:53:21 lotus-4vm6 crmd[2496]: notice: erase_xpath_callback: Deletion of "//node_state[@uname='lotus-4vm5']/lrm": Timer expired (rc=-62)
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 20: memb=1, new=0, lost=1
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: memb: lotus-4vm6 3192917514
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: lost: lotus-4vm5 3176140298
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0, lost=0
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm6 3192917514
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: ais_mark_unseen_peer_dead: Node lotus-4vm5 was not seen in the previous transition
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: Node 3176140298/lotus-4vm5 is now: lost
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 20 to 2 children
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>>> Apr 8 09:53:26 lotus-4vm6 cib[2491]: notice: plugin_handle_membership: Membership 20: quorum lost
>>>> Apr 8 09:53:26 lotus-4vm6 cib[2491]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now lost (was member)
>>>> Apr 8 09:53:26 lotus-4vm6 crmd[2496]: notice: plugin_handle_membership: Membership 20: quorum lost
>>>> Apr 8 09:53:26 lotus-4vm6 crmd[2496]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now lost (was member)
>>>> Apr 8 09:53:34 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:53:43 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:01 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0, lost=0
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: memb: lotus-4vm6 3192917514
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: Node 3176140298/lotus-4vm5 is now: member
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: NEW: lotus-4vm5 3176140298
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm5 3176140298
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm6 3192917514
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 24 to 2 children
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: plugin_handle_membership: Membership 24: quorum acquired
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was lost)
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: 0x1284140 Node 3176140298 (lotus-4vm5) born on: 24
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 24 to 2 children
>>>> Apr 8 09:54:04 lotus-4vm6 cib[2491]: notice: plugin_handle_membership: Membership 24: quorum acquired
>>>> Apr 8 09:54:04 lotus-4vm6 cib[2491]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was lost)
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [CPG ] chosen downlist: sender r(0) ip(10.14.80.190) r(1) ip(10.128.0.190) ; members(old:2 left:1)
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [MAIN ] Completed service synchronization, ready to provide service.
>>>> Apr 8 09:54:04 lotus-4vm6 stonith-ng[2492]: notice: remote_op_done: Operation reboot of lotus-4vm5 by lotus-4vm6 for crmd.2496@lotus-4vm6.ae82b411: OK
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: tengine_stonith_callback: Stonith operation 2/13:0:0:f325afae-64b0-4812-a897-70556ab1e806: OK (0)
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: tengine_stonith_notify: Peer lotus-4vm5 was terminated (reboot) by lotus-4vm6 for lotus-4vm6: OK (ref=ae82b411-b07a-4235-be55-5a30a00b323b) by client crmd.2496
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: crm_update_peer_state: send_stonith_update: Node lotus-4vm5[3176140298] - state is now lost (was member)
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: run_graph: Transition 0 (Complete=1, Pending=0, Fired=0, Skipped=7, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-25.bz2): Stopped
>>>> Apr 8 09:54:04 lotus-4vm6 attrd[2494]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
>>>> Apr 8 09:54:04 lotus-4vm6 attrd[2494]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>>>> Apr 8 09:54:05 lotus-4vm6 pengine[2495]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>>> Apr 8 09:54:05 lotus-4vm6 pengine[2495]: notice: LogActions: Start st-fencing#011(lotus-4vm6)
>>>> Apr 8 09:54:05 lotus-4vm6 pengine[2495]: notice: LogActions: Start MGS_607d26#011(lotus-4vm6)
>>>> Apr 8 09:54:05 lotus-4vm6 pengine[2495]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-912.bz2
>>>> Apr 8 09:54:05 lotus-4vm6 crmd[2496]: notice: te_rsc_command: Initiating action 5: start st-fencing_start_0 on lotus-4vm6 (local)
>>>> Apr 8 09:54:05 lotus-4vm6 crmd[2496]: notice: te_rsc_command: Initiating action 6: start MGS_607d26_start_0 on lotus-4vm6 (local)
>>>> Apr 8 09:54:05 lotus-4vm6 stonith-ng[2492]: notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
>>>> Apr 8 09:54:05 lotus-4vm6 kernel: LDISKFS-fs warning (device sda): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
>>>> Apr 8 09:54:05 lotus-4vm6 kernel:
>>>> Apr 8 09:54:10 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:11 lotus-4vm6 crmd[2496]: warning: get_rsc_metadata: No metadata found for fence_chroma::stonith:heartbeat: Input/output error (-5)
>>>> Apr 8 09:54:11 lotus-4vm6 crmd[2496]: notice: process_lrm_event: LRM operation st-fencing_start_0 (call=24, rc=0, cib-update=89, confirmed=true) ok
>>>> Apr 8 09:54:11 lotus-4vm6 crmd[2496]: warning: crmd_cs_dispatch: Recieving messages from a node we think is dead: lotus-4vm5[-1118826998]
>>>> Apr 8 09:54:24 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:31 lotus-4vm6 crmd[2496]: notice: do_election_count_vote: Election 2 (current: 2, owner: lotus-4vm5): Processed vote from lotus-4vm5 (Peer is not part of our cluster)
>>>> Apr 8 09:54:34 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:46 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:48 lotus-4vm6 kernel: LDISKFS-fs (sda): recovery complete
>>>> Apr 8 09:54:48 lotus-4vm6 kernel: LDISKFS-fs (sda): mounted filesystem with ordered data mode. quota=on. Opts:
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ [ ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ { ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "args": [ ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "mount", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "-t", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "lustre", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "/mnt/MGS" ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ ], ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "rc": 0, ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "stderr": "", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "stdout": "" ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ } ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ ] ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ ]
>>>> Apr 8 09:54:48 lotus-4vm6 crmd[2496]: notice: process_lrm_event: LRM operation MGS_607d26_start_0 (call=26, rc=0, cib-update=94, confirmed=true) ok
>>>> Apr 8 09:54:49 lotus-4vm6 crmd[2496]: notice: run_graph: Transition 1 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-912.bz2): Stopped
>>>> Apr 8 09:54:49 lotus-4vm6 attrd[2494]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
>>>> Apr 8 09:54:49 lotus-4vm6 attrd[2494]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>>>> Apr 8 09:54:50 lotus-4vm6 pengine[2495]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>>> Apr 8 09:54:50 lotus-4vm6 pengine[2495]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-913.bz2
>>>> Apr 8 09:54:50 lotus-4vm6 crmd[2496]: notice: te_rsc_command: Initiating action 9: monitor MGS_607d26_monitor_5000 on lotus-4vm6 (local)
>>>> Apr 8 09:54:51 lotus-4vm6 crmd[2496]: notice: process_lrm_event: LRM operation MGS_607d26_monitor_5000 (call=30, rc=0, cib-update=102, confirmed=false) ok
>>>> Apr 8 09:54:51 lotus-4vm6 crmd[2496]: notice: run_graph: Transition 2 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-913.bz2): Complete
>>>> Apr 8 09:54:51 lotus-4vm6 crmd[2496]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>> Apr 8 09:55:07 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:55:23 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:55:38 lotus-4vm6 kernel: Lustre: Evicted from MGS (at 10.14.80.190@tcp) after server handle changed from 0x7acffb201664d0a4 to 0x9a6b02eee57f3dba
>>>> Apr 8 09:55:38 lotus-4vm6 kernel: Lustre: MGC10.14.80.189@tcp: Connection restored to MGS (at 0@lo)
>>>> Apr 8 09:55:42 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:55:58 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:56:12 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:56:26 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:56:31 lotus-4vm6 crmd[2496]: warning: crmd_ha_msg_filter: Another DC detected: lotus-4vm5 (op=join_offer)
>>>> Apr 8 09:56:31 lotus-4vm6 crmd[2496]: notice: do_state_transition: State transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
>>>> Apr 8 09:56:31 lotus-4vm6 crmd[2496]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
>>>> Apr 8 09:56:31 lotus-4vm6 crmd[2496]: notice: do_election_count_vote: Election 3 (current: 3, owner: lotus-4vm6): Processed no-vote from lotus-4vm5 (Peer is not part of our cluster)
>>>> Apr 8 09:56:36 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:56:37 lotus-4vm6 crmd[2496]: warning: get_rsc_metadata: No metadata found for fence_chroma::stonith:heartbeat: Input/output error (-5)
>>>> Apr 8 09:56:37 lotus-4vm6 attrd[2494]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
>>>> Apr 8 09:56:37 lotus-4vm6 attrd[2494]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>>>> Apr 8 09:56:38 lotus-4vm6 pengine[2495]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>>> Apr 8 09:56:38 lotus-4vm6 pengine[2495]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-914.bz2
>>>> Apr 8 09:56:38 lotus-4vm6 crmd[2496]: notice: run_graph: Transition 3 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-914.bz2): Complete
>>>> Apr 8 09:56:38 lotus-4vm6 crmd[2496]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>>
>>>> Thank you very much
>>>> Gene

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org