This is giving me a lot to go on, thanks for the responses from everyone.

Gene
On 4/9/14, 10:32 AM, "Digimer" <li...@alteeve.ca> wrote:

>When a node enters an unknown state (from the perspective of the rest of
>the cluster), it is extremely unsafe to assume what state it is in. The
>only safe option is to block and call a fence to put the lost node into
>a known state. Only when the fence action confirms that the lost node
>was successfully isolated (rebooted, usually) is it safe for the cluster
>to proceed with recovery.
>
>A properly configured cluster will react to a failed fence by blocking.
>An improperly configured cluster will make assumptions and enter an
>undefined state where it's hard to predict what will happen next, but
>often it's "not good".
>
>Take a minute to read this please:
>
>https://alteeve.ca/w/AN!Cluster_Tutorial_2#Concept.3B_Fencing
>
>It's about cman + rgmanager, but the concepts port 1:1 to pacemaker.
>
>The best analogy I can think of for fencing is to compare it to
>seatbelts in cars. You don't appreciate their importance when you've
>never had an accident, so often people leave them unbuckled. When you
>crash though, the seatbelt can make all the difference in the world.
>Fencing is like that. I often hear people say "I've been in production
>for over a year without fencing and it was fine!". Of course, they
>didn't crash in that time, so they didn't need fencing before then.
>
>digimer
>
>On 09/04/14 12:10 PM, Campbell, Gene wrote:
>> Thanks for the response. I hope you don't mind a couple of questions along
>> the way to understanding this issue.
>>
>> We have storage attached to vm5
>> Power is cut to vm5
>> Failover to vm6 happens and storage is made available there
>> vm5 reboots
>>
>> Can you tell where fencing is happening in this picture? Will keep
>> reading docs, and looking at logs, but anything you can think of to help
>> would be much appreciated.
>>
>> Thanks
>> Gene
>>
>> On 4/8/14, 2:29 PM, "Digimer" <li...@alteeve.ca> wrote:
>>
>>> Looks like your fencing (stonith) failed.
>>>
>>> On 08/04/14 05:25 PM, Campbell, Gene wrote:
>>>> Hello fine folks in Pacemaker land. Hopefully you could share your
>>>> insight into this little problem for us.
>>>>
>>>> We have an intermittent problem with failover.
>>>>
>>>> two node cluster
>>>> first node power is cut
>>>> failover begins to second node
>>>> first node reboots
>>>> crm_mon -1 on the rebooted node is PENDING (never goes to ONLINE)
>>>>
>>>> Example output from vm5
>>>> Node lotus-4vm5: pending
>>>> Online: [ lotus-4vm6 ]
>>>>
>>>> Example output from vm6
>>>> Online: [ lotus-4vm5 lotus-4vm6 ]
>>>>
>>>> Environment
>>>> CentOS 6.5 on KVM VMs
>>>> Pacemaker 1.1.10
>>>> Corosync 1.4.1
>>>>
>>>> vm5 /var/log/messages
>>>> Apr 8 09:54:07 lotus-4vm5 pacemaker: Starting Pacemaker Cluster Manager
>>>> Apr 8 09:54:07 lotus-4vm5 pacemakerd[1783]: notice: main: Starting Pacemaker 1.1.10-14.el6_5.2 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman
>>>> Apr 8 09:54:07 lotus-4vm5 pacemakerd[1783]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:07 lotus-4vm5 attrd[1792]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>>>> Apr 8 09:54:07 lotus-4vm5 crmd[1794]: notice: main: CRM Git Version: 368c726
>>>> Apr 8 09:54:07 lotus-4vm5 attrd[1792]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:07 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20b6280 for attrd/0
>>>> Apr 8 09:54:07 lotus-4vm5 attrd[1792]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:07 lotus-4vm5 stonith-ng[1790]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
>>>> Apr 8 09:54:08 lotus-4vm5 attrd[1792]: notice: main: Starting mainloop...
>>>> Apr 8 09:54:08 lotus-4vm5 stonith-ng[1790]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20ba600 for stonith-ng/0
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20be980 for cib/0
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Sending membership update 24 to cib
>>>> Apr 8 09:54:08 lotus-4vm5 stonith-ng[1790]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: plugin_handle_membership: Membership 24: quorum acquired
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was (null))
>>>> Apr 8 09:54:08 lotus-4vm5 cib[1789]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now member (was (null))
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Recorded connection 0x20c2d00 for crmd/0
>>>> Apr 8 09:54:08 lotus-4vm5 corosync[1364]: [pcmk ] info: pcmk_ipc: Sending membership update 24 to crmd
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: plugin_handle_membership: Membership 24: quorum acquired
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was (null))
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now member (was (null))
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: do_started: The local CRM is operational
>>>> Apr 8 09:54:08 lotus-4vm5 crmd[1794]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
>>>> Apr 8 09:54:09 lotus-4vm5 stonith-ng[1790]: notice: setup_cib: Watching for stonith topology changes
>>>> Apr 8 09:54:09 lotus-4vm5 stonith-ng[1790]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>>> Apr 8 09:54:10 lotus-4vm5 stonith-ng[1790]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
>>>> Apr 8 09:54:10 lotus-4vm5 cib[1789]: notice: cib_server_process_diff: Not applying diff 0.31.21 -> 0.31.22 (sync in progress)
>>>> Apr 8 09:54:29 lotus-4vm5 crmd[1794]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
>>>> Apr 8 09:56:29 lotus-4vm5 crmd[1794]: error: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_ELECTION! (120000ms)
>>>> Apr 8 09:56:29 lotus-4vm5 crmd[1794]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=crm_timer_popped ]
>>>> Apr 8 09:56:29 lotus-4vm5 crmd[1794]: warning: do_log: FSA: Input I_RELEASE_DC from do_election_count_vote() received in state S_INTEGRATION
>>>> Apr 8 09:56:29 lotus-4vm5 crmd[1794]: warning: join_query_callback: No DC for join-1
>>>>
>>>>
>>>> vm6 /var/log/messages
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 16: memb=1, new=0, lost=0
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: memb: lotus-4vm6 3192917514
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1, lost=0
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: Node 3176140298/lotus-4vm5 is now: member
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: NEW: lotus-4vm5 3176140298
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm5 3176140298
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm6 3192917514
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 16 to 2 children
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>>> Apr 8 09:52:51 lotus-4vm6 crmd[2496]: notice: plugin_handle_membership: Membership 16: quorum acquired
>>>> Apr 8 09:52:51 lotus-4vm6 crmd[2496]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was lost)
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: 0x1284140 Node 3176140298 (lotus-4vm5) born on: 16
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 16 to 2 children
>>>> Apr 8 09:52:51 lotus-4vm6 cib[2491]: notice: plugin_handle_membership: Membership 16: quorum acquired
>>>> Apr 8 09:52:51 lotus-4vm6 cib[2491]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was lost)
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [CPG ] chosen downlist: sender r(0) ip(10.14.80.189) r(1) ip(10.128.0.189) ; members(old:1 left:0)
>>>> Apr 8 09:52:51 lotus-4vm6 corosync[2442]: [MAIN ] Completed service synchronization, ready to provide service.
>>>> Apr 8 09:52:57 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:53:14 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:53:15 lotus-4vm6 stonith-ng[2492]: warning: parse_host_line: Could not parse (38 47): "console"
>>>> Apr 8 09:53:20 lotus-4vm6 corosync[2442]: [TOTEM ] A processor failed, forming new configuration.
>>>> Apr 8 09:53:21 lotus-4vm6 stonith-ng[2492]: notice: log_operation: Operation 'reboot' [3306] (call 2 from crmd.2496) for host 'lotus-4vm5' with device 'st-fencing' returned: 0 (OK)
>>>> Apr 8 09:53:21 lotus-4vm6 crmd[2496]: notice: erase_xpath_callback: Deletion of "//node_state[@uname='lotus-4vm5']/lrm": Timer expired (rc=-62)
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 20: memb=1, new=0, lost=1
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: memb: lotus-4vm6 3192917514
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: lost: lotus-4vm5 3176140298
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0, lost=0
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm6 3192917514
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: ais_mark_unseen_peer_dead: Node lotus-4vm5 was not seen in the previous transition
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: Node 3176140298/lotus-4vm5 is now: lost
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 20 to 2 children
>>>> Apr 8 09:53:26 lotus-4vm6 corosync[2442]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>>> Apr 8 09:53:26 lotus-4vm6 cib[2491]: notice: plugin_handle_membership: Membership 20: quorum lost
>>>> Apr 8 09:53:26 lotus-4vm6 cib[2491]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now lost (was member)
>>>> Apr 8 09:53:26 lotus-4vm6 crmd[2496]: notice: plugin_handle_membership: Membership 20: quorum lost
>>>> Apr 8 09:53:26 lotus-4vm6 crmd[2496]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now lost (was member)
>>>> Apr 8 09:53:34 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:53:43 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:01 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0, lost=0
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: memb: lotus-4vm6 3192917514
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: Node 3176140298/lotus-4vm5 is now: member
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: NEW: lotus-4vm5 3176140298
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm5 3176140298
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: pcmk_peer_update: MEMB: lotus-4vm6 3192917514
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 24 to 2 children
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: plugin_handle_membership: Membership 24: quorum acquired
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was lost)
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: update_member: 0x1284140 Node 3176140298 (lotus-4vm5) born on: 24
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [pcmk ] info: send_member_notification: Sending membership update 24 to 2 children
>>>> Apr 8 09:54:04 lotus-4vm6 cib[2491]: notice: plugin_handle_membership: Membership 24: quorum acquired
>>>> Apr 8 09:54:04 lotus-4vm6 cib[2491]: notice: crm_update_peer_state: plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now member (was lost)
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [CPG ] chosen downlist: sender r(0) ip(10.14.80.190) r(1) ip(10.128.0.190) ; members(old:2 left:1)
>>>> Apr 8 09:54:04 lotus-4vm6 corosync[2442]: [MAIN ] Completed service synchronization, ready to provide service.
>>>> Apr 8 09:54:04 lotus-4vm6 stonith-ng[2492]: notice: remote_op_done: Operation reboot of lotus-4vm5 by lotus-4vm6 for crmd.2496@lotus-4vm6.ae82b411: OK
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: tengine_stonith_callback: Stonith operation 2/13:0:0:f325afae-64b0-4812-a897-70556ab1e806: OK (0)
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: tengine_stonith_notify: Peer lotus-4vm5 was terminated (reboot) by lotus-4vm6 for lotus-4vm6: OK (ref=ae82b411-b07a-4235-be55-5a30a00b323b) by client crmd.2496
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: crm_update_peer_state: send_stonith_update: Node lotus-4vm5[3176140298] - state is now lost (was member)
>>>> Apr 8 09:54:04 lotus-4vm6 crmd[2496]: notice: run_graph: Transition 0 (Complete=1, Pending=0, Fired=0, Skipped=7, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-25.bz2): Stopped
>>>> Apr 8 09:54:04 lotus-4vm6 attrd[2494]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
>>>> Apr 8 09:54:04 lotus-4vm6 attrd[2494]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>>>> Apr 8 09:54:05 lotus-4vm6 pengine[2495]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>>> Apr 8 09:54:05 lotus-4vm6 pengine[2495]: notice: LogActions: Start st-fencing#011(lotus-4vm6)
>>>> Apr 8 09:54:05 lotus-4vm6 pengine[2495]: notice: LogActions: Start MGS_607d26#011(lotus-4vm6)
>>>> Apr 8 09:54:05 lotus-4vm6 pengine[2495]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-912.bz2
>>>> Apr 8 09:54:05 lotus-4vm6 crmd[2496]: notice: te_rsc_command: Initiating action 5: start st-fencing_start_0 on lotus-4vm6 (local)
>>>> Apr 8 09:54:05 lotus-4vm6 crmd[2496]: notice: te_rsc_command: Initiating action 6: start MGS_607d26_start_0 on lotus-4vm6 (local)
>>>> Apr 8 09:54:05 lotus-4vm6 stonith-ng[2492]: notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
>>>> Apr 8 09:54:05 lotus-4vm6 kernel: LDISKFS-fs warning (device sda): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
>>>> Apr 8 09:54:05 lotus-4vm6 kernel:
>>>> Apr 8 09:54:10 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:11 lotus-4vm6 crmd[2496]: warning: get_rsc_metadata: No metadata found for fence_chroma::stonith:heartbeat: Input/output error (-5)
>>>> Apr 8 09:54:11 lotus-4vm6 crmd[2496]: notice: process_lrm_event: LRM operation st-fencing_start_0 (call=24, rc=0, cib-update=89, confirmed=true) ok
>>>> Apr 8 09:54:11 lotus-4vm6 crmd[2496]: warning: crmd_cs_dispatch: Recieving messages from a node we think is dead: lotus-4vm5[-1118826998]
>>>> Apr 8 09:54:24 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:31 lotus-4vm6 crmd[2496]: notice: do_election_count_vote: Election 2 (current: 2, owner: lotus-4vm5): Processed vote from lotus-4vm5 (Peer is not part of our cluster)
>>>> Apr 8 09:54:34 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:46 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:54:48 lotus-4vm6 kernel: LDISKFS-fs (sda): recovery complete
>>>> Apr 8 09:54:48 lotus-4vm6 kernel: LDISKFS-fs (sda): mounted filesystem with ordered data mode. quota=on. Opts:
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ [ ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ { ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "args": [ ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "mount", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "-t", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "lustre", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "/mnt/MGS" ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ ], ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "rc": 0, ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "stderr": "", ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ "stdout": "" ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ } ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ ] ]
>>>> Apr 8 09:54:48 lotus-4vm6 lrmd[2493]: notice: operation_finished: MGS_607d26_start_0:3444:stderr [ ]
>>>> Apr 8 09:54:48 lotus-4vm6 crmd[2496]: notice: process_lrm_event: LRM operation MGS_607d26_start_0 (call=26, rc=0, cib-update=94, confirmed=true) ok
>>>> Apr 8 09:54:49 lotus-4vm6 crmd[2496]: notice: run_graph: Transition 1 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-912.bz2): Stopped
>>>> Apr 8 09:54:49 lotus-4vm6 attrd[2494]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
>>>> Apr 8 09:54:49 lotus-4vm6 attrd[2494]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>>>> Apr 8 09:54:50 lotus-4vm6 pengine[2495]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>>> Apr 8 09:54:50 lotus-4vm6 pengine[2495]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-913.bz2
>>>> Apr 8 09:54:50 lotus-4vm6 crmd[2496]: notice: te_rsc_command: Initiating action 9: monitor MGS_607d26_monitor_5000 on lotus-4vm6 (local)
>>>> Apr 8 09:54:51 lotus-4vm6 crmd[2496]: notice: process_lrm_event: LRM operation MGS_607d26_monitor_5000 (call=30, rc=0, cib-update=102, confirmed=false) ok
>>>> Apr 8 09:54:51 lotus-4vm6 crmd[2496]: notice: run_graph: Transition 2 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-913.bz2): Complete
>>>> Apr 8 09:54:51 lotus-4vm6 crmd[2496]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>> Apr 8 09:55:07 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:55:23 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:55:38 lotus-4vm6 kernel: Lustre: Evicted from MGS (at 10.14.80.190@tcp) after server handle changed from 0x7acffb201664d0a4 to 0x9a6b02eee57f3dba
>>>> Apr 8 09:55:38 lotus-4vm6 kernel: Lustre: MGC10.14.80.189@tcp: Connection restored to MGS (at 0@lo)
>>>> Apr 8 09:55:42 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:55:58 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:56:12 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:56:26 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:56:31 lotus-4vm6 crmd[2496]: warning: crmd_ha_msg_filter: Another DC detected: lotus-4vm5 (op=join_offer)
>>>> Apr 8 09:56:31 lotus-4vm6 crmd[2496]: notice: do_state_transition: State transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
>>>> Apr 8 09:56:31 lotus-4vm6 crmd[2496]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
>>>> Apr 8 09:56:31 lotus-4vm6 crmd[2496]: notice: do_election_count_vote: Election 3 (current: 3, owner: lotus-4vm6): Processed no-vote from lotus-4vm5 (Peer is not part of our cluster)
>>>> Apr 8 09:56:36 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to 10.14.80.1 port 67 (xid=0x78d16782)
>>>> Apr 8 09:56:37 lotus-4vm6 crmd[2496]: warning: get_rsc_metadata: No metadata found for fence_chroma::stonith:heartbeat: Input/output error (-5)
>>>> Apr 8 09:56:37 lotus-4vm6 attrd[2494]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
>>>> Apr 8 09:56:37 lotus-4vm6 attrd[2494]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>>>> Apr 8 09:56:38 lotus-4vm6 pengine[2495]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>>> Apr 8 09:56:38 lotus-4vm6 pengine[2495]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-914.bz2
>>>> Apr 8 09:56:38 lotus-4vm6 crmd[2496]: notice: run_graph: Transition 3 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-914.bz2): Complete
>>>> Apr 8 09:56:38 lotus-4vm6 crmd[2496]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>>
>>>> Thank you very much
>>>> Gene

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org