When a node enters an unknown state (from the perspective of the rest of the cluster), it is extremely unsafe to make assumptions about what it is doing. The only safe option is to block and call a fence action to put the lost node into a known state. Only when that fence action confirms the lost node has been successfully isolated (usually by rebooting it) is it safe for the cluster to proceed with recovery.

A properly configured cluster will react to a failed fence by blocking. An improperly configured cluster will make assumptions and enter an undefined state where it's hard to predict what will happen next, but often it's "not good".
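
In Pacemaker terms, "properly configured" means stonith-enabled stays true and at least one fence device is defined and actually tested. As a rough sketch only (the agent, host and credentials here are placeholders, not your actual configuration):

  # Refuse to recover resources until a lost node can be fenced
  pcs property set stonith-enabled=true

  # Example fence device for a KVM guest; fence_virsh and these
  # parameters are illustrative -- use whatever agent matches your hosts
  pcs stonith create fence-vm5 fence_virsh ipaddr=kvm-host.example \
      login=root passwd=secret port=lotus-4vm5 pcmk_host_list=lotus-4vm5

  # Sanity-check the configuration, then test the fence path for real
  crm_verify -L -V
  stonith_admin --reboot lotus-4vm5

If that last command can't power-cycle the node, the cluster can't either, and it will (correctly) block when it matters.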

Take a minute to read this please:

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Concept.3B_Fencing

It's about cman + rgmanager, but the concepts port 1:1 to pacemaker.

The best analogy I can think of for fencing is to compare it to seatbelts in cars. You don't appreciate their importance when you've never had an accident, so people often leave them unbuckled. When you crash, though, the seatbelt can make all the difference in the world. Fencing is like that. I often hear people say "I've been in production for over a year without fencing and it was fine!" Of course, they didn't crash in that time, so they didn't need fencing before then.

digimer

On 09/04/14 12:10 PM, Campbell, Gene wrote:
Thanks for the response.  I hope you don't mind a couple questions along
the way to understanding this issue.

We have storage attached to vm5
Power is cut to vm5
Failover to vm6 happens and storage is made available there
vm5 reboots

Can you tell where fencing is happening in this picture?  We will keep
reading the docs and looking at the logs, but anything you can do to help
would be much appreciated.
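
For reference, in the vm6 log further down the fence is the 'reboot' of
lotus-4vm5 that stonith-ng runs right after vm5 drops out of the membership
and before the storage resources are started. A rough sketch of how to pull
those lines out of /var/log/messages (the stonith_admin history option is
assumed to be available in this Pacemaker version):

  # fence operations logged by the surviving node
  grep -E 'stonith-ng.*reboot' /var/log/messages

  # or ask stonith-ng directly for its fencing history for that node
  stonith_admin --history lotus-4vm5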

Thanks
Gene



On 4/8/14, 2:29 PM, "Digimer" <li...@alteeve.ca> wrote:

Looks like your fencing (stonith) failed.

On 08/04/14 05:25 PM, Campbell, Gene wrote:
Hello fine folks in Pacemaker land.   Hopefully you could share your
insight into this little problem for us.

We have an intermittent problem with failover.

two node cluster
first node power is cut
failover begins to second node
first node reboots
crm_mon -1 on the rebooted node shows it as PENDING (it never goes to ONLINE)

Example output from vm5
Node lotus-4vm5: pending
Online: [ lotus-4vm6 ]

Example output from vm6
Online: [ lotus-4vm5  lotus-4vm6 ]

Environment
CentOS 6.5 on KVM VMs
Pacemaker 1.1.10
Corosync 1.4.1
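
As a rough sketch of how the versions can be confirmed and how the "power is
cut" step can be simulated on the KVM host (assuming the libvirt domain name
matches the node's hostname):

  # confirm the installed cluster stack versions
  rpm -q pacemaker corosync

  # on the KVM host: hard power-off of the guest, equivalent to pulling power
  virsh destroy lotus-4vm5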

vm5 /var/log/messages
Apr  8 09:54:07 lotus-4vm5 pacemaker: Starting Pacemaker Cluster Manager
Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: main: Starting
Pacemaker 1.1.10-14.el6_5.2 (Build: 368c726):  generated-manpages
agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc
nagios  corosync-plugin cman
Apr  8 09:54:07 lotus-4vm5 pacemakerd[1783]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:07 lotus-4vm5 crmd[1794]:   notice: main: CRM Git Version:
368c726
Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
Apr  8 09:54:07 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Recorded connection 0x20b6280 for attrd/0
Apr  8 09:54:07 lotus-4vm5 attrd[1792]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
Apr  8 09:54:07 lotus-4vm5 stonith-ng[1790]:   notice:
crm_cluster_connect: Connecting to cluster infrastructure: classic
openais (with plugin)
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Apr  8 09:54:08 lotus-4vm5 attrd[1792]:   notice: main: Starting
mainloop...
Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Recorded connection 0x20ba600 for stonith-ng/0
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Recorded connection 0x20be980 for cib/0
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Sending membership update 24 to cib
Apr  8 09:54:08 lotus-4vm5 stonith-ng[1790]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice:
plugin_handle_membership: Membership 24: quorum acquired
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
member (was (null))
Apr  8 09:54:08 lotus-4vm5 cib[1789]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now
member (was (null))
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Recorded connection 0x20c2d00 for crmd/0
Apr  8 09:54:08 lotus-4vm5 corosync[1364]:   [pcmk  ] info: pcmk_ipc:
Sending membership update 24 to crmd
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node
name
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice:
plugin_handle_membership: Membership 24: quorum acquired
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
member (was (null))
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm6[3192917514] - state is now
member (was (null))
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: do_started: The local
CRM is operational
Apr  8 09:54:08 lotus-4vm5 crmd[1794]:   notice: do_state_transition:
State transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Apr  8 09:54:09 lotus-4vm5 stonith-ng[1790]:   notice: setup_cib:
Watching for stonith topology changes
Apr  8 09:54:09 lotus-4vm5 stonith-ng[1790]:   notice: unpack_config:
On loss of CCM Quorum: Ignore
Apr  8 09:54:10 lotus-4vm5 stonith-ng[1790]:   notice:
stonith_device_register: Added 'st-fencing' to the device list (1 active
devices)
Apr  8 09:54:10 lotus-4vm5 cib[1789]:   notice:
cib_server_process_diff: Not applying diff 0.31.21 -> 0.31.22 (sync in
progress)
Apr  8 09:54:29 lotus-4vm5 crmd[1794]:  warning: do_log: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Apr  8 09:56:29 lotus-4vm5 crmd[1794]:    error: crm_timer_popped:
Election Timeout (I_ELECTION_DC) just popped in state S_ELECTION!
(120000ms)
Apr  8 09:56:29 lotus-4vm5 crmd[1794]:   notice: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Apr  8 09:56:29 lotus-4vm5 crmd[1794]:  warning: do_log: FSA: Input
I_RELEASE_DC from do_election_count_vote() received in state
S_INTEGRATION
Apr  8 09:56:29 lotus-4vm5 crmd[1794]:  warning: join_query_callback:
No DC for join-1


vm6 /var/log/messages
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
pcmk_peer_update: Transitional membership event on ring 16: memb=1,
new=0, lost=0
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: memb: lotus-4vm6 3192917514
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1,
lost=0
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
update_member: Node 3176140298/lotus-4vm5 is now: member
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: NEW:  lotus-4vm5 3176140298
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: MEMB: lotus-4vm5 3176140298
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: MEMB: lotus-4vm6 3192917514
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
send_member_notification: Sending membership update 16 to 2 children
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [TOTEM ] A processor
joined or left the membership and a new membership was formed.
Apr  8 09:52:51 lotus-4vm6 crmd[2496]:   notice:
plugin_handle_membership: Membership 16: quorum acquired
Apr  8 09:52:51 lotus-4vm6 crmd[2496]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
member (was lost)
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
update_member: 0x1284140 Node 3176140298 (lotus-4vm5) born on: 16
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
send_member_notification: Sending membership update 16 to 2 children
Apr  8 09:52:51 lotus-4vm6 cib[2491]:   notice:
plugin_handle_membership: Membership 16: quorum acquired
Apr  8 09:52:51 lotus-4vm6 cib[2491]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
member (was lost)
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [CPG   ] chosen downlist:
sender r(0) ip(10.14.80.189) r(1) ip(10.128.0.189) ; members(old:1
left:0)
Apr  8 09:52:51 lotus-4vm6 corosync[2442]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Apr  8 09:52:57 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:53:14 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:53:15 lotus-4vm6 stonith-ng[2492]:  warning: parse_host_line:
Could not parse (38 47): "console"
Apr  8 09:53:20 lotus-4vm6 corosync[2442]:   [TOTEM ] A processor
failed, forming new configuration.
Apr  8 09:53:21 lotus-4vm6 stonith-ng[2492]:   notice: log_operation:
Operation 'reboot' [3306] (call 2 from crmd.2496) for host 'lotus-4vm5'
with device 'st-fencing' returned: 0 (OK)
Apr  8 09:53:21 lotus-4vm6 crmd[2496]:   notice: erase_xpath_callback:
Deletion of "//node_state[@uname='lotus-4vm5']/lrm": Timer expired
(rc=-62)
Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
pcmk_peer_update: Transitional membership event on ring 20: memb=1,
new=0, lost=1
Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: memb: lotus-4vm6 3192917514
Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: lost: lotus-4vm5 3176140298
Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0,
lost=0
Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: MEMB: lotus-4vm6 3192917514
Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
ais_mark_unseen_peer_dead: Node lotus-4vm5 was not seen in the previous
transition
Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
update_member: Node 3176140298/lotus-4vm5 is now: lost
Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
send_member_notification: Sending membership update 20 to 2 children
Apr  8 09:53:26 lotus-4vm6 corosync[2442]:   [TOTEM ] A processor
joined or left the membership and a new membership was formed.
Apr  8 09:53:26 lotus-4vm6 cib[2491]:   notice:
plugin_handle_membership: Membership 20: quorum lost
Apr  8 09:53:26 lotus-4vm6 cib[2491]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
lost (was member)
Apr  8 09:53:26 lotus-4vm6 crmd[2496]:   notice:
plugin_handle_membership: Membership 20: quorum lost
Apr  8 09:53:26 lotus-4vm6 crmd[2496]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
lost (was member)
Apr  8 09:53:34 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:53:43 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:54:01 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
pcmk_peer_update: Transitional membership event on ring 24: memb=1,
new=0, lost=0
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: memb: lotus-4vm6 3192917514
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] notice:
pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1,
lost=0
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
update_member: Node 3176140298/lotus-4vm5 is now: member
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: NEW:  lotus-4vm5 3176140298
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: MEMB: lotus-4vm5 3176140298
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
pcmk_peer_update: MEMB: lotus-4vm6 3192917514
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
send_member_notification: Sending membership update 24 to 2 children
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [TOTEM ] A processor
joined or left the membership and a new membership was formed.
Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice:
plugin_handle_membership: Membership 24: quorum acquired
Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
member (was lost)
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
update_member: 0x1284140 Node 3176140298 (lotus-4vm5) born on: 24
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [pcmk  ] info:
send_member_notification: Sending membership update 24 to 2 children
Apr  8 09:54:04 lotus-4vm6 cib[2491]:   notice:
plugin_handle_membership: Membership 24: quorum acquired
Apr  8 09:54:04 lotus-4vm6 cib[2491]:   notice: crm_update_peer_state:
plugin_handle_membership: Node lotus-4vm5[3176140298] - state is now
member (was lost)
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [CPG   ] chosen downlist:
sender r(0) ip(10.14.80.190) r(1) ip(10.128.0.190) ; members(old:2
left:1)
Apr  8 09:54:04 lotus-4vm6 corosync[2442]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Apr  8 09:54:04 lotus-4vm6 stonith-ng[2492]:   notice: remote_op_done:
Operation reboot of lotus-4vm5 by lotus-4vm6 for
crmd.2496@lotus-4vm6.ae82b411: OK
Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice:
tengine_stonith_callback: Stonith operation
2/13:0:0:f325afae-64b0-4812-a897-70556ab1e806: OK (0)
Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice:
tengine_stonith_notify: Peer lotus-4vm5 was terminated (reboot) by
lotus-4vm6 for lotus-4vm6: OK (ref=ae82b411-b07a-4235-be55-5a30a00b323b)
by client crmd.2496
Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice: crm_update_peer_state:
send_stonith_update: Node lotus-4vm5[3176140298] - state is now lost
(was member)
Apr  8 09:54:04 lotus-4vm6 crmd[2496]:   notice: run_graph: Transition
0 (Complete=1, Pending=0, Fired=0, Skipped=7, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-25.bz2): Stopped
Apr  8 09:54:04 lotus-4vm6 attrd[2494]:   notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Apr  8 09:54:04 lotus-4vm6 attrd[2494]:   notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Apr  8 09:54:05 lotus-4vm6 pengine[2495]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Apr  8 09:54:05 lotus-4vm6 pengine[2495]:   notice: LogActions: Start
st-fencing#011(lotus-4vm6)
Apr  8 09:54:05 lotus-4vm6 pengine[2495]:   notice: LogActions: Start
MGS_607d26#011(lotus-4vm6)
Apr  8 09:54:05 lotus-4vm6 pengine[2495]:   notice: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-912.bz2
Apr  8 09:54:05 lotus-4vm6 crmd[2496]:   notice: te_rsc_command:
Initiating action 5: start st-fencing_start_0 on lotus-4vm6 (local)
Apr  8 09:54:05 lotus-4vm6 crmd[2496]:   notice: te_rsc_command:
Initiating action 6: start MGS_607d26_start_0 on lotus-4vm6 (local)
Apr  8 09:54:05 lotus-4vm6 stonith-ng[2492]:   notice:
stonith_device_register: Device 'st-fencing' already existed in device
list (1 active devices)
Apr  8 09:54:05 lotus-4vm6 kernel: LDISKFS-fs warning (device sda):
ldiskfs_multi_mount_protect: MMP interval 42 higher than expected,
please wait.
Apr  8 09:54:05 lotus-4vm6 kernel:
Apr  8 09:54:10 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:54:11 lotus-4vm6 crmd[2496]:  warning: get_rsc_metadata: No
metadata found for fence_chroma::stonith:heartbeat: Input/output error
(-5)
Apr  8 09:54:11 lotus-4vm6 crmd[2496]:   notice: process_lrm_event: LRM
operation st-fencing_start_0 (call=24, rc=0, cib-update=89,
confirmed=true) ok
Apr  8 09:54:11 lotus-4vm6 crmd[2496]:  warning: crmd_cs_dispatch:
Recieving messages from a node we think is dead: lotus-4vm5[-1118826998]
Apr  8 09:54:24 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:54:31 lotus-4vm6 crmd[2496]:   notice:
do_election_count_vote: Election 2 (current: 2, owner: lotus-4vm5):
Processed vote from lotus-4vm5 (Peer is not part of our cluster)
Apr  8 09:54:34 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:54:46 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:54:48 lotus-4vm6 kernel: LDISKFS-fs (sda): recovery complete
Apr  8 09:54:48 lotus-4vm6 kernel: LDISKFS-fs (sda): mounted filesystem
with ordered data mode. quota=on. Opts:
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [ [ ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [   { ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [     "args": [ ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [       "mount",  ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [       "-t",  ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [       "lustre",  ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [
"/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1",  ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [       "/mnt/MGS" ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [     ],  ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [     "rc": 0,  ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [     "stderr": "",  ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [     "stdout": "" ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [   } ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [ ] ]
Apr  8 09:54:48 lotus-4vm6 lrmd[2493]:   notice: operation_finished:
MGS_607d26_start_0:3444:stderr [  ]
Apr  8 09:54:48 lotus-4vm6 crmd[2496]:   notice: process_lrm_event: LRM
operation MGS_607d26_start_0 (call=26, rc=0, cib-update=94,
confirmed=true) ok
Apr  8 09:54:49 lotus-4vm6 crmd[2496]:   notice: run_graph: Transition
1 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-912.bz2): Stopped
Apr  8 09:54:49 lotus-4vm6 attrd[2494]:   notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Apr  8 09:54:49 lotus-4vm6 attrd[2494]:   notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Apr  8 09:54:50 lotus-4vm6 pengine[2495]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Apr  8 09:54:50 lotus-4vm6 pengine[2495]:   notice: process_pe_message:
Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-913.bz2
Apr  8 09:54:50 lotus-4vm6 crmd[2496]:   notice: te_rsc_command:
Initiating action 9: monitor MGS_607d26_monitor_5000 on lotus-4vm6
(local)
Apr  8 09:54:51 lotus-4vm6 crmd[2496]:   notice: process_lrm_event: LRM
operation MGS_607d26_monitor_5000 (call=30, rc=0, cib-update=102,
confirmed=false) ok
Apr  8 09:54:51 lotus-4vm6 crmd[2496]:   notice: run_graph: Transition
2 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-913.bz2): Complete
Apr  8 09:54:51 lotus-4vm6 crmd[2496]:   notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr  8 09:55:07 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:55:23 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:55:38 lotus-4vm6 kernel: Lustre: Evicted from MGS (at
10.14.80.190@tcp) after server handle changed from 0x7acffb201664d0a4 to
0x9a6b02eee57f3dba
Apr  8 09:55:38 lotus-4vm6 kernel: Lustre: MGC10.14.80.189@tcp:
Connection restored to MGS (at 0@lo)
Apr  8 09:55:42 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:55:58 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:56:12 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:56:26 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:56:31 lotus-4vm6 crmd[2496]:  warning: crmd_ha_msg_filter:
Another DC detected: lotus-4vm5 (op=join_offer)
Apr  8 09:56:31 lotus-4vm6 crmd[2496]:   notice: do_state_transition:
State transition S_IDLE -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
Apr  8 09:56:31 lotus-4vm6 crmd[2496]:   notice: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Apr  8 09:56:31 lotus-4vm6 crmd[2496]:   notice:
do_election_count_vote: Election 3 (current: 3, owner: lotus-4vm6):
Processed no-vote from lotus-4vm5 (Peer is not part of our cluster)
Apr  8 09:56:36 lotus-4vm6 dhclient[1012]: DHCPREQUEST on eth0 to
10.14.80.1 port 67 (xid=0x78d16782)
Apr  8 09:56:37 lotus-4vm6 crmd[2496]:  warning: get_rsc_metadata: No
metadata found for fence_chroma::stonith:heartbeat: Input/output error
(-5)
Apr  8 09:56:37 lotus-4vm6 attrd[2494]:   notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Apr  8 09:56:37 lotus-4vm6 attrd[2494]:   notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Apr  8 09:56:38 lotus-4vm6 pengine[2495]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
Apr  8 09:56:38 lotus-4vm6 pengine[2495]:   notice: process_pe_message:
Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-914.bz2
Apr  8 09:56:38 lotus-4vm6 crmd[2496]:   notice: run_graph: Transition
3 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-914.bz2): Complete
Apr  8 09:56:38 lotus-4vm6 crmd[2496]:   notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]

Thank you very much
Gene


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
