I didn't read the whole thing, but if you're trying to do HA NFS, you need to run 
OCFS2 on your RBD and disable read/write caching on the RBD client. 
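
For the caching part, a minimal sketch of what that looks like in ceph.conf on the 
NFS heads (this is the librbd setting; the kernel RBD client doesn't use the librbd 
cache, so it only applies if you map via librbd): 

    [client] 
        rbd cache = false 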




From: "Steve Anthony" <sma...@lehigh.edu> 
To: ceph-users@lists.ceph.com 
Sent: Friday, December 25, 2015 12:39:01 AM 
Subject: Re: [ceph-users] nfs over rbd problem 

I've run into many problems trying to run RBD/NFS with Pacemaker like you describe 
on a two-node cluster. In my case, most of the problems were a result of a) no 
quorum and b) no STONITH. If you're going to be running this setup in 
production, I *highly* recommend adding more nodes (if only to maintain 
quorum). Without quorum, you will run into split-brain situations where you'll 
see anything from both nodes starting the same services to configuration 
changes disappearing when the nodes can't agree on a recent revision. 
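
For reference, the two-node how-tos usually set no-quorum-policy=ignore (your log 
below shows "On loss of CCM Quorum: Ignore"), which is exactly what lets a lone node 
carry on by itself. Once you have three or more voting nodes, a minimal sketch of 
flipping that back so the cluster actually reacts to quorum loss: 

    # crm configure property no-quorum-policy=stop 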

In this case, it looks like a STONITH problem. Without a fencing mechanism, 
node2 cannot be 100% certain that node1 is dead. When you manually shut down 
corosync, I suspect that as part of the process, node1 alerts the cluster that 
it's leaving as part of a planned shutdown. When it just disappears, node2 
can't determine what happened; it just knows node1 isn't answering anymore. 
Node1 is not necessarily down; there could just be a network problem between 
node1 and node2. In such a scenario, it would be very bad for node2 to 
map/mount the RBDs and start writing data while node1 is still providing the 
same service. Quorum can help here too, but for node2 to be certain node1 is 
down, it needs an out-of-band method to force power off node1, e.g. I use IPMI 
on a separate physical network. 
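
A rough sketch of IPMI fencing with the crm shell (fence_ipmilan is one common 
agent; the exact parameter names depend on your fence-agents version, and the 
addresses/credentials here are placeholders): 

    # crm configure primitive p_fence_node1 stonith:fence_ipmilan \
        params ipaddr=10.0.0.1 login=admin passwd=secret pcmk_host_list=node1 \
        op monitor interval=60s 
    # crm configure primitive p_fence_node2 stonith:fence_ipmilan \
        params ipaddr=10.0.0.2 login=admin passwd=secret pcmk_host_list=node2 \
        op monitor interval=60s 
    # crm configure location l_fence_node1 p_fence_node1 -inf: node1 
    # crm configure location l_fence_node2 p_fence_node2 -inf: node2 
    # crm configure property stonith-enabled=true 

The location constraints just keep each fence device from running on the node it is 
supposed to kill. 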

So...add more nodes as quorum members. You can set weight restrictions on the 
resource groups to prevent them from running on these nodes if they are not as 
powerful as the nodes you're using now. Then, add a STONITH mechanism on all 
the nodes, and verify it works. 
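
For the weight restrictions, something along these lines keeps the NFS resources off 
a quorum-only member (node3 here is a hypothetical third node): 

    # crm configure location l_rbd_share_not_node3 g_rbd_share_1 -inf: node3 
    # crm configure location l_nfs_not_node3 clo_nfs -inf: node3 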

Once you do that, you should see things act the way you expect. Good luck! 

-Steve 

On 12/19/2015 03:46 AM, maoqi1982 wrote: 



Hi list, 
I have a test Ceph cluster of 3 nodes (node0: mon; node1: osd and nfs 
server1; node2: osd and nfs server2). 
OS: CentOS 6.6, kernel: 3.10.94-1.el6.elrepo.x86_64, Ceph version 0.94.5. 
I followed the http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/ 
instructions to set up an active/standby NFS environment. 
When I run "# service corosync stop" or "# poweroff" on node1, the 
failover works fine (the NFS service is taken over by node2). But when I test by 
cutting off node1's power, the failover fails. 
1. [root@node1 ~]# crm status 
Last updated: Fri Dec 18 17:14:19 2015 
Last change: Fri Dec 18 17:13:29 2015 
Stack: classic openais (with plugin) 
Current DC: node1 - partition with quorum 
Version: 1.1.11-97629de 
2 Nodes configured, 3 expected votes 
8 Resources configured 
Online: [ node1 node2 ] 
Resource Group: g_rbd_share_1 
p_rbd_map_1 (ocf::ceph:rbd.in): Started node1 
p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started node1 
p_export_rbd_1 (ocf::heartbeat:exportfs): Started node1 
p_vip_1 (ocf::heartbeat:IPaddr): Started node1 
Clone Set: clo_nfs [g_nfs] 
Started: [ node1 node2 ] 
2. [root@node1 ~]# service corosync stop 
[root@node2 cluster]# crm status 
Last updated: Fri Dec 18 17:14:59 2015 
Last change: Fri Dec 18 17:13:29 2015 
Stack: classic openais (with plugin) 
Current DC: node2 - partition WITHOUT quorum 
Version: 1.1.11-97629de 
2 Nodes configured, 3 expected votes 
8 Resources configured 
Online: [ node2 ] 
OFFLINE: [ node1 ] 
Resource Group: g_rbd_share_1 
p_rbd_map_1 (ocf::ceph:rbd.in): Started node2 
p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started node2 
p_export_rbd_1 (ocf::heartbeat:exportfs): Started node2 
p_vip_1 (ocf::heartbeat:IPaddr): Started node2 
Clone Set: clo_nfs [g_nfs] 
Started: [ node2 ] 
Stopped: [ node1 ] 

3. cut off node1 power manually 
[root@node2 cluster]# crm status 
Last updated: Fri Dec 18 17:23:06 2015 
Last change: Fri Dec 18 17:13:29 2015 
Stack: classic openais (with plugin) 
Current DC: node2 - partition WITHOUT quorum 
Version: 1.1.11-97629de 
2 Nodes configured, 3 expected votes 
8 Resources configured 
Online: [ node2 ] 
OFFLINE: [ node1 ] 
Clone Set: clo_nfs [g_nfs] 
Started: [ node2 ] 
Stopped: [ node1 ] 
Failed actions: 
p_rbd_map_1_start_0 on node2 'unknown error' (1): call=48, status=Timed Out, 
last-rc-change='Fri Dec 18 17:22:19 2015', queued=0ms, exec=20002ms 
corosync.log: 
Dec 18 17:22:19 corosync [pcmk ] notice: pcmk_peer_update: Transitional 
membership event on ring 668: memb=1, new=0, lost=1 
Dec 18 17:22:19 corosync [pcmk ] info: pcmk_peer_update: memb: node2 1211279552 
Dec 18 17:22:19 corosync [pcmk ] info: pcmk_peer_update: lost: node1 1194502336 
Dec 18 17:22:19 corosync [pcmk ] notice: pcmk_peer_update: Stable membership 
event on ring 668: memb=1, new=0, lost=0 
Dec 18 17:22:19 corosync [pcmk ] info: pcmk_peer_update: MEMB: node2 1211279552 
Dec 18 17:22:19 corosync [pcmk ] info: ais_mark_unseen_peer_dead: Node node1 
was not seen in the previous transition 
Dec 18 17:22:19 corosync [pcmk ] info: update_member: Node 1194502336/node1 is 
now: lost 
Dec 18 17:22:19 [2695] node2 crmd: notice: plugin_handle_membership: Membership 
668: quorum lost 
Dec 18 17:22:19 [2690] node2 cib: notice: plugin_handle_membership: Membership 
668: quorum lost 
Dec 18 17:22:19 corosync [pcmk ] info: send_member_notification: Sending 
membership update 668 to 3 children 
Dec 18 17:22:19 [2691] node2 stonith-ng: notice: plugin_handle_membership: 
Membership 668: quorum lost 
Dec 18 17:22:19 [2695] node2 crmd: info: crm_update_peer_proc: 
plugin_handle_membership: Node node1[1194502336] - unknown is now lost 
Dec 18 17:22:19 [2690] node2 cib: info: crm_update_peer_proc: 
plugin_handle_membership: Node node1[1194502336] - unknown is now lost 
Dec 18 17:22:19 corosync [TOTEM ] A processor joined or left the membership and 
a new membership was formed. 
Dec 18 17:22:19 [2695] node2 crmd: info: peer_update_callback: Client 
node1/peer now has status [offline] (DC=true, changed=111312) 
Dec 18 17:22:19 [2691] node2 stonith-ng: info: crm_update_peer_proc: 
plugin_handle_membership: Node node1[1194502336] - unknown is now lost 
Dec 18 17:22:19 [2695] node2 crmd: info: peer_update_callback: Peer node1 left 
us 
Dec 18 17:22:19 [2690] node2 cib: notice: crm_update_peer_state: 
plugin_handle_membership: Node node1[1194502336] - state is now lost (was 
member) 
Dec 18 17:22:19 [2695] node2 crmd: warning: match_down_event: No match for 
shutdown action on node1 
Dec 18 17:22:19 [2691] node2 stonith-ng: notice: crm_update_peer_state: 
plugin_handle_membership: Node node1[1194502336] - state is now lost (was 
member) 
Dec 18 17:22:19 [2695] node2 crmd: notice: peer_update_callback: 
Stonith/shutdown of node1 not matched 
Dec 18 17:22:19 [2695] node2 crmd: info: crm_update_peer_join: 
peer_update_callback: Node node1[1194502336] - join-2 phase 4 -> 0 
Dec 18 17:22:19 [2695] node2 crmd: info: abort_transition_graph: Transition 
aborted: Node failure (source=peer_update_callback:237, 1) 
Dec 18 17:22:19 [2695] node2 crmd: notice: crm_update_peer_state: 
plugin_handle_membership: Node node1[1194502336] - state is now lost (was 
member) 
Dec 18 17:22:19 [2695] node2 crmd: info: peer_update_callback: node1 is now 
lost (was member) 
Dec 18 17:22:19 [2695] node2 crmd: warning: match_down_event: No match for 
shutdown action on node1 
Dec 18 17:22:19 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.50.72) 
; members(old:2 left:1) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section status to master (origin=local/crmd/88) 
Dec 18 17:22:19 [2695] node2 crmd: notice: peer_update_callback: 
Stonith/shutdown of node1 not matched 
Dec 18 17:22:19 [2695] node2 crmd: info: abort_transition_graph: Transition 
aborted: Node failure (source=peer_update_callback:237, 1) 
Dec 18 17:22:19 corosync [MAIN ] Completed service synchronization, ready to 
provide service. 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section status to master (origin=local/crmd/89) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: Diff: --- 0.69.157 2 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: Diff: +++ 0.69.158 
(null) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: + /cib: 
@num_updates=158 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: + 
/cib/status/node_state[@id='node1']: @crmd=offline, 
@crm-debug-origin=peer_update_callback 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section status: OK (rc=0, origin=node2/crmd/88, 
version=0.69.158) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section status: OK (rc=0, origin=node2/crmd/89, 
version=0.69.158) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section nodes to master (origin=local/crmd/90) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section status to master (origin=local/crmd/91) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section cib to master (origin=local/crmd/92) 
Dec 18 17:22:19 [2695] node2 crmd: info: crmd_cs_dispatch: Setting expected 
votes to 3 
Dec 18 17:22:19 [2695] node2 crmd: notice: do_state_transition: State 
transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL 
origin=check_join_state ] 
Dec 18 17:22:19 [2695] node2 crmd: info: do_dc_join_offer_one: An unknown node 
joined - (re-)offer to any unconfirmed nodes 
Dec 18 17:22:19 [2695] node2 crmd: info: join_make_offer: Making join offers 
based on membership 668 
Dec 18 17:22:19 [2695] node2 crmd: info: join_make_offer: Skipping node2: 
already known 4 
Dec 18 17:22:19 [2695] node2 crmd: info: join_make_offer: Not making an offer 
to node1: not active (lost) 
Dec 18 17:22:19 [2695] node2 crmd: info: abort_transition_graph: Transition 
aborted: Peer Halt (source=do_te_invoke:158, 1) 
Dec 18 17:22:19 [2695] node2 crmd: info: do_state_transition: State transition 
S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL 
origin=check_join_state ] 
Dec 18 17:22:19 [2695] node2 crmd: info: crmd_join_phase_log: join-2: 
node2=confirmed 
Dec 18 17:22:19 [2695] node2 crmd: info: crmd_join_phase_log: join-2: 
node1=none 
Dec 18 17:22:19 [2695] node2 crmd: info: do_state_transition: State transition 
S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL 
origin=peer_update_callback ] 
Dec 18 17:22:19 [2695] node2 crmd: info: do_dc_join_offer_one: An unknown node 
joined - (re-)offer to any unconfirmed nodes 
Dec 18 17:22:19 [2695] node2 crmd: info: join_make_offer: Skipping node2: 
already known 4 
Dec 18 17:22:19 [2695] node2 crmd: info: join_make_offer: Not making an offer 
to node1: not active (lost) 
Dec 18 17:22:19 [2695] node2 crmd: info: do_state_transition: State transition 
S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL 
origin=check_join_state ] 
Dec 18 17:22:19 [2695] node2 crmd: info: crmd_join_phase_log: join-2: 
node2=confirmed 
Dec 18 17:22:19 [2695] node2 crmd: info: crmd_join_phase_log: join-2: 
node1=none 
Dec 18 17:22:19 [2695] node2 crmd: info: do_state_transition: State transition 
S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL 
origin=check_join_state ] 
Dec 18 17:22:19 [2695] node2 crmd: info: abort_transition_graph: Transition 
aborted: Peer Cancelled (source=do_te_invoke:151, 1) 
Dec 18 17:22:19 [2693] node2 attrd: notice: attrd_local_callback: Sending full 
refresh (origin=crmd) 
Dec 18 17:22:19 [2693] node2 attrd: notice: attrd_trigger_update: Sending flush 
op to all hosts for: shutdown (0) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section nodes: OK (rc=0, origin=node2/crmd/90, 
version=0.69.158) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: Diff: --- 0.69.158 2 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: Diff: +++ 0.69.159 
(null) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: + /cib: 
@num_updates=159 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: + 
/cib/status/node_state[@id='node2']: @crm-debug-origin=post_cache_update 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: + 
/cib/status/node_state[@id='node1']: @in_ccm=false, 
@crm-debug-origin=post_cache_update 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section status: OK (rc=0, origin=node2/crmd/91, 
version=0.69.159) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: Diff: --- 0.69.159 2 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: Diff: +++ 0.69.160 
(null) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: + /cib: 
@num_updates=160, @have-quorum=0 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section cib: OK (rc=0, origin=node2/crmd/92, 
version=0.69.160) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section crm_config to master (origin=local/crmd/94) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section nodes to master (origin=local/crmd/95) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section status to master (origin=local/crmd/96) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section cib to master (origin=local/crmd/97) 
Dec 18 17:22:19 [2693] node2 attrd: notice: attrd_trigger_update: Sending flush 
op to all hosts for: probe_complete (true) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section crm_config: OK (rc=0, origin=node2/crmd/94, 
version=0.69.160) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section nodes: OK (rc=0, origin=node2/crmd/95, 
version=0.69.160) 
Dec 18 17:22:19 [2694] node2 pengine: notice: unpack_config: On loss of CCM 
Quorum: Ignore 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: Diff: --- 0.69.160 2 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: Diff: +++ 0.69.161 
(null) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: + /cib: 
@num_updates=161 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: + 
/cib/status/node_state[@id='node2']: @crm-debug-origin=do_state_transition 
Dec 18 17:22:19 [2690] node2 cib: info: cib_perform_op: + 
/cib/status/node_state[@id='node1']: @crm-debug-origin=do_state_transition, 
@join=down 
Dec 18 17:22:19 [2694] node2 pengine: info: determine_online_status: Node node2 
is online 
Dec 18 17:22:19 [2694] node2 pengine: info: group_print: Resource Group: 
g_rbd_share_1 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section status: OK (rc=0, origin=node2/crmd/96, 
version=0.69.161) 
Dec 18 17:22:19 [2694] node2 pengine: info: native_print: p_rbd_map_1 
(ocf::ceph:rbd.in): Stopped 
Dec 18 17:22:19 [2694] node2 pengine: info: native_print: p_fs_rbd_1 
(ocf::heartbeat:Filesystem): Stopped 
Dec 18 17:22:19 [2694] node2 pengine: info: native_print: p_export_rbd_1 
(ocf::heartbeat:exportfs): Stopped 
Dec 18 17:22:19 [2694] node2 pengine: info: native_print: p_vip_1 
(ocf::heartbeat:IPaddr): Stopped 
Dec 18 17:22:19 [2694] node2 pengine: info: clone_print: Clone Set: clo_nfs 
[g_nfs] 
Dec 18 17:22:19 [2694] node2 pengine: info: short_print: Started: [ node2 ] 
Dec 18 17:22:19 [2694] node2 pengine: info: short_print: Stopped: [ node1 ] 
Dec 18 17:22:19 [2694] node2 pengine: info: rsc_merge_weights: p_rpcbind:1: 
Rolling back scores from p_nfs_server:1 
Dec 18 17:22:19 [2694] node2 pengine: info: native_color: Resource p_rpcbind:1 
cannot run anywhere 
Dec 18 17:22:19 [2694] node2 pengine: info: native_color: Resource 
p_nfs_server:1 cannot run anywhere 
Dec 18 17:22:19 [2694] node2 pengine: info: RecurringOp: Start recurring 
monitor (10s) for p_rbd_map_1 on node2 
Dec 18 17:22:19 [2694] node2 pengine: info: RecurringOp: Start recurring 
monitor (20s) for p_fs_rbd_1 on node2 
Dec 18 17:22:19 [2694] node2 pengine: info: RecurringOp: Start recurring 
monitor (10s) for p_export_rbd_1 on node2 
Dec 18 17:22:19 [2694] node2 pengine: info: RecurringOp: Start recurring 
monitor (5s) for p_vip_1 on node2 
Dec 18 17:22:19 [2694] node2 pengine: notice: LogActions: Start p_rbd_map_1 
(node2) 
Dec 18 17:22:19 [2694] node2 pengine: notice: LogActions: Start p_fs_rbd_1 
(node2) 
Dec 18 17:22:19 [2694] node2 pengine: notice: LogActions: Start p_export_rbd_1 
(node2) 
Dec 18 17:22:19 [2694] node2 pengine: notice: LogActions: Start p_vip_1 (node2) 
Dec 18 17:22:19 [2694] node2 pengine: info: LogActions: Leave p_rpcbind:0 
(Started node2) 
Dec 18 17:22:19 [2694] node2 pengine: info: LogActions: Leave p_nfs_server:0 
(Started node2) 
Dec 18 17:22:19 [2694] node2 pengine: info: LogActions: Leave p_rpcbind:1 
(Stopped) 
Dec 18 17:22:19 [2694] node2 pengine: info: LogActions: Leave p_nfs_server:1 
(Stopped) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section cib: OK (rc=0, origin=node2/crmd/97, 
version=0.69.161) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section status to master (origin=local/attrd/24) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section status: OK (rc=0, origin=node2/attrd/24, 
version=0.69.161) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section status to master (origin=local/attrd/26) 
Dec 18 17:22:19 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section status: OK (rc=0, origin=node2/attrd/26, 
version=0.69.161) 
Dec 18 17:22:19 [2694] node2 pengine: notice: process_pe_message: Calculated 
Transition 3: /var/lib/pacemaker/pengine/pe-input-234.bz2 
Dec 18 17:22:19 [2695] node2 crmd: info: do_state_transition: State transition 
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE 
origin=handle_response ] 
Dec 18 17:22:19 [2695] node2 crmd: info: do_te_invoke: Processing graph 3 
(ref=pe_calc-dc-1450430539-52) derived from 
/var/lib/pacemaker/pengine/pe-input-234.bz2 
Dec 18 17:22:19 [2695] node2 crmd: notice: te_rsc_command: Initiating action 6: 
start p_rbd_map_1_start_0 on node2 (local) 
Dec 18 17:22:19 [2695] node2 crmd: info: do_lrm_rsc_op: Performing 
key=6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9 op=p_rbd_map_1_start_0 
Dec 18 17:22:19 [2692] node2 lrmd: info: log_execute: executing - 
rsc:p_rbd_map_1 action:start call_id:48 
Dec 18 17:22:24 [2690] node2 cib: info: cib_process_ping: Reporting our current 
digest to node2: 29875b5ae8dcbbb1d07f904ca3423de2 for 0.69.161 (0x12607b0 0) 
Dec 18 17:22:39 [2692] node2 lrmd: warning: child_timeout_callback: 
p_rbd_map_1_start_0 process (PID 11010) timed out 
Dec 18 17:22:39 [2692] node2 lrmd: warning: operation_finished: 
p_rbd_map_1_start_0:11010 - timed out after 20000ms 
Dec 18 17:22:39 [2692] node2 lrmd: notice: operation_finished: 
p_rbd_map_1_start_0:11010:stderr [ libust[11019/11019]: Warning: HOME 
environment variable not set. Disabling LTTng-UST per-user tracing. (in 
setup_local_apps() at lttng-ust-comm.c:305) ] 
Dec 18 17:22:39 [2692] node2 lrmd: info: log_finished: finished - 
rsc:p_rbd_map_1 action:start call_id:48 pid:11010 exit-code:1 exec-time:20002ms 
queue-time:0ms 
Dec 18 17:22:39 [2695] node2 crmd: info: services_os_action_execute: Managed 
rbd.in_meta-data_0 process 11117 exited with rc=0 
Dec 18 17:22:39 [2695] node2 crmd: error: process_lrm_event: Operation 
p_rbd_map_1_start_0: Timed Out (node=node2, call=48, timeout=20000ms) 

Dec 18 17:22:39 [2692] node2 lrmd: warning: child_timeout_callback: 
p_rbd_map_1_start_0 process (PID 11010) timed out 
Dec 18 17:22:39 [2692] node2 lrmd: warning: operation_finished: 
p_rbd_map_1_start_0:11010 - timed out after 20000ms 
Dec 18 17:22:39 [2692] node2 lrmd: notice: operation_finished: 
p_rbd_map_1_start_0:11010:stderr [ libust[11019/11019]: Warning: HOME 
environment variable not set. Disabling LTTng-UST per-user tracing. (in 
setup_local_apps() at lttng-ust-comm.c:305) ] 
Dec 18 17:22:39 [2692] node2 lrmd: info: log_finished: finished - 
rsc:p_rbd_map_1 action:start call_id:48 pid:11010 exit-code:1 exec-time:20002ms 
queue-time:0ms 
Dec 18 17:22:39 [2695] node2 crmd: info: services_os_action_execute: Managed 
rbd.in_meta-data_0 process 11117 exited with rc=0 
Dec 18 17:22:39 [2695] node2 crmd: error: process_lrm_event: Operation 
p_rbd_map_1_start_0: Timed Out (node=node2, call=48, timeout=20000ms) 
Dec 18 17:22:39 [2695] node2 crmd: notice: process_lrm_event: 
node2-p_rbd_map_1_start_0:48 [ libust[11019/11019]: Warning: HOME environment 
variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() 
at lttng-ust-comm.c:305)\n ] 
Dec 18 17:22:39 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section status to master (origin=local/crmd/99) 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: Diff: --- 0.69.161 2 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: Diff: +++ 0.69.162 
(null) 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: + /cib: 
@num_updates=162 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: + 
/cib/status/node_state[@id='node2']: @crm-debug-origin=do_update_resource 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: + 
/cib/status/node_state[@id='node2']/lrm[@id='node2']/lrm_resources/lrm_resource[@id='p_rbd_map_1']/lrm_rsc_op[@id='p_rbd_map_1_last_0']:
 @operation_key=p_rbd_map_1_start_0, @operation=start, 
@transition-key=6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, 
@transition-magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, @call-id=48, 
@rc-code=1, @op-status=2, @last-run=1450430539, @last-rc-change=1450430539, 
@exec-time=20002 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: ++ 
/cib/status/node_state[@id='node2']/lrm[@id='node2']/lrm_resources/lrm_resource[@id='p_rbd_map_1']:
 <lrm_rsc_op id="p_rbd_map_1_last_failure_0" 
operation_key="p_rbd_map_1_start_0" operation="start" 
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" 
transition-key="6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9" 
transition-magic="2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9" call-id="48" 
rc-code="1" op-status="2" interval="0" l 
Dec 18 17:22:39 [2690] node2 cib: info: cib_process_request: Completed 
cib_modify operation for section status: OK (rc=0, origin=node2/crmd/99, 
version=0.69.162) 
Dec 18 17:22:39 [2695] node2 crmd: warning: status_from_rc: Action 6 
(p_rbd_map_1_start_0) on node2 failed (target: 0 vs. rc: 1): Error 
Dec 18 17:22:39 [2695] node2 crmd: warning: update_failcount: Updating 
failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, 
time=1450430559) 
Dec 18 17:22:39 [2695] node2 crmd: notice: abort_transition_graph: Transition 
aborted by p_rbd_map_1_start_0 'modify' on node2: Event failed 
(magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, cib=0.69.162, 
source=match_graph_event:344, 0) 
Dec 18 17:22:39 [2695] node2 crmd: info: match_graph_event: Action 
p_rbd_map_1_start_0 (6) confirmed on node2 (rc=4) 
Dec 18 17:22:39 [2693] node2 attrd: notice: attrd_trigger_update: Sending flush 
op to all hosts for: fail-count-p_rbd_map_1 (INFINITY) 
Dec 18 17:22:39 [2695] node2 crmd: warning: update_failcount: Updating 
failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, 
time=1450430559) 
Dec 18 17:22:39 [2695] node2 crmd: info: process_graph_event: Detected action 
(3.6) p_rbd_map_1_start_0.48=unknown error: failed 
Dec 18 17:22:39 [2695] node2 crmd: warning: status_from_rc: Action 6 
(p_rbd_map_1_start_0) on node2 failed (target: 0 vs. rc: 1): Error 
Dec 18 17:22:39 [2695] node2 crmd: warning: update_failcount: Updating 
failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, 
time=1450430559) 
Dec 18 17:22:39 [2695] node2 crmd: info: abort_transition_graph: Transition 
aborted by p_rbd_map_1_start_0 'create' on (null): Event failed 
(magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, cib=0.69.162, 
source=match_graph_event:344, 0) 
Dec 18 17:22:39 [2695] node2 crmd: info: match_graph_event: Action 
p_rbd_map_1_start_0 (6) confirmed on node2 (rc=4) 
Dec 18 17:22:39 [2695] node2 crmd: warning: update_failcount: Updating 
failcount for p_rbd_map_1 on node2 after failed start: rc=1 (update=INFINITY, 
time=1450430559) 
Dec 18 17:22:39 [2695] node2 crmd: info: process_graph_event: Detected action 
(3.6) p_rbd_map_1_start_0.48=unknown error: failed 
Dec 18 17:22:39 [2693] node2 attrd: notice: attrd_perform_update: Sent update 
28: fail-count-p_rbd_map_1=INFINITY 
Dec 18 17:22:39 [2690] node2 cib: info: cib_process_request: Forwarding 
cib_modify operation for section status to master (origin=local/attrd/28) 
Dec 18 17:22:39 [2695] node2 crmd: notice: run_graph: Transition 3 (Complete=2, 
Pending=0, Fired=0, Skipped=8, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-234.bz2): Stopped 
Dec 18 17:22:39 [2695] node2 crmd: info: do_state_transition: State transition 
S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=notify_crmd ] 
Dec 18 17:22:39 [2693] node2 attrd: notice: attrd_trigger_update: Sending flush 
op to all hosts for: last-failure-p_rbd_map_1 (1450430559) 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: Diff: --- 0.69.162 2 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: Diff: +++ 0.69.163 
(null) 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: + /cib: 
@num_updates=163 
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: ++ 
/cib/status/node_state[@id='node2']/transient_attributes[@id='node2']/instance_attributes[@id='status-node2']:
 <nvpair id="status-node2-fail-count-p_rbd_map_1" name="fail-count-p_rbd_map_1" 
value="INFINITY"/> 
......... 

thanks 




-- 
Steve Anthony
LTS HPC Support Specialist
Lehigh University sma...@lehigh.edu 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
