Yes, but that depends. That might have changed on the master branch.
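To illustrate what "acts as blocked" means here, a minimal sketch (not the actual Pipe code; the helper name and the 1-second poll timeout are just assumptions for the example): a readiness check such as poll()/tcp_read_wait only promises that some bytes are available, so a loop that insists on filling a whole buffer with MSG_DONTWAIT can still hit EAGAIN partway through and end up waiting for more data.

// Minimal sketch (not Ceph code): reading an exact number of bytes from a
// non-blocking socket.  recv(..., MSG_DONTWAIT) returns -1/EAGAIN as soon as
// the kernel buffer is drained, even if an earlier poll() reported POLLIN.
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

ssize_t read_exact_nonblocking(int fd, char *buf, size_t len) {
  size_t got = 0;
  while (got < len) {
    ssize_t r = ::recv(fd, buf + got, len - got, MSG_DONTWAIT);
    if (r > 0) {
      got += r;                        // consumed some bytes, keep going
    } else if (r == 0) {
      return -1;                       // peer closed the connection
    } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
      // Nothing more to read right now: this is where the reader ends up
      // "acting as blocked", because it has to wait for more data.
      struct pollfd pfd = { fd, POLLIN, 0 };
      if (::poll(&pfd, 1, 1000) <= 0)  // hypothetical 1s timeout
        return -1;
    } else if (errno != EINTR) {
      return -1;                       // real socket error
    }
  }
  return (ssize_t)got;
}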
2017-01-13 10:47 GMT+08:00 许雪寒 <xuxue...@360.cn>:
> Thanks for your reply☺
>
> Indeed, Pipe::do_recv would act as if blocked when errno is EAGAIN; however, the Pipe::read_message method first checks whether there is a pending message on the socket via "Pipe::tcp_read_wait". So I think that when Pipe::do_recv is called it shouldn't get an EAGAIN, which means it wouldn't act as blocked. Is this so?
> This really confuses me.
>
>
> From: jiajia zhong [mailto:zhong2p...@gmail.com]
> Sent: 2017-01-12 18:22
> To: 许雪寒
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Pipe "deadlock" in Hammer, 0.94.5
>
> If errno is EAGAIN for recv, Pipe::do_recv just acts as if blocked. So
>
> 2017-01-12 16:34 GMT+08:00 许雪寒 <xuxue...@360.cn>:
> Hi, everyone.
>
> Recently, we ran some experiments to test the stability of our Ceph cluster. We used the Hammer version, which is the version most widely deployed in our online clusters. One of the scenarios we simulated is poor network connectivity, for which we used iptables to drop TCP/IP packets with some probability. Sometimes we then saw the following phenomenon: while one machine was running iptables to drop its incoming and outgoing packets, OSDs on other machines could be brought down, sometimes more than one OSD.
>
> We used gdb to debug the core dumped by Linux. We found that the thread that hit the suicide timeout is a peering thread trying to send a pg_notify message. The ceph-osd log and gdb output are as follows:
>
> Log file:
>     -3> 2017-01-10 17:02:13.469949 7fd446ff7700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fd440bed700' had timed out after 15
>     -2> 2017-01-10 17:02:13.469952 7fd446ff7700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fd440bed700' had suicide timed out after 150
>     -1> 2017-01-10 17:02:13.469954 7fd4451f4700  1 -- 10.160.132.157:6818/10014122 <== osd.20 10.160.132.156:0/24908 163 ==== osd_ping(ping e4030 stamp 2017-01-10 17:02:13.450374) v2 ==== 47+0+0 (3247646131 0 0) 0x7fd418ca8600 con 0x7fd413c89700
>      0> 2017-01-10 17:02:13.496895 7fd446ff7700 -1 error_msg common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fd446ff7700 time 2017-01-10 17:02:13.469969
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>
> GDB OUTPUT:
> (gdb) thread 8
> [Switching to thread 8 (Thread 0x7fd440bed700 (LWP 15302))]#0  0x0000003c5d80e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x0000003c5d80e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x0000003c5d8095d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x0000003c5d8094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000001a54ae4 in Mutex::Lock (this=0x7fd426453598, no_lockdep=false) at common/Mutex.cc:96
> #4  0x0000000001409285 in Mutex::Locker::Locker (this=0x7fd440beb6c0, m=...) at common/Mutex.h:115
> #5  0x0000000001c46446 in PipeConnection::try_get_pipe (this=0x7fd426453580, p=0x7fd440beb908) at msg/simple/PipeConnection.cc:38
> #6  0x0000000001c05809 in SimpleMessenger::submit_message (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580, dest_addr=..., dest_type=4, already_locked=false) at msg/simple/SimpleMessenger.cc:443
> #7  0x0000000001c033fa in SimpleMessenger::_send_message (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580) at msg/simple/SimpleMessenger.cc:136
> #8  0x0000000001c467c7 in SimpleMessenger::send_message (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580) at msg/simple/SimpleMessenger.h:139
> #9  0x0000000001c466a1 in PipeConnection::send_message (this=0x7fd426453580, m=0x7fd425538d00) at msg/simple/PipeConnection.cc:78
> #10 0x00000000013b3ff2 in OSDService::send_map (this=0x7fd4821e76c8, m=0x7fd425538d00, con=0x7fd426453580) at osd/OSD.cc:1054
> #11 0x00000000013b45e7 in OSDService::send_incremental_map (this=0x7fd4821e76c8, since=4028, con=0x7fd426453580, osdmap=std::tr1::shared_ptr (count 49) 0x7fd426c0f480) at osd/OSD.cc:1087
> #12 0x00000000013b215f in OSDService::share_map_peer (this=0x7fd4821e76c8, peer=9, con=0x7fd426453580, map=std::tr1::shared_ptr (count 49) 0x7fd426c0f480) at osd/OSD.cc:887
> #13 0x00000000013f43cc in OSD::do_notifies (this=0x7fd4821e6000, notify_list=std::map with 7 elements = {...}, curmap=std::tr1::shared_ptr (count 49) 0x7fd426c0f480) at osd/OSD.cc:7246
> #14 0x00000000013f3c99 in OSD::dispatch_context (this=0x7fd4821e6000, ctx=..., pg=0x0, curmap=std::tr1::shared_ptr (count 49) 0x7fd426c0f480, handle=0x7fd440becb40) at osd/OSD.cc:7198
> #15 0x000000000140043e in OSD::process_peering_events (this=0x7fd4821e6000, pgs=std::list = {...}, handle=...) at osd/OSD.cc:8539
> #16 0x000000000141e094 in OSD::PeeringWQ::_process (this=0x7fd4821e7070, pgs=std::list = {...}, handle=...) at osd/OSD.h:1601
> #17 0x00000000014b94bf in ThreadPool::BatchWorkQueue<PG>::_void_process (this=0x7fd4821e7070, p=0x7fd425419040, handle=...) at common/WorkQueue.h:107
> #18 0x0000000001b2d2e8 in ThreadPool::worker (this=0x7fd4821e64b0, wt=0x7fd4761db430) at common/WorkQueue.cc:128
> #19 0x0000000001b313f7 in ThreadPool::WorkThread::entry (this=0x7fd4761db430) at common/WorkQueue.h:318
> #20 0x0000000001b33d40 in Thread::entry_wrapper (this=0x7fd4761db430) at common/Thread.cc:61
> #21 0x0000000001b33cb2 in Thread::_entry_func (arg=0x7fd4761db430) at common/Thread.cc:45
> #22 0x0000003c5d807aa1 in start_thread () from /lib64/libpthread.so.0
> #23 0x0000003c5d4e8aad in clone () from /lib64/libc.so.6
>
> As is shown, this thread is waiting on a mutex lock that we believe is the Connection::lock.
> The thread currently holding that lock is a Pipe reader thread, which is trying to read a full message sent from another OSD:
>
> (gdb) frame 2
> #2  0x0000003c5d8094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> (gdb) info reg
> rax            0xfffffffffffffe00      -512
> rbx            0x7fd425538d00          140549136026880
> rcx            0xffffffffffffffff      -1
> rdx            0x10                    16
> rsi            0x80                    128
> rdi            0x7fd4264535a8          140549151864232
> rbp            0x7fd440beb680          0x7fd440beb680
> rsp            0x7fd440beb648          0x7fd440beb648
> r8             0x7fd4264535a8          140549151864232
> r9             0x3bc6                  15302
> r10            0x1                     1
> r11            0x246                   582
> r12            0x4                     4
> r13            0x7fd440bed9c0          140549596043712
> r14            0x0                     0
> r15            0x3                     3
> rip            0x3c5d8094a7            0x3c5d8094a7 <pthread_mutex_lock+55>
> eflags         0x246                   [ PF ZF IF ]
> cs             0x33                    51
> ss             0x2b                    43
> ds             0x0                     0
> es             0x0                     0
> fs             0x0                     0
> gs             0x0                     0
> (gdb) p *(pthread_mutex_t*)0x7fd4264535a8
> $4 = {__data = {__lock = 2, __count = 0, __owner = 5008, __nusers = 1, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\220\023\000\000\001\000\000\000\002", '\000' <repeats 22 times>, __align = 2}
> (gdb) thread 50
> [Switching to thread 50 (Thread 0x7fd4001f1700 (LWP 5008))]#0  0x0000003c5d80ec2c in recv () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x0000003c5d80ec2c in recv () from /lib64/libpthread.so.0
> #1  0x0000000001c3a6f8 in Pipe::do_recv (this=0x7fd426561000, buf=0x7fd42649c000 "\a\220\006", len=4096, flags=64) at msg/simple/Pipe.cc:2428
> #2  0x0000000001c3aa4d in Pipe::buffered_recv (this=0x7fd426561000, buf=0x7fd4224c01d3 "", len=4096, flags=64) at msg/simple/Pipe.cc:2474
> #3  0x0000000001c3ab0d in Pipe::tcp_read_nonblocking (this=0x7fd426561000, buf=0x7fd4224c0000 "\t\to\030", len=4096) at msg/simple/Pipe.cc:2492
> #4  0x0000000001c37a25 in Pipe::read_message (this=0x7fd426561000, pm=0x7fd4001f0b80, auth_handler=0x7fd422481080) at msg/simple/Pipe.cc:2032
> #5  0x0000000001c33146 in Pipe::reader (this=0x7fd426561000) at msg/simple/Pipe.cc:1581
> #6  0x0000000001c3b0c8 in Pipe::Reader::entry (this=0x7fd426561018) at msg/simple/Pipe.h:50
> #7  0x0000000001b33d40 in Thread::entry_wrapper (this=0x7fd426561018) at common/Thread.cc:61
> #8  0x0000000001b33cb2 in Thread::_entry_func (arg=0x7fd426561018) at common/Thread.cc:45
> #9  0x0000003c5d807aa1 in start_thread () from /lib64/libpthread.so.0
> #10 0x0000003c5d4e8aad in clone () from /lib64/libc.so.6
>
> Because we were dropping a large number of IP packets, at first we thought the "recv" call was blocked for a long time, which led to the suicide timeout. Later, we found that the invocation of "recv" passes the MSG_DONTWAIT flag, which makes it act in a non-blocking way, so it shouldn't be blocked; yet every time we encounter this problem, it is the Pipe reader thread doing "recv" that holds the Connection::lock. This really confuses us.
>
> On the other hand, after reading the source code, we found that in the submit_message method the waiting thread is trying to get a pipe, for which it must acquire the Connection::lock. However, all that submit_message is trying to do is put the message in the pipe's out_q; it doesn't seem necessary for it to be synchronized with the pipe's reader thread. So we guess this might be a bug. Is this right? Or am I misunderstanding something?
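For what it's worth, the shape of the contention described above could be sketched roughly as follows (simplified stand-ins, not the real SimpleMessenger/Pipe classes): the sender only needs the connection lock long enough to find the pipe and queue the message, but if the reader thread takes the same lock around a slow socket read, every sender on that connection stalls for the whole read.

// Simplified stand-ins (not the real Ceph classes) for the contention
// pattern described in the backtraces above.
#include <chrono>
#include <deque>
#include <mutex>
#include <thread>

struct Message {};

struct Connection {
  std::mutex lock;             // stands in for Connection::lock
  std::deque<Message*> out_q;  // stands in for the pipe's out_q
};

// Sender side, analogous to the submit_message path in the backtrace:
// it only wants to enqueue a message, but must take the connection lock.
void submit_message(Connection &con, Message *m) {
  std::lock_guard<std::mutex> l(con.lock);  // blocks while the reader holds it
  con.out_q.push_back(m);
}

// Reader side: if it holds the same lock across a read that crawls because
// packets are being dropped, the sender above can wait long enough to trip
// the heartbeat/suicide timeout.
void reader_step(Connection &con) {
  std::lock_guard<std::mutex> l(con.lock);
  std::this_thread::sleep_for(std::chrono::seconds(30));  // simulated slow read
}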
> Because the test shows that when one machine suffers heavy packet loss, OSDs on other machines can be brought down, which means it could leave the whole cluster malfunctioning, we are really concerned.
>
> Please help us, thank you☺
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com