Yes, but that depends. That might have changed on the master branch.
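To illustrate what "acts as blocked" means here, a minimal sketch (not the actual Pipe code; the helper name and the 1-second poll timeout are just assumptions for the example): a readiness check such as poll()/tcp_read_wait only promises that some bytes are available, so a loop that insists on filling a whole buffer with MSG_DONTWAIT can still hit EAGAIN partway through and end up waiting for more data.

// Minimal sketch (not Ceph code): reading an exact number of bytes from a
// non-blocking socket.  recv(..., MSG_DONTWAIT) returns -1/EAGAIN as soon as
// the kernel buffer is drained, even if an earlier poll() reported POLLIN.
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

ssize_t read_exact_nonblocking(int fd, char *buf, size_t len) {
  size_t got = 0;
  while (got < len) {
    ssize_t r = ::recv(fd, buf + got, len - got, MSG_DONTWAIT);
    if (r > 0) {
      got += r;                        // consumed some bytes, keep going
    } else if (r == 0) {
      return -1;                       // peer closed the connection
    } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
      // Nothing more to read right now: this is where the reader ends up
      // "acting as blocked", because it has to wait for more data.
      struct pollfd pfd = { fd, POLLIN, 0 };
      if (::poll(&pfd, 1, 1000) <= 0)  // hypothetical 1s timeout
        return -1;
    } else if (errno != EINTR) {
      return -1;                       // real socket error
    }
  }
  return (ssize_t)got;
}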
2017-01-13 10:47 GMT+08:00 许雪寒 <xuxue...@360.cn>:
> Thanks for your reply☺
>
> Indeed, Pipe::do_recv would act as if blocked when errno is EAGAIN; however, the Pipe::read_message method first checks whether there is a pending message on the socket via "Pipe::tcp_read_wait". So I think that when Pipe::do_recv is called it shouldn't get an EAGAIN, which means it wouldn't act as blocked. Is this so?
> This really confuses me.
>
>
> From: jiajia zhong [mailto:zhong2p...@gmail.com]
> Sent: 2017-01-12 18:22
> To: 许雪寒
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Pipe "deadlock" in Hammer, 0.94.5
>
> If errno is EAGAIN for recv, Pipe::do_recv just acts as if blocked. So
>
> 2017-01-12 16:34 GMT+08:00 许雪寒 <xuxue...@360.cn>:
> Hi, everyone.
>
> Recently, we ran some experiments to test the stability of our Ceph cluster. We used the Hammer version, which is the version most widely deployed in our online clusters. One of the scenarios we simulated is poor network connectivity, for which we used iptables to drop TCP/IP packets with some probability. Sometimes we then saw the following phenomenon: while one machine was running iptables to drop its incoming and outgoing packets, OSDs on other machines could be brought down, sometimes more than one OSD.
>
> We used gdb to debug the core dumped by Linux. We found that the thread that hit the suicide timeout is a peering thread trying to send a pg_notify message. The ceph-osd log and gdb output are as follows:
>
> Log file:
>     -3> 2017-01-10 17:02:13.469949 7fd446ff7700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fd440bed700' had timed out after 15
>     -2> 2017-01-10 17:02:13.469952 7fd446ff7700  1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7fd440bed700' had suicide timed out after 150
>     -1> 2017-01-10 17:02:13.469954 7fd4451f4700  1 -- 10.160.132.157:6818/10014122 <== osd.20 10.160.132.156:0/24908 163 ==== osd_ping(ping e4030 stamp 2017-01-10 17:02:13.450374) v2 ==== 47+0+0 (3247646131 0 0) 0x7fd418ca8600 con 0x7fd413c89700
>      0> 2017-01-10 17:02:13.496895 7fd446ff7700 -1 error_msg common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fd446ff7700 time 2017-01-10 17:02:13.469969
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>
> GDB OUTPUT:
> (gdb) thread 8
> [Switching to thread 8 (Thread 0x7fd440bed700 (LWP 15302))]#0  0x0000003c5d80e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x0000003c5d80e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x0000003c5d8095d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x0000003c5d8094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000001a54ae4 in Mutex::Lock (this=0x7fd426453598, no_lockdep=false) at common/Mutex.cc:96
> #4  0x0000000001409285 in Mutex::Locker::Locker (this=0x7fd440beb6c0, m=...) at common/Mutex.h:115
> #5  0x0000000001c46446 in PipeConnection::try_get_pipe (this=0x7fd426453580, p=0x7fd440beb908) at msg/simple/PipeConnection.cc:38
> #6  0x0000000001c05809 in SimpleMessenger::submit_message (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580, dest_addr=..., dest_type=4, already_locked=false) at msg/simple/SimpleMessenger.cc:443
> #7  0x0000000001c033fa in SimpleMessenger::_send_message (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580) at msg/simple/SimpleMessenger.cc:136
> #8  0x0000000001c467c7 in SimpleMessenger::send_message (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580) at msg/simple/SimpleMessenger.h:139
> #9  0x0000000001c466a1 in PipeConnection::send_message (this=0x7fd426453580, m=0x7fd425538d00) at msg/simple/PipeConnection.cc:78
> #10 0x00000000013b3ff2 in OSDService::send_map (this=0x7fd4821e76c8, m=0x7fd425538d00, con=0x7fd426453580) at osd/OSD.cc:1054
> #11 0x00000000013b45e7 in OSDService::send_incremental_map (this=0x7fd4821e76c8, since=4028, con=0x7fd426453580, osdmap=std::tr1::shared_ptr (count 49) 0x7fd426c0f480) at osd/OSD.cc:1087
> #12 0x00000000013b215f in OSDService::share_map_peer (this=0x7fd4821e76c8, peer=9, con=0x7fd426453580, map=std::tr1::shared_ptr (count 49) 0x7fd426c0f480) at osd/OSD.cc:887
> #13 0x00000000013f43cc in OSD::do_notifies (this=0x7fd4821e6000, notify_list=std::map with 7 elements = {...}, curmap=std::tr1::shared_ptr (count 49) 0x7fd426c0f480) at osd/OSD.cc:7246
> #14 0x00000000013f3c99 in OSD::dispatch_context (this=0x7fd4821e6000, ctx=..., pg=0x0, curmap=std::tr1::shared_ptr (count 49) 0x7fd426c0f480, handle=0x7fd440becb40) at osd/OSD.cc:7198
> #15 0x000000000140043e in OSD::process_peering_events (this=0x7fd4821e6000, pgs=std::list = {...}, handle=...) at osd/OSD.cc:8539
> #16 0x000000000141e094 in OSD::PeeringWQ::_process (this=0x7fd4821e7070, pgs=std::list = {...}, handle=...) at osd/OSD.h:1601
> #17 0x00000000014b94bf in ThreadPool::BatchWorkQueue<PG>::_void_process (this=0x7fd4821e7070, p=0x7fd425419040, handle=...) at common/WorkQueue.h:107
> #18 0x0000000001b2d2e8 in ThreadPool::worker (this=0x7fd4821e64b0, wt=0x7fd4761db430) at common/WorkQueue.cc:128
> #19 0x0000000001b313f7 in ThreadPool::WorkThread::entry (this=0x7fd4761db430) at common/WorkQueue.h:318
> #20 0x0000000001b33d40 in Thread::entry_wrapper (this=0x7fd4761db430) at common/Thread.cc:61
> #21 0x0000000001b33cb2 in Thread::_entry_func (arg=0x7fd4761db430) at common/Thread.cc:45
> #22 0x0000003c5d807aa1 in start_thread () from /lib64/libpthread.so.0
> #23 0x0000003c5d4e8aad in clone () from /lib64/libc.so.6
>
> As is shown, this thread is waiting on a mutex lock that we believe is the Connection::lock.
> The thread currently holding that lock is a Pipe reader thread, which is trying to read a full message sent from another OSD:
>
> (gdb) frame 2
> #2  0x0000003c5d8094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> (gdb) info reg
> rax            0xfffffffffffffe00      -512
> rbx            0x7fd425538d00          140549136026880
> rcx            0xffffffffffffffff      -1
> rdx            0x10                    16
> rsi            0x80                    128
> rdi            0x7fd4264535a8          140549151864232
> rbp            0x7fd440beb680          0x7fd440beb680
> rsp            0x7fd440beb648          0x7fd440beb648
> r8             0x7fd4264535a8          140549151864232
> r9             0x3bc6                  15302
> r10            0x1                     1
> r11            0x246                   582
> r12            0x4                     4
> r13            0x7fd440bed9c0          140549596043712
> r14            0x0                     0
> r15            0x3                     3
> rip            0x3c5d8094a7            0x3c5d8094a7 <pthread_mutex_lock+55>
> eflags         0x246                   [ PF ZF IF ]
> cs             0x33                    51
> ss             0x2b                    43
> ds             0x0                     0
> es             0x0                     0
> fs             0x0                     0
> gs             0x0                     0
> (gdb) p *(pthread_mutex_t*)0x7fd4264535a8
> $4 = {__data = {__lock = 2, __count = 0, __owner = 5008, __nusers = 1, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\220\023\000\000\001\000\000\000\002", '\000' <repeats 22 times>, __align = 2}
> (gdb) thread 50
> [Switching to thread 50 (Thread 0x7fd4001f1700 (LWP 5008))]#0  0x0000003c5d80ec2c in recv () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x0000003c5d80ec2c in recv () from /lib64/libpthread.so.0
> #1  0x0000000001c3a6f8 in Pipe::do_recv (this=0x7fd426561000, buf=0x7fd42649c000 "\a\220\006", len=4096, flags=64) at msg/simple/Pipe.cc:2428
> #2  0x0000000001c3aa4d in Pipe::buffered_recv (this=0x7fd426561000, buf=0x7fd4224c01d3 "", len=4096, flags=64) at msg/simple/Pipe.cc:2474
> #3  0x0000000001c3ab0d in Pipe::tcp_read_nonblocking (this=0x7fd426561000, buf=0x7fd4224c0000 "\t\to\030", len=4096) at msg/simple/Pipe.cc:2492
> #4  0x0000000001c37a25 in Pipe::read_message (this=0x7fd426561000, pm=0x7fd4001f0b80, auth_handler=0x7fd422481080) at msg/simple/Pipe.cc:2032
> #5  0x0000000001c33146 in Pipe::reader (this=0x7fd426561000) at msg/simple/Pipe.cc:1581
> #6  0x0000000001c3b0c8 in Pipe::Reader::entry (this=0x7fd426561018) at msg/simple/Pipe.h:50
> #7  0x0000000001b33d40 in Thread::entry_wrapper (this=0x7fd426561018) at common/Thread.cc:61
> #8  0x0000000001b33cb2 in Thread::_entry_func (arg=0x7fd426561018) at common/Thread.cc:45
> #9  0x0000003c5d807aa1 in start_thread () from /lib64/libpthread.so.0
> #10 0x0000003c5d4e8aad in clone () from /lib64/libc.so.6
>
> Because we were dropping a large number of IP packets, at first we thought the "recv" call was blocked for a long time, which led to the suicide timeout. Later, we found that the invocation of "recv" passes the MSG_DONTWAIT flag, which makes it act in a non-blocking way, so it shouldn't be blocked; yet every time we encounter this problem, it is the Pipe reader thread doing "recv" that holds the Connection::lock. This really confuses us.
>
> On the other hand, after reading the source code, we found that in the submit_message method the waiting thread is trying to get a pipe, for which it must acquire the Connection::lock. However, all that submit_message is trying to do is put the message in the pipe's out_q; it doesn't seem necessary for it to be synchronized with the pipe's reader thread. So we guess this might be a bug. Is this right? Or am I misunderstanding something?
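For what it's worth, the shape of the contention described above could be sketched roughly as follows (simplified stand-ins, not the real SimpleMessenger/Pipe classes): the sender only needs the connection lock long enough to find the pipe and queue the message, but if the reader thread takes the same lock around a slow socket read, every sender on that connection stalls for the whole read.

// Simplified stand-ins (not the real Ceph classes) for the contention
// pattern described in the backtraces above.
#include <chrono>
#include <deque>
#include <mutex>
#include <thread>

struct Message {};

struct Connection {
  std::mutex lock;             // stands in for Connection::lock
  std::deque<Message*> out_q;  // stands in for the pipe's out_q
};

// Sender side, analogous to the submit_message path in the backtrace:
// it only wants to enqueue a message, but must take the connection lock.
void submit_message(Connection &con, Message *m) {
  std::lock_guard<std::mutex> l(con.lock);  // blocks while the reader holds it
  con.out_q.push_back(m);
}

// Reader side: if it holds the same lock across a read that crawls because
// packets are being dropped, the sender above can wait long enough to trip
// the heartbeat/suicide timeout.
void reader_step(Connection &con) {
  std::lock_guard<std::mutex> l(con.lock);
  std::this_thread::sleep_for(std::chrono::seconds(30));  // simulated slow read
}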
> Because the test shows that when one machine suffers heavy packet loss, OSDs on other machines can be brought down, which means it could leave the whole cluster malfunctioning, we are really concerned.
>
> Please help us, thank you☺
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com