On Tue, Apr 15, 2014 at 9:49 PM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:
>
> ----- Message from "Yan, Zheng" <uker...@gmail.com> ---------
>    Date: Tue, 15 Apr 2014 21:24:04 +0800
>    From: "Yan, Zheng" <uker...@gmail.com>
> Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
>      To: Kenneth Waegeman <kenneth.waege...@ugent.be>
>      Cc: ceph-users@lists.ceph.com
>
>> On Tue, Apr 15, 2014 at 8:55 PM, Kenneth Waegeman
>> <kenneth.waege...@ugent.be> wrote:
>>>
>>> Hi,
>>>
>>> We are back to our MDS problem.
>>>
>>> This time we used the kernel client to connect to the cluster. We started
>>> again filling the cluster with rsync, and got problems after about 48h. The
>>> MDS crashed again with a core dump, like before.
>>> I removed the client connection and restarted the MDSs. The first MDS got
>>> killed by the oom-killer when it used 8G of RAM. The backup MDS took over,
>>> and I stopped the OSDs (with noout set) on that host so there was more RAM
>>> for testing the MDS. The MDS again started to fill the memory, this time
>>> to 30G. (We have 32G.)
>>> Meanwhile, I saw some interesting rados statistics:
>>> the metadata pool was heavily decreasing in KB, and increasing in object
>>> count.
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11641299       0   11667979        0   70364           0  15904215  17711309640
>>> metadata   -            95833067     405395       0     484371        0  273320   925385352  12283908    271184622
>>>
>>> Tue Apr 15 12:55:34 2014
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11667145       0   11719671        0   95576           0  17619242  17711309640
>>> metadata   -            49677494     494630       0     662841        0  294946  1013866123  12748893    271914066
>>>
>>> After the oom-killer killed ceph-mds again:
>>>
>>> Out of memory: Kill process 40776 (ceph-mds) score 808 or sacrifice child
>>> Killed process 40776, UID 0, (ceph-mds) total-vm:31382436kB,
>>> anon-rss:29960904kB, file-rss:20kB
>>> python[42731] trap invalid opcode ip:7f62ddd8ffc1 sp:7f62ccef6488 error:0
>>>
>>> this reduction was still going on (no clients connected):
>>>
>>> Tue Apr 15 13:20:24 2014
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11686446       0   11758273        0   95576           0  18447710  17711309640
>>> metadata   -            26526902     523234       0     720049        0  294946  1013866123  12886549    271914355
>>>
>>> Tue Apr 15 13:34:49 2014
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11707425       0   11800231        0   95576           0  18989100  17711309640
>>> metadata   -            12612790     543214       0     760009        0  294946  1013866123  12977285    271914524
>>>
>>> The MDS didn't crash anymore and stopped increasing RAM at 20G,
>>> until it eventually stabilized:
>>>
>>> Tue Apr 15 13:47:35 2014
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11720750       0   11826881        0   95576           0  19346286  17711309640
>>> metadata   -              126985     579061       0     831703        0  294946  1013866123  13146061    272057126
>>>
>>> The MDS is now only using <200m and is idle.
>>>
>>> Does this look familiar to someone?
>>>
>>> It looks like the MDS was unable to process the requests, and therefore
>>> kept allocating more memory until the backlog resolved?
>>>
>>
>> Looks like the MDS journal becomes too large. So far there is no
>> mechanism to throttle client requests when the MDS can't trim its log
>> fast enough. The fix should be easy.
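(For reference: per-pool statistics like the tables above are what "rados
df" prints, and the journal trimming can be watched from the outside. A
rough sketch, assuming the metadata pool is simply named "metadata"; the
200.* object name prefix is the journal of mds rank 0:

    rados df                                      # per-pool KB / object counters
    rados -p metadata ls | grep '^200\.' | wc -l  # rough journal size, in objects

Watching the object count alongside the KB column shows whether the journal
is actually being trimmed.)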
>
> What do you mean by the MDS journal? Where can I find this journal?
> Can a better CPU solve the slow trimming? (Now 2 hexacore AMD Opteron 4334)
>
The MDS uses a journal to record recent metadata updates. The journal is
stored in the metadata pool (object names 200.*). The speed of trimming the
journal is limited by OSD performance; a better CPU does not help.

Please try
https://github.com/ceph/ceph/commit/6dfed2693e9002dbaf82b3dc1a637e1c53878fe3
I hope it can solve this issue.

Regards
Yan, Zheng

> Thanks!
>
>> Regards
>> Yan, Zheng
>>
>>> Thanks!
>>>
>>> Kenneth
>>>
>>> ----- Message from Stijn De Weirdt <stijn.dewei...@ugent.be> ---------
>>>    Date: Fri, 04 Apr 2014 20:31:34 +0200
>>>    From: Stijn De Weirdt <stijn.dewei...@ugent.be>
>>> Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
>>>      To: "Yan, Zheng" <uker...@gmail.com>
>>>      Cc: ceph-users@lists.ceph.com
>>>
>>>> hi yan,
>>>>
>>>> (taking the list in CC)
>>>>
>>>> On 04/04/2014 04:44 PM, Yan, Zheng wrote:
>>>>> On Thu, Apr 3, 2014 at 2:52 PM, Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
>>>>>> hi,
>>>>>>
>>>>>> latest pprof output attached.
>>>>>>
>>>>>> this is not the kernel client, this is ceph-fuse on EL6. starting the
>>>>>> mds without any ceph-fuse mounts works without issue. mounting
>>>>>> ceph-fuse afterwards also works fine. simple filesystem operations
>>>>>> work as expected.
>>>>>>
>>>>>> we'll check the state of the fuse mount via fusectl once we've
>>>>>> reproduced the issue.
>>>>>
>>>>> One dark side of the fuse client is that there is no way to purge
>>>>> inodes in the kernel inode cache. Each cached inode in the kernel pins
>>>>> the corresponding inode in ceph-mds. I think that's why ceph-fuse and
>>>>> ceph-mds use so much memory. Could you try executing "echo 3 >
>>>>> /proc/sys/vm/drop_caches" periodically on the ceph-fuse mount machine.
>>>>
>>>> thanks a lot for the suggestion. that might very well explain the issue.
>>>> we're currently looking into other issues that we thought were related
>>>> (extremely low mdtest performance), but we will retry your suggestion
>>>> when we start to reproduce the condition.
>>>>
>>>> stijn
>>>>
>>>>> Regards
>>>>> Yan, Zheng
>>>>>
>>>>>> stijn
>>>>>>
>>>>>> On 04/03/2014 02:39 AM, Yan, Zheng wrote:
>>>>>>> which version of the kernel client did you use? please send the
>>>>>>> content of the client node's /sys/kernel/debug/ceph/*/caps when the
>>>>>>> MDS uses lots of memory.
>>>>>>>
>>>>>>> Regards
>>>>>>> Yan, Zheng
>>>>>>>
>>>>>>> On Thu, Apr 3, 2014 at 2:58 AM, Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
>>>>>>>> wow. kudos for integrating this in ceph. more projects should do it
>>>>>>>> like that!
>>>>>>>>
>>>>>>>> anyway, in attachment a gzipped ps file. heap is at 4.4GB, top
>>>>>>>> reports 6.5GB mem usage.
>>>>>>>>
>>>>>>>> care to point out what to look for? i'll send a new one when the
>>>>>>>> usage is starting to cause swapping.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> stijn
>>>>>>>>
>>>>>>>> On 04/02/2014 06:35 PM, Gregory Farnum wrote:
>>>>>>>>> Did you see
>>>>>>>>> http://ceph.com/docs/master/rados/troubleshooting/memory-profiling?
>>>>>>>>> That should have what you need to get started, although you'll also
>>>>>>>>> need to learn the basics of using the heap analysis tools elsewhere.
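(For reference, the heap profiler described on that page can also be driven
at runtime, without stopping the daemon. A rough sketch, assuming a
tcmalloc-enabled build and an MDS whose id is "0"; substitute the real id:

    ceph tell mds.0 heap start_profiler   # start profiling in the running MDS
    ceph tell mds.0 heap dump             # write an intermediate heap dump
    ceph tell mds.0 heap stats            # print allocator statistics
    ceph tell mds.0 heap stop_profiler    # stop profiling

The dump files land in the daemon's log directory and can be inspected with
google-perftools' pprof against the ceph-mds binary, e.g.
"pprof --text /usr/bin/ceph-mds /var/log/ceph/mds.0.profile.*.heap".)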
>>>>>>>>> -Greg
>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>
>>>>>>>>> On Wed, Apr 2, 2014 at 9:23 AM, Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
>>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>>
>>>>>>>>>>>> 1) How big and what shape the filesystem is. Do you have some
>>>>>>>>>>>> extremely large directory that the MDS keeps trying to load and
>>>>>>>>>>>> then dump?
>>>>>>>>>>>
>>>>>>>>>>> any way to extract this from the mds without having to start it?
>>>>>>>>>>> as it was an rsync operation, i can try to locate possible
>>>>>>>>>>> candidates on the source filesystem, but what would be considered
>>>>>>>>>>> "large"?
>>>>>>>>>>
>>>>>>>>>> total number of files is 13M, spread over 800k directories, but
>>>>>>>>>> it's unclear how far the sync was at the time of failing. i've not
>>>>>>>>>> found a good way to look for directories with lots of files and/or
>>>>>>>>>> subdirs.
>>>>>>>>>>
>>>>>>>>>>>> 2) Use tcmalloc's heap analyzer to see where all the memory is
>>>>>>>>>>>> being allocated.
>>>>>>>>>>>
>>>>>>>>>>> we'll give that a try
>>>>>>>>>>
>>>>>>>>>> i run ceph-mds with HEAPCHECK=normal (via the init script), but how
>>>>>>>>>> can we "stop" mds without killing it? the heap checker only seems
>>>>>>>>>> to dump at the end of a run; maybe there's a way to get an
>>>>>>>>>> intermediate dump like valgrind, but the documentation is not very
>>>>>>>>>> helpful.
>>>>>>>>>>
>>>>>>>>>> stijn
>>>>>>>>>>
>>>>>>>>>>>> 3) Look through the logs for when the beacon fails (the first of
>>>>>>>>>>>> "mds.0.16 is_laggy 600.641332 > 15 since last acked beacon") and
>>>>>>>>>>>> see if there's anything tell-tale going on at the time.
>>>>>>>>>>>
>>>>>>>>>>> anything in particular we should be looking for?
>>>>>>>>>>>
>>>>>>>>>>> the log goes as follows:
>>>>>>>>>>> mds starts around 11:43
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:44:23.658583 7ffec89c6700  1 mds.0.server reconnect_clients -- 1 sessions
>>>>>>>>>>>> 2014-04-01 11:44:41.212488 7ffec89c6700  0 log [DBG] : reconnect by client.4585 10.141.8.199:0/3551 after 17.553854
>>>>>>>>>>>> 2014-04-01 11:44:45.692237 7ffec89c6700  1 mds.0.10 reconnect_done
>>>>>>>>>>>> 2014-04-01 11:44:45.996384 7ffec89c6700  1 mds.0.10 handle_mds_map i am now mds.0.10
>>>>>>>>>>>> 2014-04-01 11:44:45.996388 7ffec89c6700  1 mds.0.10 handle_mds_map state change up:reconnect --> up:rejoin
>>>>>>>>>>>> 2014-04-01 11:44:45.996390 7ffec89c6700  1 mds.0.10 rejoin_start
>>>>>>>>>>>> 2014-04-01 11:49:53.158471 7ffec89c6700  1 mds.0.10 rejoin_joint_start
>>>>>>>>>>>
>>>>>>>>>>> then lots (4667 lines) of
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:50:10.237035 7ffebc844700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.180:6789/0 pipe(0x38a7da00 sd=104 :41115 s=4 pgs=0 cs=0 l=1 c=0x6513e8840).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>
>>>>>>>>>>> with one intermediate
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:51:50.181354 7ffebcf4b700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.180:6789/0 pipe(0x10e282580 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xc77d5ee0).fault
>>>>>>>>>>>
>>>>>>>>>>> then a sudden change
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:57:30.440554 7ffebcd49700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.182:6789/0 pipe(0xa1534100 sd=104 :48176 s=4 pgs=0 cs=0 l=1 c=0xd99b11e0).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>> 2014-04-01 11:57:30.722607 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>>>>>>>>> 2014-04-01 11:57:30.722669 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>>>>>>>>> 2014-04-01 11:57:30.722885 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>>>>>>>>> 2014-04-01 11:57:30.722945 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>>>>>>>>
>>>>>>>>>>> followed by lots of
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:57:30.738562 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0xead9fa80 sd=57 :0 s=1 pgs=0 cs=0 l=1 c=0x10e5d280).fault
>>>>>>>>>>>
>>>>>>>>>>> with sporadic
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:57:32.431219 7ffebeb67700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.182:6789/0 pipe(0xef85cd80 sd=103 :41218 s=4 pgs=0 cs=0 l=1 c=0x130590dc0).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>
>>>>>>>>>>> until the dump
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:59:27.612850 7ffebea66700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0xe3036400 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xa7be300).fault
>>>>>>>>>>>> 2014-04-01 11:59:27.639009 7ffec89c6700 -1 mds/CDir.cc: In function 'void CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list> > >&, const std::string&, int)' thread 7ffec89c6700 time 2014-04-01 11:59:27.620684
>>>>>>>>>>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>>>>>>>>>>
>>>>>>>>>>>> ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
>>>>>>>>>>>> 1: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, std::string const&, int)+0x4d71) [0x77c3c1]
>>>>>>>>>>>> 2: (Context::complete(int)+0x9) [0x56bb79]
>>>>>>>>>>>> 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11a6) [0x806dd6]
>>>>>>>>>>>> 4: (MDS::handle_core_message(Message*)+0x9c7) [0x5901d7]
>>>>>>>>>>>> 5: (MDS::_dispatch(Message*)+0x2f) [0x59028f]
>>>>>>>>>>>> 6: (MDS::ms_dispatch(Message*)+0x1ab) [0x591d4b]
>>>>>>>>>>>> 7: (DispatchQueue::entry()+0x582) [0x902072]
>>>>>>>>>>>> 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x85ef4d]
>>>>>>>>>>>> 9: /lib64/libpthread.so.0() [0x34c36079d1]
>>>>>>>>>>>> 10: (clone()+0x6d) [0x34c32e8b6d]
>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>> --- begin dump of recent events ---
>>>>>>>>>>>> -10000> 2014-04-01 11:59:27.137779 7ffec89c6700  5 mds.0.10 initiating monitor reconnect; maybe we're not the slow one
>>>>>>>>>>>>  -9999> 2014-04-01 11:59:27.137787 7ffec89c6700 10 monclient(hunting): _reopen_session rank -1 name
>>>>>>>>>>>>  -9998> 2014-04-01 11:59:27.137790 7ffec89c6700  1 -- 10.141.8.180:6837/55117 mark_down 0x1326be460 -- 0x2cae5f00
>>>>>>>>>>>>  -9997> 2014-04-01 11:59:27.137807 7ffebf672700  2 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect error 10.141.8.181:6789/0, 104: Connection reset by peer
>>>>>>>>>>>>  -9996> 2014-04-01 11:59:27.137819 7ffebf672700  3 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect fault, but state = closed != connecting, stopping
>>>>>>>>>>>>  -9995> 2014-04-01 11:59:27.137824 7ffec89c6700 10 monclient(hunting): picked mon.ceph003 con 0x3af54ba0 addr 10.141.8.182:6789/0
>>>>>>>>>>>>  -9994> 2014-04-01 11:59:27.137832 7ffec89c6700 10 monclient(hunting): _send_mon_message to mon.ceph003 at 10.141.8.182:6789/0
>>>>>>>>>>>
>>>>>>>>>>>> -Greg
>>>>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 2, 2014 at 3:39 AM, Kenneth Waegeman <kenneth.waege...@ugent.be> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- Message from Gregory Farnum <g...@inktank.com> ---------
>>>>>>>>>>>>>    Date: Tue, 1 Apr 2014 09:03:17 -0700
>>>>>>>>>>>>>    From: Gregory Farnum <g...@inktank.com>
>>>>>>>>>>>>> Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
>>>>>>>>>>>>>      To: "Yan, Zheng" <uker...@gmail.com>
>>>>>>>>>>>>>      Cc: Kenneth Waegeman <kenneth.waege...@ugent.be>, ceph-users <ceph-users@lists.ceph.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng <uker...@gmail.com> wrote:
>>>>>>>>>>>>>>> On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman <kenneth.waege...@ugent.be> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After some more searching, I've found that the source of the
>>>>>>>>>>>>>>>> problem is with the mds and not the mon. The mds crashes,
>>>>>>>>>>>>>>>> generates a core dump that eats the local space, and in turn
>>>>>>>>>>>>>>>> the monitor (because of leveldb) crashes.
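(A side note on core dumps eating the local space: the kernel can be told
to write cores to a roomier filesystem, and their size can be capped. A
sketch only; the target path is illustrative:

    # write cores to a dedicated filesystem, tagged with program name and pid
    sysctl -w kernel.core_pattern=/var/lib/ceph/log/sda4/core.%e.%p

    # or cap core size in the shell that starts the daemon (0 disables cores)
    ulimit -c 0
)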
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The error in the mds log of one host:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-04-01 15:46:34.414615 7f870e319700  0 -- 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0 l=1 c=0x147ac780).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>>>>>> 2014-04-01 15:46:34.438792 7f871194f700  0 -- 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1 c=0x1fd44360).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>>>>>> 2014-04-01 15:46:34.439028 7f870e319700  0 -- 10.141.8.180:6836/13152 >> 10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0 l=1 c=0x1fd43de0).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>>>>>> 2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function 'void CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list> > >&, const std::string&, int)' thread 7f871b7ae700 time 2014-04-01 15:46:34.448320
>>>>>>>>>>>>>>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> could you use gdb to check what the value of variable 'r' is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you look at the crash dump log you can see the return value
>>>>>>>>>>>>>> in the osd_op_reply message:
>>>>>>>>>>>>>>     -1> 2014-04-01 15:46:34.440860 7f871b7ae700  1 -- 10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33077 ==== osd_op_reply(4179177 100001f2ef1.00000000 [omap-get-header 0~0,omap-get-vals 0~16] v0'0 uv0 ack = -108 (Cannot send after transport endpoint shutdown)) v6 ==== 229+0+0 (958358678 0 0) 0x2cff7aa80 con 0x37ea3c0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -108, which is ESHUTDOWN, but we also use it (via the 108
>>>>>>>>>>>>>> constant, I think because ESHUTDOWN varies across platforms) as
>>>>>>>>>>>>>> EBLACKLISTED. So it looks like this is itself actually a
>>>>>>>>>>>>>> symptom of another problem that is causing the MDS to get
>>>>>>>>>>>>>> timed out on the monitor. If a core dump is "eating the local
>>>>>>>>>>>>>> space", maybe the MDS is stuck in an infinite allocation loop
>>>>>>>>>>>>>> of some kind? How big are your disks, Kenneth? Do you have any
>>>>>>>>>>>>>> information on how much CPU/memory the MDS was using before
>>>>>>>>>>>>>> this?
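(The errno values Greg decodes above can be checked with a one-liner on any
Linux box, assuming a Python interpreter is available; the -2 and -61 that
the assert tolerates are ENOENT and ENODATA. A sketch:

    python -c "import errno, os; print(errno.errorcode[108], os.strerror(108))"
    # -> ESHUTDOWN, Cannot send after transport endpoint shutdown
)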
>>>>>>>>>>>>>
>>>>>>>>>>>>> I monitored the mds process after restart:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
>>>>>>>>>>>>> 19215 root  20   0 6070m 5.7g 5236 S 778.6 18.1  1:27.54 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 7926m 7.5g 5236 S 179.2 23.8  2:44.39 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 12.4g  12g 5236 S 157.2 38.8  3:43.47 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 16.6g  16g 5236 S 144.4 52.0  4:15.01 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 19.9g  19g 5236 S 137.2 62.5  4:35.83 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 24.5g  24g 5224 S 136.5 77.0  5:04.66 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 25.8g  25g 2944 S  33.7 81.2  5:13.74 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 26.0g  25g 2916 S  24.6 81.7  5:19.07 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 26.1g  25g 2916 S  13.0 82.1  5:22.16 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 27.7g  26g 1856 S 100.0 85.8  5:36.46 ceph-mds
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then it crashes. I changed the core dump location out of the root
>>>>>>>>>>>>> fs; the core dump is indeed about 26G.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My disks:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>>>>> /dev/sda2       9.9G  2.9G  6.5G  31% /
>>>>>>>>>>>>> tmpfs            16G     0   16G   0% /dev/shm
>>>>>>>>>>>>> /dev/sda1       248M   53M  183M  23% /boot
>>>>>>>>>>>>> /dev/sda4       172G   61G  112G  35% /var/lib/ceph/log/sda4
>>>>>>>>>>>>> /dev/sdb        187G   61G  127G  33% /var/lib/ceph/log/sdb
>>>>>>>>>>>>> /dev/sdc        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdc
>>>>>>>>>>>>> /dev/sdd        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdd
>>>>>>>>>>>>> /dev/sde        3.7T  1.4T  2.4T  37% /var/lib/ceph/osd/sde
>>>>>>>>>>>>> /dev/sdf        3.7T  1.5T  2.3T  39% /var/lib/ceph/osd/sdf
>>>>>>>>>>>>> /dev/sdg        3.7T  2.1T  1.7T  56% /var/lib/ceph/osd/sdg
>>>>>>>>>>>>> /dev/sdh        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdh
>>>>>>>>>>>>> /dev/sdi        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdi
>>>>>>>>>>>>> /dev/sdj        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdj
>>>>>>>>>>>>> /dev/sdk        3.7T  2.1T  1.6T  58% /var/lib/ceph/osd/sdk
>>>>>>>>>>>>> /dev/sdl        3.7T  1.7T  2.0T  46% /var/lib/ceph/osd/sdl
>>>>>>>>>>>>> /dev/sdm        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdm
>>>>>>>>>>>>> /dev/sdn        3.7T  1.4T  2.3T  38% /var/lib/ceph/osd/sdn
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- End message from Gregory Farnum <g...@inktank.com> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>> Kenneth Waegeman
>>>>>>>>>>>>>
>>>
>>> ----- End message from Stijn De Weirdt <stijn.dewei...@ugent.be> -----
>>>
>>> --
>>> Kind regards,
>>> Kenneth Waegeman
>
> ----- End message from "Yan, Zheng" <uker...@gmail.com> -----
>
> --
> Kind regards,
> Kenneth Waegeman

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com