Which version of the kernel client did you use? Please send the contents of the client node's /sys/kernel/debug/ceph/*/caps when the MDS is using lots of memory.

Regards,
Yan, Zheng
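A minimal sketch for collecting that, assuming debugfs is mounted at /sys/kernel/debug on the client and the script runs as root (the layout of the per-client caps file varies by kernel version, so this just prints each file verbatim):

#!/usr/bin/env python
# Print the capability accounting each kernel ceph client exposes via
# debugfs, one block per mounted client instance.
import glob

for path in glob.glob('/sys/kernel/debug/ceph/*/caps'):
    print('==> %s <==' % path)
    with open(path) as f:
        print(f.read())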
On Thu, Apr 3, 2014 at 2:58 AM, Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
> wow. kudos for integrating this in ceph. more projects should do it like that!
>
> anyway, in attachment a gzipped ps file. heap is at 4.4GB, top reports 6.5GB mem usage.
>
> care to point out what to look for? i'll send a new one when the usage starts to cause swapping.
>
> thanks,
>
> stijn
>
> On 04/02/2014 06:35 PM, Gregory Farnum wrote:
>> Did you see http://ceph.com/docs/master/rados/troubleshooting/memory-profiling? That should have what you need to get started, although you'll also need to learn the basics of using the heap analysis tools elsewhere.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> On Wed, Apr 2, 2014 at 9:23 AM, Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
>>> hi,
>>>
>>>>> 1) How big and what shape the filesystem is. Do you have some extremely large directory that the MDS keeps trying to load and then dump?
>>>>
>>>> any way to extract this from the mds without having to start it? as it was an rsync operation, i can try to locate possible candidates on the source filesystem, but what would be considered "large"?
>>>
>>> total number of files is 13M, spread over 800k directories, but it's unclear how far the sync was at the time of failing. i've not found a good way to look for directories with lots of files and/or subdirs (one possible approach is sketched just below).
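As an aside on that last point: a minimal sketch (assuming local POSIX access to the rsync source tree, and with the 10000-entry threshold being an arbitrary guess at what counts as "large") that walks the tree and flags candidates for the extremely large directory Greg asked about:

#!/usr/bin/env python
# Walk a directory tree and report directories whose immediate entry
# count (files + subdirectories) exceeds a threshold.
import os
import sys

THRESHOLD = 10000  # arbitrary cut-off; tune to taste

root = sys.argv[1] if len(sys.argv) > 1 else '.'
for dirpath, dirnames, filenames in os.walk(root):
    count = len(dirnames) + len(filenames)
    if count > THRESHOLD:
        print('%10d  %s' % (count, dirpath))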
>>>>> 2) Use tcmalloc's heap analyzer to see where all the memory is being allocated.
>>>>
>>>> we'll give that a try
>>>
>>> i run ceph-mds with HEAPCHECK=normal (via the init script), but how can we "stop" mds without killing it? the heapchecker only seems to dump at the end of a run. maybe there's a way to get intermediate dumps like valgrind, but the documentation is not very helpful.
>>>
>>> stijn
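On the intermediate-dump question: the memory-profiling page Greg linked describes asking a running daemon for heap dumps over its admin interface instead of stopping it. A sketch of that idea, assuming the "ceph tell <daemon>.<id> heap ..." commands from that page apply to the mds in your version (check the doc for the exact syntax):

#!/usr/bin/env python
# Periodically ask a running ceph-mds for a tcmalloc heap dump, so you
# get intermediate profiles without stopping the daemon. Assumes the
# profiler was started first, e.g. "ceph tell mds.0 heap start_profiler".
import subprocess
import time

MDS_ID = '0'    # hypothetical mds id; substitute your own
INTERVAL = 300  # seconds between dumps

while True:
    subprocess.call(['ceph', 'tell', 'mds.%s' % MDS_ID, 'heap', 'dump'])
    time.sleep(INTERVAL)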
>>>>> 3) Look through the logs for when the beacon fails (the first of "mds.0.16 is_laggy 600.641332 > 15 since last acked beacon") and see if there's anything tell-tale going on at the time.
>>>>
>>>> anything in particular we should be looking for?
>>>>
>>>> the log goes as follows: mds starts around 11:43
>>>> ...
>>>>> 2014-04-01 11:44:23.658583 7ffec89c6700 1 mds.0.server reconnect_clients -- 1 sessions
>>>>> 2014-04-01 11:44:41.212488 7ffec89c6700 0 log [DBG] : reconnect by client.4585 10.141.8.199:0/3551 after 17.553854
>>>>> 2014-04-01 11:44:45.692237 7ffec89c6700 1 mds.0.10 reconnect_done
>>>>> 2014-04-01 11:44:45.996384 7ffec89c6700 1 mds.0.10 handle_mds_map i am now mds.0.10
>>>>> 2014-04-01 11:44:45.996388 7ffec89c6700 1 mds.0.10 handle_mds_map state change up:reconnect --> up:rejoin
>>>>> 2014-04-01 11:44:45.996390 7ffec89c6700 1 mds.0.10 rejoin_start
>>>>> 2014-04-01 11:49:53.158471 7ffec89c6700 1 mds.0.10 rejoin_joint_start
>>>>
>>>> then lots (4667 lines) of
>>>>> 2014-04-01 11:50:10.237035 7ffebc844700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.180:6789/0 pipe(0x38a7da00 sd=104 :41115 s=4 pgs=0 cs=0 l=1 c=0x6513e8840).connect got RESETSESSION but no longer connecting
>>>>
>>>> with one intermediate
>>>>> 2014-04-01 11:51:50.181354 7ffebcf4b700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.180:6789/0 pipe(0x10e282580 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xc77d5ee0).fault
>>>>
>>>> then a sudden change
>>>>> 2014-04-01 11:57:30.440554 7ffebcd49700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.182:6789/0 pipe(0xa1534100 sd=104 :48176 s=4 pgs=0 cs=0 l=1 c=0xd99b11e0).connect got RESETSESSION but no longer connecting
>>>>> 2014-04-01 11:57:30.722607 7ffebec68700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>> 2014-04-01 11:57:30.722669 7ffebec68700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>> 2014-04-01 11:57:30.722885 7ffebec68700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>> 2014-04-01 11:57:30.722945 7ffebec68700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>
>>>> followed by lots of
>>>>> 2014-04-01 11:57:30.738562 7ffebec68700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0xead9fa80 sd=57 :0 s=1 pgs=0 cs=0 l=1 c=0x10e5d280).fault
>>>>
>>>> with sporadic
>>>>> 2014-04-01 11:57:32.431219 7ffebeb67700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.182:6789/0 pipe(0xef85cd80 sd=103 :41218 s=4 pgs=0 cs=0 l=1 c=0x130590dc0).connect got RESETSESSION but no longer connecting
>>>>
>>>> until the dump
>>>>> 2014-04-01 11:59:27.612850 7ffebea66700 0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0xe3036400 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xa7be300).fault
>>>>> 2014-04-01 11:59:27.639009 7ffec89c6700 -1 mds/CDir.cc: In function 'void CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list> > >&, const std::string&, int)' thread 7ffec89c6700 time 2014-04-01 11:59:27.620684
>>>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>>>
>>>>> ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
>>>>> 1: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, std::string const&, int)+0x4d71) [0x77c3c1]
>>>>> 2: (Context::complete(int)+0x9) [0x56bb79]
>>>>> 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11a6) [0x806dd6]
>>>>> 4: (MDS::handle_core_message(Message*)+0x9c7) [0x5901d7]
>>>>> 5: (MDS::_dispatch(Message*)+0x2f) [0x59028f]
>>>>> 6: (MDS::ms_dispatch(Message*)+0x1ab) [0x591d4b]
>>>>> 7: (DispatchQueue::entry()+0x582) [0x902072]
>>>>> 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x85ef4d]
>>>>> 9: /lib64/libpthread.so.0() [0x34c36079d1]
>>>>> 10: (clone()+0x6d) [0x34c32e8b6d]
>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>
>>>>> --- begin dump of recent events ---
>>>>> -10000> 2014-04-01 11:59:27.137779 7ffec89c6700 5 mds.0.10 initiating monitor reconnect; maybe we're not the slow one
>>>>> -9999> 2014-04-01 11:59:27.137787 7ffec89c6700 10 monclient(hunting): _reopen_session rank -1 name
>>>>> -9998> 2014-04-01 11:59:27.137790 7ffec89c6700 1 -- 10.141.8.180:6837/55117 mark_down 0x1326be460 -- 0x2cae5f00
>>>>> -9997> 2014-04-01 11:59:27.137807 7ffebf672700 2 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect error 10.141.8.181:6789/0, 104: Connection reset by peer
>>>>> -9996> 2014-04-01 11:59:27.137819 7ffebf672700 3 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect fault, but state = closed != connecting, stopping
>>>>> -9995> 2014-04-01 11:59:27.137824 7ffec89c6700 10 monclient(hunting): picked mon.ceph003 con 0x3af54ba0 addr 10.141.8.182:6789/0
>>>>> -9994> 2014-04-01 11:59:27.137832 7ffec89c6700 10 monclient(hunting): _send_mon_message to mon.ceph003 at 10.141.8.182:6789/0
>>>>
>>>>> -Greg
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
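For reference, the return codes that assert tolerates, plus the -108 that actually came back (see Greg's explanation below), decode like this on Linux; a quick Python check:

#!/usr/bin/env python
# Decode the errno values in "FAILED assert(r == 0 || r == -2 || r == -61)"
# and the -108 from the osd_op_reply quoted further down.
import errno
import os

for code in (2, 61, 108):
    name = errno.errorcode.get(code, 'unknown')
    print('-%3d = %-10s (%s)' % (code, name, os.strerror(code)))
# On Linux this prints ENOENT, ENODATA and ESHUTDOWN; Ceph reuses the
# 108 constant as "blacklisted", per Greg's explanation below.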
>>>>> On Wed, Apr 2, 2014 at 3:39 AM, Kenneth Waegeman <kenneth.waege...@ugent.be> wrote:
>>>>>> ----- Message from Gregory Farnum <g...@inktank.com> ---------
>>>>>> Date: Tue, 1 Apr 2014 09:03:17 -0700
>>>>>> From: Gregory Farnum <g...@inktank.com>
>>>>>> Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
>>>>>> To: "Yan, Zheng" <uker...@gmail.com>
>>>>>> Cc: Kenneth Waegeman <kenneth.waege...@ugent.be>, ceph-users <ceph-users@lists.ceph.com>
>>>>>>
>>>>>>> On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng <uker...@gmail.com> wrote:
>>>>>>>> On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman <kenneth.waege...@ugent.be> wrote:
>>>>>>>>> After some more searching, I've found that the source of the problem is with the mds and not the mon. The mds crashes, generates a core dump that eats the local space, and in turn the monitor (because of leveldb) crashes.
>>>>>>>>>
>>>>>>>>> The error in the mds log of one host:
>>>>>>>>>
>>>>>>>>> 2014-04-01 15:46:34.414615 7f870e319700 0 -- 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0 l=1 c=0x147ac780).connect got RESETSESSION but no longer connecting
>>>>>>>>> 2014-04-01 15:46:34.438792 7f871194f700 0 -- 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1 c=0x1fd44360).connect got RESETSESSION but no longer connecting
>>>>>>>>> 2014-04-01 15:46:34.439028 7f870e319700 0 -- 10.141.8.180:6836/13152 >> 10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0 l=1 c=0x1fd43de0).connect got RESETSESSION but no longer connecting
>>>>>>>>> 2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function 'void CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list> > >&, const std::string&, int)' thread 7f871b7ae700 time 2014-04-01 15:46:34.448320
>>>>>>>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>>>>>>
>>>>>>>> could you use gdb to check what the value of variable 'r' is?
>>>>>>>
>>>>>>> If you look at the crash dump log you can see the return value in the osd_op_reply message:
>>>>>>> -1> 2014-04-01 15:46:34.440860 7f871b7ae700 1 -- 10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33077 ==== osd_op_reply(4179177 100001f2ef1.00000000 [omap-get-header 0~0,omap-get-vals 0~16] v0'0 uv0 ack = -108 (Cannot send after transport endpoint shutdown)) v6 ==== 229+0+0 (958358678 0 0) 0x2cff7aa80 con 0x37ea3c0
>>>>>>>
>>>>>>> -108, which is ESHUTDOWN, but we also use it (via the 108 constant, I think because ESHUTDOWN varies across platforms) as EBLACKLISTED. So it looks like this is itself actually a symptom of another problem that is causing the MDS to get timed out on the monitor. If a core dump is "eating the local space", maybe the MDS is stuck in an infinite allocation loop of some kind? How big are your disks, Kenneth? Do you have any information on how much CPU/memory the MDS was using before this?
>>>>>>
>>>>>> I monitored the mds process after restart:
>>>>>>
>>>>>>   PID USER PR NI  VIRT  RES  SHR S  %CPU %MEM   TIME+ COMMAND
>>>>>> 19215 root 20  0 6070m 5.7g 5236 S 778.6 18.1 1:27.54 ceph-mds
>>>>>> 19215 root 20  0 7926m 7.5g 5236 S 179.2 23.8 2:44.39 ceph-mds
>>>>>> 19215 root 20  0 12.4g  12g 5236 S 157.2 38.8 3:43.47 ceph-mds
>>>>>> 19215 root 20  0 16.6g  16g 5236 S 144.4 52.0 4:15.01 ceph-mds
>>>>>> 19215 root 20  0 19.9g  19g 5236 S 137.2 62.5 4:35.83 ceph-mds
>>>>>> 19215 root 20  0 24.5g  24g 5224 S 136.5 77.0 5:04.66 ceph-mds
>>>>>> 19215 root 20  0 25.8g  25g 2944 S  33.7 81.2 5:13.74 ceph-mds
>>>>>> 19215 root 20  0 26.0g  25g 2916 S  24.6 81.7 5:19.07 ceph-mds
>>>>>> 19215 root 20  0 26.1g  25g 2916 S  13.0 82.1 5:22.16 ceph-mds
>>>>>> 19215 root 20  0 27.7g  26g 1856 S 100.0 85.8 5:36.46 ceph-mds
>>>>>>
>>>>>> Then it crashes. I changed the core dump location out of the root fs; the core dump is indeed about 26G.
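A lighter-weight way to capture that growth curve than watching top, as a minimal sketch (the pid is a placeholder; point it at the running ceph-mds and redirect the output to a file):

#!/usr/bin/env python
# Poll /proc/<pid>/status and log the resident set size of a process
# over time, e.g. a ceph-mds suspected of a memory leak.
import time

PID = 19215  # placeholder: pid of the ceph-mds process

while True:
    with open('/proc/%d/status' % PID) as f:
        for line in f:
            if line.startswith('VmRSS'):
                print('%s  %s' % (time.strftime('%H:%M:%S'), line.strip()))
    time.sleep(10)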
>>>>>> My disks:
>>>>>>
>>>>>> Filesystem  Size  Used Avail Use% Mounted on
>>>>>> /dev/sda2   9.9G  2.9G  6.5G  31% /
>>>>>> tmpfs        16G     0   16G   0% /dev/shm
>>>>>> /dev/sda1   248M   53M  183M  23% /boot
>>>>>> /dev/sda4   172G   61G  112G  35% /var/lib/ceph/log/sda4
>>>>>> /dev/sdb    187G   61G  127G  33% /var/lib/ceph/log/sdb
>>>>>> /dev/sdc    3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdc
>>>>>> /dev/sdd    3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdd
>>>>>> /dev/sde    3.7T  1.4T  2.4T  37% /var/lib/ceph/osd/sde
>>>>>> /dev/sdf    3.7T  1.5T  2.3T  39% /var/lib/ceph/osd/sdf
>>>>>> /dev/sdg    3.7T  2.1T  1.7T  56% /var/lib/ceph/osd/sdg
>>>>>> /dev/sdh    3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdh
>>>>>> /dev/sdi    3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdi
>>>>>> /dev/sdj    3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdj
>>>>>> /dev/sdk    3.7T  2.1T  1.6T  58% /var/lib/ceph/osd/sdk
>>>>>> /dev/sdl    3.7T  1.7T  2.0T  46% /var/lib/ceph/osd/sdl
>>>>>> /dev/sdm    3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdm
>>>>>> /dev/sdn    3.7T  1.4T  2.3T  38% /var/lib/ceph/osd/sdn
>>>>>>
>>>>>>> -Greg
>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>
>>>>>> ----- End message from Gregory Farnum <g...@inktank.com> -----
>>>>>>
>>>>>> --
>>>>>> Kind regards,
>>>>>> Kenneth Waegeman

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com