which version of the kernel client did you use? please send the contents of
the client node's /sys/kernel/debug/ceph/*/caps when the MDS is using lots of
memory.
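something like this on the client should capture what I need (assuming
debugfs is mounted at /sys/kernel/debug):

  uname -r
  cat /sys/kernel/debug/ceph/*/caps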

Regards
Yan, Zheng

On Thu, Apr 3, 2014 at 2:58 AM, Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
> wow. kudos for integrating this in ceph. more projects should do it like
> that!
>
> anyway, attached is a gzipped ps file. heap is at 4.4GB, top reports 6.5GB
> mem usage.
>
> care to point out what to look for? i'll send a new one when the usage
> starts to cause swapping.
>
> thanks,
>
> stijn
>
>
> On 04/02/2014 06:35 PM, Gregory Farnum wrote:
>>
>> Did you see
>> http://ceph.com/docs/master/rados/troubleshooting/memory-profiling?
>> That should have what you need to get started, although you'll also
>> need to learn the basics of using the heap analysis tools elsewhere.
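>>
>> From memory, the short version is roughly this (the daemon id, binary path,
>> and dump filenames are just examples; if I remember right the profiler
>> drops its dump files alongside the daemon's logs):
>>
>>   ceph tell mds.0 heap start_profiler
>>   ceph tell mds.0 heap dump
>>   ceph tell mds.0 heap stats
>>   pprof --text /usr/bin/ceph-mds /var/log/ceph/mds.*.profile.*.heap
>>
>> (pprof may be installed as google-pprof depending on the distro.)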
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Wed, Apr 2, 2014 at 9:23 AM, Stijn De Weirdt <stijn.dewei...@ugent.be>
>> wrote:
>>>
>>> hi,
>>>
>>>
>>>>> 1) How big and what shape the filesystem is. Do you have some
>>>>> extremely large directory that the MDS keeps trying to load and then
>>>>> dump?
>>>>
>>>>
>>>> any way to extract this from the mds without having to start it? as it
>>>> was an rsync operation, i can try to locate possible candidates on the
>>>> source filesystem, but what would be considered "large"?
>>>
>>>
>>> total number of files is 13M, spread over 800k directories, but it's
>>> unclear how far the sync had gotten at the time of the failure. i've not
>>> found a good way to look for directories with lots of files and/or
>>> subdirs.
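>>>
>>> the closest i got is a brute-force entry count per directory on the rsync
>>> source, something like this (slow; /path/to/source is just a placeholder):
>>>
>>>   find /path/to/source -xdev -type d -print0 | \
>>>     while IFS= read -r -d '' d; do
>>>       printf '%s %s\n' "$(find "$d" -maxdepth 1 | wc -l)" "$d"
>>>     done | sort -rn | head -20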
>>>
>>>
>>>>
>>>>> 2) Use tcmalloc's heap analyzer to see where all the memory is being
>>>>> allocated.
>>>>
>>>>
>>>> we'll give that a try
>>>
>>>
>>> i ran ceph-mds with HEAPCHECK=normal (via the init script), but how can we
>>> "stop" the mds without killing it? the heap checker only seems to dump at
>>> the end of a run; maybe there's a way to get intermediate dumps like with
>>> valgrind, but the documentation is not very helpful.
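>>>
>>> my naive guess would be something along these lines to force an
>>> intermediate dump while it's running, but i'm not sure whether that needs
>>> the built-in profiler instead of HEAPCHECK:
>>>
>>>   ceph tell mds.0 heap start_profiler
>>>   ceph tell mds.0 heap dump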
>>>
>>> stijn
>>>
>>>
>>>>
>>>>> 3) Look through the logs for when the beacon fails (the first of
>>>>> "mds.0.16 is_laggy 600.641332 > 15 since last acked beacon") and see
>>>>> if there's anything tell-tale going on at the time.
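>>>>>
>>>>> A quick grep should locate those lines, assuming the default log
>>>>> location:
>>>>>
>>>>>   grep -n is_laggy /var/log/ceph/*mds*.log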
>>>>
>>>>
>>>>
>>>> anything in particular we should be looking for?
>>>>
>>>> the log goes as follows:
>>>> mds starts around 11:43
>>>> ...
>>>>
>>>>> 2014-04-01 11:44:23.658583 7ffec89c6700  1 mds.0.server
>>>>> reconnect_clients -- 1 sessions
>>>>> 2014-04-01 11:44:41.212488 7ffec89c6700  0 log [DBG] : reconnect by
>>>>> client.4585 10.141.8.199:0/3551 after 17.553854
>>>>> 2014-04-01 11:44:45.692237 7ffec89c6700  1 mds.0.10 reconnect_done
>>>>> 2014-04-01 11:44:45.996384 7ffec89c6700  1 mds.0.10 handle_mds_map i
>>>>> am now mds.0.10
>>>>> 2014-04-01 11:44:45.996388 7ffec89c6700  1 mds.0.10 handle_mds_map
>>>>> state change up:reconnect --> up:rejoin
>>>>> 2014-04-01 11:44:45.996390 7ffec89c6700  1 mds.0.10 rejoin_start
>>>>> 2014-04-01 11:49:53.158471 7ffec89c6700  1 mds.0.10 rejoin_joint_start
>>>>
>>>>
>>>>
>>>> then lots (4667 lines) of
>>>>>
>>>>>
>>>>> 2014-04-01 11:50:10.237035 7ffebc844700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.180:6789/0 pipe(0x38a7da00 sd=104 :41115 s=4 pgs=0 cs=0 l=1
>>>>> c=0x6513e8840).connect got RESETSESSION but no longer connecting
>>>>
>>>>
>>>>
>>>> with one intermediate
>>>>>
>>>>>
>>>>> 2014-04-01 11:51:50.181354 7ffebcf4b700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.180:6789/0 pipe(0x10e282580 sd=103 :0 s=1 pgs=0 cs=0 l=1
>>>>> c=0xc77d5ee0).fault
>>>>
>>>>
>>>>
>>>>
>>>> then a sudden change
>>>>>
>>>>>
>>>>> 2014-04-01 11:57:30.440554 7ffebcd49700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.182:6789/0 pipe(0xa1534100 sd=104 :48176 s=4 pgs=0 cs=0 l=1
>>>>> c=0xd99b11e0).connect got RESETSESSION but no longer connecting
>>>>> 2014-04-01 11:57:30.722607 7ffebec68700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1
>>>>> c=0xc48a3f40).connect got BADAUTHORIZER
>>>>> 2014-04-01 11:57:30.722669 7ffebec68700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1
>>>>> c=0xc48a3f40).connect got BADAUTHORIZER
>>>>> 2014-04-01 11:57:30.722885 7ffebec68700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1
>>>>> c=0xc48a3f40).connect got BADAUTHORIZER
>>>>> 2014-04-01 11:57:30.722945 7ffebec68700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1
>>>>> c=0xc48a3f40).connect got BADAUTHORIZER
>>>>
>>>>
>>>>
>>>> followed by lots of
>>>>>
>>>>>
>>>>> 2014-04-01 11:57:30.738562 7ffebec68700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.181:6789/0 pipe(0xead9fa80 sd=57 :0 s=1 pgs=0 cs=0 l=1
>>>>> c=0x10e5d280).fault
>>>>
>>>>
>>>>
>>>> with sporadic
>>>>>
>>>>>
>>>>> 2014-04-01 11:57:32.431219 7ffebeb67700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.182:6789/0 pipe(0xef85cd80 sd=103 :41218 s=4 pgs=0 cs=0 l=1
>>>>> c=0x130590dc0).connect got RESETSESSION but no longer connecting
>>>>
>>>>
>>>>
>>>>
>>>> until the dump
>>>>>
>>>>>
>>>>> 2014-04-01 11:59:27.612850 7ffebea66700  0 -- 10.141.8.180:6837/55117 >>
>>>>> 10.141.8.181:6789/0 pipe(0xe3036400 sd=103 :0 s=1 pgs=0 cs=0 l=1
>>>>> c=0xa7be300).fault
>>>>> 2014-04-01 11:59:27.639009 7ffec89c6700 -1 mds/CDir.cc: In function
>>>>> 'void CDir::_omap_fetched(ceph::bufferlist&,
>>>>> std::map<std::basic_string<char, std::char_traits<char>,
>>>>> std::allocator<char> >, ceph::buffer::list,
>>>>> std::less<std::basic_string<char, std::char_traits<char>,
>>>>> std::allocator<char> > >, std::allocator<std::pair<const
>>>>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
>>>>> ceph::buffer::list> > >&, const std::string&, int)' thread 7ffec89c6700
>>>>> time 2014-04-01 11:59:27.620684
>>>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>>>
>>>>>   ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
>>>>>   1: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string,
>>>>> ceph::buffer::list, std::less<std::string>,
>>>>> std::allocator<std::pair<std::string const, ceph::buffer::list> > >&,
>>>>> std::string const&, int)+0x4d71) [0x77c3c1]
>>>>>   2: (Context::complete(int)+0x9) [0x56bb79]
>>>>>   3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11a6) [0x806dd6]
>>>>>   4: (MDS::handle_core_message(Message*)+0x9c7) [0x5901d7]
>>>>>   5: (MDS::_dispatch(Message*)+0x2f) [0x59028f]
>>>>>   6: (MDS::ms_dispatch(Message*)+0x1ab) [0x591d4b]
>>>>>   7: (DispatchQueue::entry()+0x582) [0x902072]
>>>>>   8: (DispatchQueue::DispatchThread::entry()+0xd) [0x85ef4d]
>>>>>   9: /lib64/libpthread.so.0() [0x34c36079d1]
>>>>>   10: (clone()+0x6d) [0x34c32e8b6d]
>>>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>> needed to interpret this.
>>>>>
>>>>> --- begin dump of recent events ---
>>>>> -10000> 2014-04-01 11:59:27.137779 7ffec89c6700  5 mds.0.10 initiating
>>>>> monitor reconnect; maybe we're not the slow one
>>>>>   -9999> 2014-04-01 11:59:27.137787 7ffec89c6700 10 monclient(hunting):
>>>>> _reopen_session rank -1 name
>>>>>   -9998> 2014-04-01 11:59:27.137790 7ffec89c6700  1 --
>>>>> 10.141.8.180:6837/55117 mark_down 0x1326be460 -- 0x2cae5f00
>>>>>   -9997> 2014-04-01 11:59:27.137807 7ffebf672700  2 --
>>>>> 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0
>>>>> s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect error 10.141.8.181:6789/0,
>>>>> 104: Connection reset by peer
>>>>>   -9996> 2014-04-01 11:59:27.137819 7ffebf672700  3 --
>>>>> 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0
>>>>> s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect fault, but state = closed !=
>>>>> connecting, stopping
>>>>>   -9995> 2014-04-01 11:59:27.137824 7ffec89c6700 10 monclient(hunting):
>>>>> picked mon.ceph003 con 0x3af54ba0 addr 10.141.8.182:6789/0
>>>>>   -9994> 2014-04-01 11:59:27.137832 7ffec89c6700 10 monclient(hunting):
>>>>> _send_mon_message to mon.ceph003 at 10.141.8.182:6789/0
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> -Greg
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>
>>>>>
>>>>> On Wed, Apr 2, 2014 at 3:39 AM, Kenneth Waegeman
>>>>> <kenneth.waege...@ugent.be> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Message from Gregory Farnum <g...@inktank.com> ---------
>>>>>>      Date: Tue, 1 Apr 2014 09:03:17 -0700
>>>>>>      From: Gregory Farnum <g...@inktank.com>
>>>>>>
>>>>>> Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
>>>>>>        To: "Yan, Zheng" <uker...@gmail.com>
>>>>>>        Cc: Kenneth Waegeman <kenneth.waege...@ugent.be>, ceph-users
>>>>>> <ceph-users@lists.ceph.com>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng <uker...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman
>>>>>>>> <kenneth.waege...@ugent.be> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> After some more searching, I've found that the source of the problem
>>>>>>>>> is with the mds and not the mon. The mds crashes, generates a core
>>>>>>>>> dump that eats the local disk space, and that in turn crashes the
>>>>>>>>> monitor (because of leveldb).
>>>>>>>>>
>>>>>>>>> The error in the mds log of one host:
>>>>>>>>>
>>>>>>>>> 2014-04-01 15:46:34.414615 7f870e319700  0 --
>>>>>>>>> 10.141.8.180:6836/13152 >>
>>>>>>>>> 10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0
>>>>>>>>> l=1
>>>>>>>>> c=0x147ac780).connect got RESETSESSION but no longer connecting
>>>>>>>>> 2014-04-01 15:46:34.438792 7f871194f700  0 --
>>>>>>>>> 10.141.8.180:6836/13152 >>
>>>>>>>>> 10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1
>>>>>>>>> c=0x1fd44360).connect got RESETSESSION but no longer connecting
>>>>>>>>> 2014-04-01 15:46:34.439028 7f870e319700  0 --
>>>>>>>>> 10.141.8.180:6836/13152 >>
>>>>>>>>> 10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0
>>>>>>>>> l=1
>>>>>>>>> c=0x1fd43de0).connect got RESETSESSION but no longer connecting
>>>>>>>>> 2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function
>>>>>>>>> 'void
>>>>>>>>> CDir::_omap_fetched(ceph::bufferlist&,
>>>>>>>>> std::map<std::basic_string<char,
>>>>>>>>> std::char_traits<char>, std::allocator<char> >, ceph::buffer::list,
>>>>>>>>> std::less<std::basic_string<char, std::char_traits<char>,
>>>>>>>>> std::allocator<char> > >, std::allocator<std::pair<const
>>>>>>>>> std::basic_string<char, std::char_traits<char>,
>>>>>>>>> std::allocator<char> >,
>>>>>>>>> ceph::buffer::list> > >&, const std::string&, int)' thread
>>>>>>>>> 7f871b7ae700
>>>>>>>>> time
>>>>>>>>> 2014-04-01 15:46:34.448320
>>>>>>>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>>>>>>>
>>>>>>>>
>>>>>>>> could you use gdb to check what the value of the variable 'r' is?
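>>>>>>>>
>>>>>>>> something like this against the core file should do it (assuming the
>>>>>>>> matching debug symbols are installed; the core path is a placeholder):
>>>>>>>>
>>>>>>>>   gdb /usr/bin/ceph-mds /path/to/core
>>>>>>>>   (gdb) bt
>>>>>>>>   (gdb) frame N    # pick the CDir::_omap_fetched frame from the bt
>>>>>>>>   (gdb) print r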
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> If you look at the crash dump log you can see the return value in the
>>>>>>> osd_op_reply message:
>>>>>>> -1> 2014-04-01 15:46:34.440860 7f871b7ae700  1 --
>>>>>>> 10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33077 ====
>>>>>>> osd_op_reply(4179177 100001f2ef1.00000000 [omap-get-header
>>>>>>> 0~0,omap-get-vals 0~16] v0'0 uv0 ack = -108 (Cannot send after
>>>>>>> transport endpoint shutdown)) v6 ==== 229+0+0 (958358678 0 0)
>>>>>>> 0x2cff7aa80 con 0x37ea3c0
>>>>>>>
>>>>>>> -108, which is ESHUTDOWN, but we also use it (via the 108 constant, I
>>>>>>> think because ESHUTDOWN varies across platforms) as EBLACKLISTED.
>>>>>>> So it looks like this is itself actually a symptom of another problem
>>>>>>> that is causing the MDS to get timed out on the monitor. If a core
>>>>>>> dump is "eating the local space", maybe the MDS is stuck in an
>>>>>>> infinite allocation loop of some kind? How big are your disks,
>>>>>>> Kenneth? Do you have any information on how much CPU/memory the MDS
>>>>>>> was using before this?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I monitored the mds process after restart:
>>>>>>
>>>>>>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>>> 19215 root      20   0 6070m 5.7g 5236 S 778.6 18.1   1:27.54 ceph-mds
>>>>>> 19215 root      20   0 7926m 7.5g 5236 S 179.2 23.8   2:44.39 ceph-mds
>>>>>> 19215 root      20   0 12.4g  12g 5236 S 157.2 38.8   3:43.47 ceph-mds
>>>>>> 19215 root      20   0 16.6g  16g 5236 S 144.4 52.0   4:15.01 ceph-mds
>>>>>> 19215 root      20   0 19.9g  19g 5236 S 137.2 62.5   4:35.83 ceph-mds
>>>>>> 19215 root      20   0 24.5g  24g 5224 S 136.5 77.0   5:04.66 ceph-mds
>>>>>> 19215 root      20   0 25.8g  25g 2944 S 33.7 81.2   5:13.74 ceph-mds
>>>>>> 19215 root      20   0 26.0g  25g 2916 S 24.6 81.7   5:19.07 ceph-mds
>>>>>> 19215 root      20   0 26.1g  25g 2916 S 13.0 82.1   5:22.16 ceph-mds
>>>>>> 19215 root      20   0 27.7g  26g 1856 S 100.0 85.8   5:36.46 ceph-mds
>>>>>>
>>>>>> Then it crashes. I moved the core dump location off the root fs; the
>>>>>> core dump is indeed about 26G.
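>>>>>>
>>>>>> Roughly by pointing kernel.core_pattern at one of the bigger disks,
>>>>>> something like this (the path just being wherever there was space):
>>>>>>
>>>>>>   sysctl -w kernel.core_pattern=/var/lib/ceph/log/sda4/core.%e.%p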
>>>>>>
>>>>>> My disks:
>>>>>>
>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> /dev/sda2       9.9G  2.9G  6.5G  31% /
>>>>>> tmpfs            16G     0   16G   0% /dev/shm
>>>>>> /dev/sda1       248M   53M  183M  23% /boot
>>>>>> /dev/sda4       172G   61G  112G  35% /var/lib/ceph/log/sda4
>>>>>> /dev/sdb        187G   61G  127G  33% /var/lib/ceph/log/sdb
>>>>>> /dev/sdc        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdc
>>>>>> /dev/sdd        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdd
>>>>>> /dev/sde        3.7T  1.4T  2.4T  37% /var/lib/ceph/osd/sde
>>>>>> /dev/sdf        3.7T  1.5T  2.3T  39% /var/lib/ceph/osd/sdf
>>>>>> /dev/sdg        3.7T  2.1T  1.7T  56% /var/lib/ceph/osd/sdg
>>>>>> /dev/sdh        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdh
>>>>>> /dev/sdi        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdi
>>>>>> /dev/sdj        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdj
>>>>>> /dev/sdk        3.7T  2.1T  1.6T  58% /var/lib/ceph/osd/sdk
>>>>>> /dev/sdl        3.7T  1.7T  2.0T  46% /var/lib/ceph/osd/sdl
>>>>>> /dev/sdm        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdm
>>>>>> /dev/sdn        3.7T  1.4T  2.3T  38% /var/lib/ceph/osd/sdn
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -Greg
>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- End message from Gregory Farnum <g...@inktank.com> -----
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Met vriendelijke groeten,
>>>>>> Kenneth Waegeman
>>>>>>
>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
