On Tue, Apr 15, 2014 at 9:49 PM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:
>
> ----- Message from "Yan, Zheng" <uker...@gmail.com> ---------
>    Date: Tue, 15 Apr 2014 21:24:04 +0800
>    From: "Yan, Zheng" <uker...@gmail.com>
> Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
>      To: Kenneth Waegeman <kenneth.waege...@ugent.be>
>      Cc: ceph-users@lists.ceph.com
>
>> On Tue, Apr 15, 2014 at 8:55 PM, Kenneth Waegeman
>> <kenneth.waege...@ugent.be> wrote:
>>>
>>> Hi,
>>>
>>> We are back to our MDS problem.
>>>
>>> This time we used the kernel client to connect to the cluster. We started
>>> again filling the cluster with rsync, and got problems after about 48h. The
>>> MDS crashed again with a core dump, like before.
>>> I removed the client connection and restarted the MDSs. The first MDS got
>>> killed by the oom-killer when it used 8G of RAM. The backup MDS took over,
>>> and I stopped the OSDs (with noout set) on that host so there was more RAM
>>> for testing the MDS. The MDS again started to fill the memory, this time
>>> to 30G. (We have 32G.)
>>> Meanwhile, I saw some interesting rados statistics:
>>> the metadata pool was heavily decreasing in KB, and increasing in object
>>> count.
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11641299       0   11667979        0   70364           0  15904215  17711309640
>>> metadata   -            95833067     405395       0     484371        0  273320   925385352  12283908    271184622
>>>
>>> Tue Apr 15 12:55:34 2014
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11667145       0   11719671        0   95576           0  17619242  17711309640
>>> metadata   -            49677494     494630       0     662841        0  294946  1013866123  12748893    271914066
>>>
>>> After the oom-killer killed ceph-mds again:
>>>
>>> Out of memory: Kill process 40776 (ceph-mds) score 808 or sacrifice child
>>> Killed process 40776, UID 0, (ceph-mds) total-vm:31382436kB,
>>> anon-rss:29960904kB, file-rss:20kB
>>> python[42731] trap invalid opcode ip:7f62ddd8ffc1 sp:7f62ccef6488 error:0
>>>
>>> this reduction was still going on (no clients connected):
>>>
>>> Tue Apr 15 13:20:24 2014
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11686446       0   11758273        0   95576           0  18447710  17711309640
>>> metadata   -            26526902     523234       0     720049        0  294946  1013866123  12886549    271914355
>>>
>>> Tue Apr 15 13:34:49 2014
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11707425       0   11800231        0   95576           0  18989100  17711309640
>>> metadata   -            12612790     543214       0     760009        0  294946  1013866123  12977285    271914524
>>>
>>> The MDS didn't crash anymore and stopped increasing RAM at 20G,
>>> until it eventually stabilized:
>>>
>>> Tue Apr 15 13:47:35 2014
>>>
>>> pool name  category           KB    objects  clones   degraded  unfound      rd       rd KB        wr        wr KB
>>> data       -         17707360279   11720750       0   11826881        0   95576           0  19346286  17711309640
>>> metadata   -              126985     579061       0     831703        0  294946  1013866123  13146061    272057126
>>>
>>> The MDS is now only using <200m and is idle.
>>>
>>> Does this look familiar to someone?
>>>
>>> It looks like the MDS was unable to process the requests, and therefore
>>> kept allocating more memory until the backlog resolved?
>>>
>>
>> Looks like the MDS journal becomes too large. So far there is no
>> mechanism to throttle client requests when the MDS can't trim its log
>> fast enough. The fix should be easy.
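(For reference: per-pool statistics like the tables above are what "rados
df" prints, and the journal trimming can be watched from the outside. A
rough sketch, assuming the metadata pool is simply named "metadata"; the
200.* object name prefix is the journal of mds rank 0:

    rados df                                      # per-pool KB / object counters
    rados -p metadata ls | grep '^200\.' | wc -l  # rough journal size, in objects

Watching the object count alongside the KB column shows whether the journal
is actually being trimmed.)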
>
> What do you mean by the MDS journal? Where can I find this journal?
> Can a better CPU solve the slow trimming? (Now 2 hexacore AMD Opteron 4334)
>
The MDS uses a journal to record recent metadata updates. The journal is
stored in the metadata pool (object names 200.*). The speed of trimming the
journal is limited by OSD performance; a better CPU does not help.

Please try
https://github.com/ceph/ceph/commit/6dfed2693e9002dbaf82b3dc1a637e1c53878fe3
I hope it can solve this issue.

Regards
Yan, Zheng

> Thanks!
>
>> Regards
>> Yan, Zheng
>>
>>> Thanks!
>>>
>>> Kenneth
>>>
>>> ----- Message from Stijn De Weirdt <stijn.dewei...@ugent.be> ---------
>>>    Date: Fri, 04 Apr 2014 20:31:34 +0200
>>>    From: Stijn De Weirdt <stijn.dewei...@ugent.be>
>>> Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
>>>      To: "Yan, Zheng" <uker...@gmail.com>
>>>      Cc: ceph-users@lists.ceph.com
>>>
>>>> hi yan,
>>>>
>>>> (taking the list in CC)
>>>>
>>>> On 04/04/2014 04:44 PM, Yan, Zheng wrote:
>>>>> On Thu, Apr 3, 2014 at 2:52 PM, Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
>>>>>> hi,
>>>>>>
>>>>>> latest pprof output attached.
>>>>>>
>>>>>> this is not the kernel client, this is ceph-fuse on EL6. starting the
>>>>>> mds without any ceph-fuse mounts works without issue. mounting
>>>>>> ceph-fuse afterwards also works fine. simple filesystem operations
>>>>>> work as expected.
>>>>>>
>>>>>> we'll check the state of the fuse mount via fusectl once we've
>>>>>> reproduced the issue.
>>>>>
>>>>> One dark side of the fuse client is that there is no way to purge
>>>>> inodes in the kernel inode cache. Each cached inode in the kernel pins
>>>>> the corresponding inode in ceph-mds. I think that's why ceph-fuse and
>>>>> ceph-mds use so much memory. Could you try executing "echo 3 >
>>>>> /proc/sys/vm/drop_caches" periodically on the ceph-fuse mount machine.
>>>>
>>>> thanks a lot for the suggestion. that might very well explain the issue.
>>>> we're currently looking into other issues that we thought were related
>>>> (extremely low mdtest performance), but we will retry your suggestion
>>>> when we start to reproduce the condition.
>>>>
>>>> stijn
>>>>
>>>>> Regards
>>>>> Yan, Zheng
>>>>>
>>>>>> stijn
>>>>>>
>>>>>> On 04/03/2014 02:39 AM, Yan, Zheng wrote:
>>>>>>> which version of the kernel client did you use? please send the
>>>>>>> content of the client node's /sys/kernel/debug/ceph/*/caps when the
>>>>>>> MDS uses lots of memory.
>>>>>>>
>>>>>>> Regards
>>>>>>> Yan, Zheng
>>>>>>>
>>>>>>> On Thu, Apr 3, 2014 at 2:58 AM, Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
>>>>>>>> wow. kudos for integrating this in ceph. more projects should do it
>>>>>>>> like that!
>>>>>>>>
>>>>>>>> anyway, in attachment a gzipped ps file. heap is at 4.4GB, top
>>>>>>>> reports 6.5GB mem usage.
>>>>>>>>
>>>>>>>> care to point out what to look for? i'll send a new one when the
>>>>>>>> usage is starting to cause swapping.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> stijn
>>>>>>>>
>>>>>>>> On 04/02/2014 06:35 PM, Gregory Farnum wrote:
>>>>>>>>> Did you see
>>>>>>>>> http://ceph.com/docs/master/rados/troubleshooting/memory-profiling?
>>>>>>>>> That should have what you need to get started, although you'll also
>>>>>>>>> need to learn the basics of using the heap analysis tools elsewhere.
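(For reference, the heap profiler described on that page can also be driven
at runtime, without stopping the daemon. A rough sketch, assuming a
tcmalloc-enabled build and an MDS whose id is "0"; substitute the real id:

    ceph tell mds.0 heap start_profiler   # start profiling in the running MDS
    ceph tell mds.0 heap dump             # write an intermediate heap dump
    ceph tell mds.0 heap stats            # print allocator statistics
    ceph tell mds.0 heap stop_profiler    # stop profiling

The dump files land in the daemon's log directory and can be inspected with
google-perftools' pprof against the ceph-mds binary, e.g.
"pprof --text /usr/bin/ceph-mds /var/log/ceph/mds.0.profile.*.heap".)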
>>>>>>>>> -Greg
>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>
>>>>>>>>> On Wed, Apr 2, 2014 at 9:23 AM, Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
>>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>>
>>>>>>>>>>>> 1) How big and what shape the filesystem is. Do you have some
>>>>>>>>>>>> extremely large directory that the MDS keeps trying to load and
>>>>>>>>>>>> then dump?
>>>>>>>>>>>
>>>>>>>>>>> any way to extract this from the mds without having to start it?
>>>>>>>>>>> as it was an rsync operation, i can try to locate possible
>>>>>>>>>>> candidates on the source filesystem, but what would be considered
>>>>>>>>>>> "large"?
>>>>>>>>>>
>>>>>>>>>> total number of files is 13M, spread over 800k directories, but
>>>>>>>>>> it's unclear how far the sync was at the time of failing. i've not
>>>>>>>>>> found a good way to look for directories with lots of files and/or
>>>>>>>>>> subdirs.
>>>>>>>>>>
>>>>>>>>>>>> 2) Use tcmalloc's heap analyzer to see where all the memory is
>>>>>>>>>>>> being allocated.
>>>>>>>>>>>
>>>>>>>>>>> we'll give that a try
>>>>>>>>>>
>>>>>>>>>> i run ceph-mds with HEAPCHECK=normal (via the init script), but how
>>>>>>>>>> can we "stop" mds without killing it? the heap checker only seems
>>>>>>>>>> to dump at the end of a run; maybe there's a way to get an
>>>>>>>>>> intermediate dump like valgrind, but the documentation is not very
>>>>>>>>>> helpful.
>>>>>>>>>>
>>>>>>>>>> stijn
>>>>>>>>>>
>>>>>>>>>>>> 3) Look through the logs for when the beacon fails (the first of
>>>>>>>>>>>> "mds.0.16 is_laggy 600.641332 > 15 since last acked beacon") and
>>>>>>>>>>>> see if there's anything tell-tale going on at the time.
>>>>>>>>>>>
>>>>>>>>>>> anything in particular we should be looking for?
>>>>>>>>>>>
>>>>>>>>>>> the log goes as follows:
>>>>>>>>>>> mds starts around 11:43
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:44:23.658583 7ffec89c6700  1 mds.0.server reconnect_clients -- 1 sessions
>>>>>>>>>>>> 2014-04-01 11:44:41.212488 7ffec89c6700  0 log [DBG] : reconnect by client.4585 10.141.8.199:0/3551 after 17.553854
>>>>>>>>>>>> 2014-04-01 11:44:45.692237 7ffec89c6700  1 mds.0.10 reconnect_done
>>>>>>>>>>>> 2014-04-01 11:44:45.996384 7ffec89c6700  1 mds.0.10 handle_mds_map i am now mds.0.10
>>>>>>>>>>>> 2014-04-01 11:44:45.996388 7ffec89c6700  1 mds.0.10 handle_mds_map state change up:reconnect --> up:rejoin
>>>>>>>>>>>> 2014-04-01 11:44:45.996390 7ffec89c6700  1 mds.0.10 rejoin_start
>>>>>>>>>>>> 2014-04-01 11:49:53.158471 7ffec89c6700  1 mds.0.10 rejoin_joint_start
>>>>>>>>>>>
>>>>>>>>>>> then lots (4667 lines) of
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:50:10.237035 7ffebc844700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.180:6789/0 pipe(0x38a7da00 sd=104 :41115 s=4 pgs=0 cs=0 l=1 c=0x6513e8840).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>
>>>>>>>>>>> with one intermediate
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:51:50.181354 7ffebcf4b700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.180:6789/0 pipe(0x10e282580 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xc77d5ee0).fault
>>>>>>>>>>>
>>>>>>>>>>> then a sudden change
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:57:30.440554 7ffebcd49700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.182:6789/0 pipe(0xa1534100 sd=104 :48176 s=4 pgs=0 cs=0 l=1 c=0xd99b11e0).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>> 2014-04-01 11:57:30.722607 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>>>>>>>>> 2014-04-01 11:57:30.722669 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>>>>>>>>> 2014-04-01 11:57:30.722885 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>>>>>>>>> 2014-04-01 11:57:30.722945 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 c=0xc48a3f40).connect got BADAUTHORIZER
>>>>>>>>>>>
>>>>>>>>>>> followed by lots of
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:57:30.738562 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0xead9fa80 sd=57 :0 s=1 pgs=0 cs=0 l=1 c=0x10e5d280).fault
>>>>>>>>>>>
>>>>>>>>>>> with sporadic
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:57:32.431219 7ffebeb67700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.182:6789/0 pipe(0xef85cd80 sd=103 :41218 s=4 pgs=0 cs=0 l=1 c=0x130590dc0).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>
>>>>>>>>>>> until the dump
>>>>>>>>>>>
>>>>>>>>>>>> 2014-04-01 11:59:27.612850 7ffebea66700  0 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0xe3036400 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xa7be300).fault
>>>>>>>>>>>> 2014-04-01 11:59:27.639009 7ffec89c6700 -1 mds/CDir.cc: In function 'void CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list> > >&, const std::string&, int)' thread 7ffec89c6700 time 2014-04-01 11:59:27.620684
>>>>>>>>>>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>>>>>>>>>>
>>>>>>>>>>>> ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
>>>>>>>>>>>> 1: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, std::string const&, int)+0x4d71) [0x77c3c1]
>>>>>>>>>>>> 2: (Context::complete(int)+0x9) [0x56bb79]
>>>>>>>>>>>> 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11a6) [0x806dd6]
>>>>>>>>>>>> 4: (MDS::handle_core_message(Message*)+0x9c7) [0x5901d7]
>>>>>>>>>>>> 5: (MDS::_dispatch(Message*)+0x2f) [0x59028f]
>>>>>>>>>>>> 6: (MDS::ms_dispatch(Message*)+0x1ab) [0x591d4b]
>>>>>>>>>>>> 7: (DispatchQueue::entry()+0x582) [0x902072]
>>>>>>>>>>>> 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x85ef4d]
>>>>>>>>>>>> 9: /lib64/libpthread.so.0() [0x34c36079d1]
>>>>>>>>>>>> 10: (clone()+0x6d) [0x34c32e8b6d]
>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>> --- begin dump of recent events ---
>>>>>>>>>>>> -10000> 2014-04-01 11:59:27.137779 7ffec89c6700  5 mds.0.10 initiating monitor reconnect; maybe we're not the slow one
>>>>>>>>>>>>  -9999> 2014-04-01 11:59:27.137787 7ffec89c6700 10 monclient(hunting): _reopen_session rank -1 name
>>>>>>>>>>>>  -9998> 2014-04-01 11:59:27.137790 7ffec89c6700  1 -- 10.141.8.180:6837/55117 mark_down 0x1326be460 -- 0x2cae5f00
>>>>>>>>>>>>  -9997> 2014-04-01 11:59:27.137807 7ffebf672700  2 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect error 10.141.8.181:6789/0, 104: Connection reset by peer
>>>>>>>>>>>>  -9996> 2014-04-01 11:59:27.137819 7ffebf672700  3 -- 10.141.8.180:6837/55117 >> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 c=0x1326be460).connect fault, but state = closed != connecting, stopping
>>>>>>>>>>>>  -9995> 2014-04-01 11:59:27.137824 7ffec89c6700 10 monclient(hunting): picked mon.ceph003 con 0x3af54ba0 addr 10.141.8.182:6789/0
>>>>>>>>>>>>  -9994> 2014-04-01 11:59:27.137832 7ffec89c6700 10 monclient(hunting): _send_mon_message to mon.ceph003 at 10.141.8.182:6789/0
>>>>>>>>>>>
>>>>>>>>>>>> -Greg
>>>>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 2, 2014 at 3:39 AM, Kenneth Waegeman <kenneth.waege...@ugent.be> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- Message from Gregory Farnum <g...@inktank.com> ---------
>>>>>>>>>>>>>    Date: Tue, 1 Apr 2014 09:03:17 -0700
>>>>>>>>>>>>>    From: Gregory Farnum <g...@inktank.com>
>>>>>>>>>>>>> Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
>>>>>>>>>>>>>      To: "Yan, Zheng" <uker...@gmail.com>
>>>>>>>>>>>>>      Cc: Kenneth Waegeman <kenneth.waege...@ugent.be>, ceph-users <ceph-users@lists.ceph.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng <uker...@gmail.com> wrote:
>>>>>>>>>>>>>>> On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman <kenneth.waege...@ugent.be> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After some more searching, I've found that the source of the
>>>>>>>>>>>>>>>> problem is with the mds and not the mon. The mds crashes,
>>>>>>>>>>>>>>>> generates a core dump that eats the local space, and in turn
>>>>>>>>>>>>>>>> the monitor (because of leveldb) crashes.
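(A side note on core dumps eating the local space: the kernel can be told
to write cores to a roomier filesystem, and their size can be capped. A
sketch only; the target path is illustrative:

    # write cores to a dedicated filesystem, tagged with program name and pid
    sysctl -w kernel.core_pattern=/var/lib/ceph/log/sda4/core.%e.%p

    # or cap core size in the shell that starts the daemon (0 disables cores)
    ulimit -c 0
)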
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The error in the mds log of one host:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-04-01 15:46:34.414615 7f870e319700  0 -- 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0 l=1 c=0x147ac780).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>>>>>> 2014-04-01 15:46:34.438792 7f871194f700  0 -- 10.141.8.180:6836/13152 >> 10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1 c=0x1fd44360).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>>>>>> 2014-04-01 15:46:34.439028 7f870e319700  0 -- 10.141.8.180:6836/13152 >> 10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0 l=1 c=0x1fd43de0).connect got RESETSESSION but no longer connecting
>>>>>>>>>>>>>>>> 2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function 'void CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list> > >&, const std::string&, int)' thread 7f871b7ae700 time 2014-04-01 15:46:34.448320
>>>>>>>>>>>>>>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> could you use gdb to check what the value of variable 'r' is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you look at the crash dump log you can see the return value
>>>>>>>>>>>>>> in the osd_op_reply message:
>>>>>>>>>>>>>>     -1> 2014-04-01 15:46:34.440860 7f871b7ae700  1 -- 10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33077 ==== osd_op_reply(4179177 100001f2ef1.00000000 [omap-get-header 0~0,omap-get-vals 0~16] v0'0 uv0 ack = -108 (Cannot send after transport endpoint shutdown)) v6 ==== 229+0+0 (958358678 0 0) 0x2cff7aa80 con 0x37ea3c0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -108, which is ESHUTDOWN, but we also use it (via the 108
>>>>>>>>>>>>>> constant, I think because ESHUTDOWN varies across platforms) as
>>>>>>>>>>>>>> EBLACKLISTED. So it looks like this is itself actually a
>>>>>>>>>>>>>> symptom of another problem that is causing the MDS to get
>>>>>>>>>>>>>> timed out on the monitor. If a core dump is "eating the local
>>>>>>>>>>>>>> space", maybe the MDS is stuck in an infinite allocation loop
>>>>>>>>>>>>>> of some kind? How big are your disks, Kenneth? Do you have any
>>>>>>>>>>>>>> information on how much CPU/memory the MDS was using before
>>>>>>>>>>>>>> this?
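(The errno values Greg decodes above can be checked with a one-liner on any
Linux box, assuming a Python interpreter is available; the -2 and -61 that
the assert tolerates are ENOENT and ENODATA. A sketch:

    python -c "import errno, os; print(errno.errorcode[108], os.strerror(108))"
    # -> ESHUTDOWN, Cannot send after transport endpoint shutdown
)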
>>>>>>>>>>>>>
>>>>>>>>>>>>> I monitored the mds process after restart:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
>>>>>>>>>>>>> 19215 root  20   0 6070m 5.7g 5236 S 778.6 18.1  1:27.54 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 7926m 7.5g 5236 S 179.2 23.8  2:44.39 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 12.4g  12g 5236 S 157.2 38.8  3:43.47 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 16.6g  16g 5236 S 144.4 52.0  4:15.01 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 19.9g  19g 5236 S 137.2 62.5  4:35.83 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 24.5g  24g 5224 S 136.5 77.0  5:04.66 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 25.8g  25g 2944 S  33.7 81.2  5:13.74 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 26.0g  25g 2916 S  24.6 81.7  5:19.07 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 26.1g  25g 2916 S  13.0 82.1  5:22.16 ceph-mds
>>>>>>>>>>>>> 19215 root  20   0 27.7g  26g 1856 S 100.0 85.8  5:36.46 ceph-mds
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then it crashes. I changed the core dump location out of the root
>>>>>>>>>>>>> fs; the core dump is indeed about 26G.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My disks:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>>>>> /dev/sda2       9.9G  2.9G  6.5G  31% /
>>>>>>>>>>>>> tmpfs            16G     0   16G   0% /dev/shm
>>>>>>>>>>>>> /dev/sda1       248M   53M  183M  23% /boot
>>>>>>>>>>>>> /dev/sda4       172G   61G  112G  35% /var/lib/ceph/log/sda4
>>>>>>>>>>>>> /dev/sdb        187G   61G  127G  33% /var/lib/ceph/log/sdb
>>>>>>>>>>>>> /dev/sdc        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdc
>>>>>>>>>>>>> /dev/sdd        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdd
>>>>>>>>>>>>> /dev/sde        3.7T  1.4T  2.4T  37% /var/lib/ceph/osd/sde
>>>>>>>>>>>>> /dev/sdf        3.7T  1.5T  2.3T  39% /var/lib/ceph/osd/sdf
>>>>>>>>>>>>> /dev/sdg        3.7T  2.1T  1.7T  56% /var/lib/ceph/osd/sdg
>>>>>>>>>>>>> /dev/sdh        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdh
>>>>>>>>>>>>> /dev/sdi        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdi
>>>>>>>>>>>>> /dev/sdj        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdj
>>>>>>>>>>>>> /dev/sdk        3.7T  2.1T  1.6T  58% /var/lib/ceph/osd/sdk
>>>>>>>>>>>>> /dev/sdl        3.7T  1.7T  2.0T  46% /var/lib/ceph/osd/sdl
>>>>>>>>>>>>> /dev/sdm        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdm
>>>>>>>>>>>>> /dev/sdn        3.7T  1.4T  2.3T  38% /var/lib/ceph/osd/sdn
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Greg
>>>>>>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- End message from Gregory Farnum <g...@inktank.com> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>> Kenneth Waegeman
>>>>>>>>>>>>>
>>>
>>> ----- End message from Stijn De Weirdt <stijn.dewei...@ugent.be> -----
>>>
>>> --
>>> Kind regards,
>>> Kenneth Waegeman
>
> ----- End message from "Yan, Zheng" <uker...@gmail.com> -----
>
> --
> Kind regards,
> Kenneth Waegeman

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com