I've been testing Ceph for a while with a 4-node cluster (1 mon, 1 mds, and 2 osds), each node running Ceph 0.56.2.
Today I ran into an MDS crash: on the mds host, the ceph-mds process is terminated by an assert(). My questions are:
1. What is the reason for the MDS crash?
2. How can I recover from it without running mkcephfs?
The crash is reproducible in my environment.

The following information may be related:
1. "ceph -s" output
2. ceph.conf
3. part of ceph-mds.a.log (the whole log file is at http://pastebin.com/NJd0UCfF)

1. "ceph -s" output
==============
   health HEALTH_WARN mds a is laggy
   monmap e1: 1 mons at {a=mon.mon.mon.mon:6789/0}, election epoch 1, quorum 0 a
   osdmap e220: 2 osds: 2 up, 2 in
   pgmap v3614: 576 pgs: 576 active+clean; 6618 KB data, 162 MB used, 4209 MB / 4606 MB avail
   mdsmap e860: 1/1/1 up {0=a=up:active(laggy or crashed)}

2. ceph.conf
=========
[global]
    auth supported = none
    auth cluster required = none
    auth service required = none
    auth client required = none
    debug mds = 20
[mon]
    mon data = /usr/local/etc/ceph/mon.$id
[mon.a]
    host = mon
    mon addr = xx.xx.xx.xx:6789
[mds]
[mds.a]
    host = mds
[osd]
    osd data = /ceph/data
    osd journal size = 128
    filestore xattr use omap = true
[osd.0]
    host = osd0
[osd.1]
    host = osd1

3. part of ceph-mds.a.log
==================
2013-04-09 02:22:58.577485 7f587b640700 1 mds.0.35 handle_mds_map i am now mds.0.35
2013-04-09 02:22:58.577489 7f587b640700 1 mds.0.35 handle_mds_map state change up:rejoin --> up:active
2013-04-09 02:22:58.577494 7f587b640700 1 mds.0.35 recovery_done -- successful recovery!
2013-04-09 02:22:58.577507 7f587b640700 7 mds.0.tableserver(anchortable) finish_recovery
2013-04-09 02:22:58.577515 7f587b640700 7 mds.0.tableserver(snaptable) finish_recovery
2013-04-09 02:22:58.577521 7f587b640700 7 mds.0.tableclient(anchortable) finish_recovery
2013-04-09 02:22:58.577525 7f587b640700 7 mds.0.tableclient(snaptable) finish_recovery
2013-04-09 02:22:58.577529 7f587b640700 10 mds.0.cache start_recovered_truncates
2013-04-09 02:22:58.577533 7f587b640700 10 mds.0.cache do_file_recover 0 queued, 0 recovering
2013-04-09 02:22:58.577541 7f587b640700 10 mds.0.cache reissue_all_caps
2013-04-09 02:22:58.581855 7f587b640700 -1 mds/MDCache.cc: In function 'void MDCache::populate_mydir()' thread 7f587b640700 time 2013-04-09 02:22:58.577558
mds/MDCache.cc: 579: FAILED assert(mydir)

 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
 1: (MDCache::populate_mydir()+0xbc5) [0x5f0125]
 2: (MDS::recovery_done()+0xde) [0x4ed12e]
 3: (MDS::handle_mds_map(MMDSMap*)+0x39c8) [0x4fff28]
 4: (MDS::handle_core_message(Message*)+0xb4b) [0x50596b]
 5: (MDS::_dispatch(Message*)+0x2f) [0x505a9f]
 6: (MDS::ms_dispatch(Message*)+0x23b) [0x50759b]
 7: (Messenger::ms_deliver_dispatch(Message*)+0x66) [0x872a26]
 8: (DispatchQueue::entry()+0x32a) [0x87093a]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ee7cd]
 10: (()+0x6a3f) [0x7f587f465a3f]
 11: (clone()+0x6d) [0x7f587df1967d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
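In case it helps with diagnosis, these are the commands I plan to run to gather more information about the mdsmap and the metadata pool before the next reproduction. The pool name "metadata" and the object name 100.00000000 (which I believe corresponds to mds.0's private directory in the default layout) are assumptions on my part, not something taken from the log:

    ceph mds dump                        # show the current mdsmap (rank 0 laggy/crashed)
    ceph osd lspools                     # confirm the metadata pool still exists
    rados -p metadata ls | head          # sample the objects stored in the metadata pool
    rados -p metadata stat 100.00000000  # assumed object name for mds.0's private dir

If someone can confirm whether the assert(mydir) in MDCache::populate_mydir() points at missing objects in the metadata pool, that would already help a lot.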