Last week I asked about a rogue inode that was causing ceph-mds to segfault during replay. We didn't get any suggestions from this list, so we have been familiarizing ourselves with the ceph source code, and have added the following patch:
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -736,6 +736,13 @@ CDir *CInode::get_approx_dirfrag(frag_t fg)
 
 CDir *CInode::get_or_open_dirfrag(MDCache *mdcache, frag_t fg)
 {
+  if (!is_dir()) {
+    ostringstream oss;
+    JSONFormatter f(true);
+    dump(&f, DUMP_PATH | DUMP_INODE_STORE_BASE | DUMP_MDS_CACHE_OBJECT | DUMP_LOCKS | DUMP_STATE | DUMP_CAPS | DUMP_DIRFRAGS);
+    f.flush(oss);
+    dout(0) << oss.str() << dendl;
+  }
   ceph_assert(is_dir());
 
   // have it?

This has given us a culprit:

    -2> 2019-10-18 16:19:06.934 7faefa470700  0 mds.0.cache.ino(0x10000995e63) "/unimportant/path/we/can/tolerate/losing/compat.py"10995216789470"2018-03-24 03:18:17.621969""2018-03-24 03:18:17.620969"3318855521001{ "dir_hash": 0 } { "stripe_unit": 4194304, "stripe_count": 1, "object_size": 4194304, "pool_id": 1, "pool_ns": "" } [] 3411844674407370955161500"2015-01-27 16:01:52.467669""2018-03-24 03:18:17.621969"21-1[] { "version": 0, "mtime": "0.000000", "num_files": 0, "num_subdirs": 0 } { "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.000000" } { "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.000000" } 2540123""""[] { "splits": [] } true{ "replicas": {} } { "authority": [ 0, -2 ], "replica_nonce": 0 } 0falsefalse{} 0{ "gather_set": [], "state": "lock", "is_leased": false, "num_rdlocks": 0, "num_wrlocks": 0, "num_xlocks": 0, "xlock_by": {} } {} {} {} {} {} {} {} {} {} [ "auth" ] [] -1-1[] []
    -1> 2019-10-18 16:19:06.964 7faefa470700 -1 /opt/app-root/src/ceph/src/mds/CInode.cc: In function 'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7faefa470700 time 2019-10-18 16:19:06.934662
/opt/app-root/src/ceph/src/mds/CInode.cc: 746: FAILED ceph_assert(is_dir())

 ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1aa) [0x7faf0a9ce39e]
 2: (()+0x12a8620) [0x7faf0a9ce620]
 3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x253) [0x557562a4b1ad]
 4: (OpenFileTable::_prefetch_dirfrags()+0x4db) [0x557562b63d63]
 5: (OpenFileTable::_open_ino_finish(inodeno_t, int)+0x16a) [0x557562b63720]
 6: (C_OFT_OpenInoFinish::finish(int)+0x2d) [0x557562b67699]
 7: (Context::complete(int)+0x27) [0x557562657fbf]
 8: (MDSContext::complete(int)+0x152) [0x557562b04aa4]
 9: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x2c8) [0x557562660e36]
 10: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x185) [0x557562844c4d]
 11: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v14_2_0::list&, int)+0xbbf) [0x557562842785]
 12: (C_IO_MDC_OpenInoBacktraceFetched::finish(int)+0x37) [0x557562886a31]
 13: (Context::complete(int)+0x27) [0x557562657fbf]
 14: (MDSContext::complete(int)+0x152) [0x557562b04aa4]
 15: (MDSIOContextBase::complete(int)+0x345) [0x557562b0522d]
 16: (Finisher::finisher_thread_entry()+0x38b) [0x7faf0a9033e1]
 17: (Finisher::FinisherThread::entry()+0x1c) [0x5575626a2772]
 18: (Thread::entry_wrapper()+0x78) [0x7faf0a97203c]
 19: (Thread::_entry_func(void*)+0x18) [0x7faf0a971fba]
 20: (()+0x7dd5) [0x7faf07844dd5]
 21: (clone()+0x6d) [0x7faf064f502d]

I tried removing it, but it does not show up in the omapkeys for that inode:

lima:/home/neale$ ceph -- rados -p cephfs_metadata listomapkeys 10000995e63.00000000
__about__.py_head
__init__.py_head
__pycache___head
_compat.py_head
_structures.py_head
markers.py_head
requirements.py_head
specifiers.py_head
utils.py_head
version.py_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 10000995e63.00000000 _compat.py_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 10000995e63.00000000 compat.py_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 10000995e63.00000000 file-does-not-exist_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata listomapkeys 10000995e63.00000000
__about__.py_head
__init__.py_head
__pycache___head
_structures.py_head
markers.py_head
requirements.py_head
specifiers.py_head
utils.py_head
version.py_head

Predictably, this did nothing to solve our problem, and ceph-mds is still dying during startup. Any suggestions?

Neale Pickett <ne...@lanl.gov>
A-4: Advanced Research in Cyber Systems
Los Alamos National Laboratory
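P.S. For anyone following along, here is the naming scheme we are assuming when we guess at omap keys. Judging from the listing above (this is an inference from the output, not the authoritative Ceph encoding), a dentry named N at its non-snapshot "head" version is stored under the key N + "_head". Under that assumption, the "compat.py" from the dumped path would live under "compat.py_head" — which is indeed absent from the listing — while "_compat.py_head" is the key for a different dentry, the file named "_compat.py".

```python
# Sketch only: assumes the "<name>_<snap>" omap-key scheme inferred from
# the listomapkeys output above, not the actual Ceph dentry encoding.

def dirfrag_omap_key(name: str, snap: str = "head") -> str:
    """Build the dirfrag omap key for a dentry, assuming <name>_<snap>."""
    return f"{name}_{snap}"

# The rogue inode's dentry, per the dumped path:
print(dirfrag_omap_key("compat.py"))    # compat.py_head
# A different, legitimate file whose name starts with an underscore:
print(dirfrag_omap_key("_compat.py"))   # _compat.py_head
```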
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com