Hi Xiubo,

On 8. 05. 24 09:53, Xiubo Li wrote:
Hi Dejan,

This is a known issue; please see https://tracker.ceph.com/issues/61009.

For the workaround please see https://tracker.ceph.com/issues/61009#note-26.

Thank you for the links. Unfortunately, I'm not sure I understand the workaround: the clients should be mounted without nowsync, but they never get to the point of mounting because the MDS is unavailable while it is doing replay. Rebooting the clients does not seem to help either, since they still show up in the client list (from "ceph tell mds.1 client ls").
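In case I'm misreading it, this is roughly what I understand the workaround to involve (the client id and the mount arguments below are placeholders, and I'm assuming the kernel client, where wsync/nowsync are mount options):

    # sessions that rank 1 still remembers
    ceph tell mds.1 client ls

    # evict a lingering session by its id (placeholder id)
    ceph tell mds.1 client evict id=12345

    # remount a client with synchronous namespace operations
    # (auth/secret options omitted for brevity)
    mount -t ceph <mon_host>:/ /mnt/cephfs -o name=<user>,wsync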

Thanks,
Dejan

Thanks

- Xiubo

On 5/8/24 06:49, Dejan Lesjak wrote:
Hello,

We have a CephFS file system with two active MDS daemons. Currently rank 1 is repeatedly crashing with FAILED ceph_assert(p->first <= start) in the md_log_replay thread. Is there any way to work around this and get back to an accessible file system, or should we start with disaster recovery?
It seems similar to https://tracker.ceph.com/issues/61009
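If it does come to disaster recovery, I assume the first steps would be along the lines of the journal procedure from the docs, roughly as sketched below (our file system name is a placeholder):

    # back up the rank 1 journal before touching anything
    cephfs-journal-tool --rank=<fs_name>:1 journal export backup.rank1.bin

    # check whether the journal itself is damaged
    cephfs-journal-tool --rank=<fs_name>:1 journal inspect

    # recover dentries from the journal, then reset it (destructive)
    cephfs-journal-tool --rank=<fs_name>:1 event recover_dentries summary
    cephfs-journal-tool --rank=<fs_name>:1 journal reset

but we would rather avoid that if there is a safer workaround.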
Crash info:

{
     "assert_condition": "p->first <= start",
     "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h",      "assert_func": "void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]",
     "assert_line": 568,
     "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7fcdaaf8a640 time 2024-05-08T00:26:22.049974+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)\n",
     "assert_thread_name": "md_log_replay",
     "backtrace": [
         "/lib64/libc.so.6(+0x54db0) [0x7fcdb7a54db0]",
         "/lib64/libc.so.6(+0xa154c) [0x7fcdb7aa154c]",
         "raise()",
         "abort()",
         "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7fcdb83610ff]",          "/usr/lib64/ceph/libceph-common.so.2(+0x161263) [0x7fcdb8361263]",
         "/usr/bin/ceph-mds(+0x1f3b0e) [0x55a5904a9b0e]",
         "/usr/bin/ceph-mds(+0x1f3b55) [0x55a5904a9b55]",
         "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4b9d) [0x55a5906e1c8d]",
         "(EUpdate::replay(MDSRank*)+0x5d) [0x55a5906eacbd]",
         "(MDLog::_replay_thread()+0x7a1) [0x55a590694af1]",
         "/usr/bin/ceph-mds(+0x1460f1) [0x55a5903fc0f1]",
         "/lib64/libc.so.6(+0x9f802) [0x7fcdb7a9f802]",
         "/lib64/libc.so.6(+0x3f450) [0x7fcdb7a3f450]"
     ],
     "ceph_version": "18.2.2",
     "crash_id": "2024-05-07T22:26:22.050652Z_8be89ffb-bb87-4832-9339-57f8bd29f766",
     "entity_name": "mds.spod19",
     "os_id": "almalinux",
     "os_name": "AlmaLinux",
     "os_version": "9.3 (Shamrock Pampas Cat)",
     "os_version_id": "9.3",
     "process_name": "ceph-mds",
     "stack_sig": "3d0a2ca9b3c7678bf69efc20fff42b588c63f8be1832e1e0c28c99bafc082c15",
     "timestamp": "2024-05-07T22:26:22.050652Z",
     "utsname_hostname": "spod19.ijs.si",
     "utsname_machine": "x86_64",
     "utsname_release": "5.14.0-362.8.1.el9_3.x86_64",
     "utsname_sysname": "Linux",
     "utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023"
}


Cheers,
Dejan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

