Hi Xiubo,

On 8. 05. 24 09:53, Xiubo Li wrote:
Hi Dejan,

This is a known issue; please see https://tracker.ceph.com/issues/61009.

For the workaround please see https://tracker.ceph.com/issues/61009#note-26.

Thank you for the links. Unfortunately, I'm not sure I understand the workaround: the clients should be mounted without nowsync, but they never get to the point of mounting because the MDS is unavailable while it is doing replay. Rebooting the clients does not seem to help either, since they still show up in the client list (from "ceph tell mds.1 client ls").
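In case I'm misreading it, this is roughly what I understand the workaround to involve (the client id and the mount arguments below are placeholders, and I'm assuming the kernel client, where wsync/nowsync are mount options):

    # sessions that rank 1 still remembers
    ceph tell mds.1 client ls

    # evict a lingering session by its id (placeholder id)
    ceph tell mds.1 client evict id=12345

    # remount a client with synchronous namespace operations
    # (auth/secret options omitted for brevity)
    mount -t ceph <mon_host>:/ /mnt/cephfs -o name=<user>,wsync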

Thanks,
Dejan

Thanks

- Xiubo

On 5/8/24 06:49, Dejan Lesjak wrote:
Hello,

We have a CephFS file system with two active MDS daemons. Currently rank 1 is repeatedly crashing with FAILED ceph_assert(p->first <= start) in the md_log_replay thread. Is there any way to work around this and get back to an accessible file system, or should we start with disaster recovery?
It seems similar to https://tracker.ceph.com/issues/61009
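If it does come to disaster recovery, I assume the first steps would be along the lines of the journal procedure from the docs, roughly as sketched below (our file system name is a placeholder):

    # back up the rank 1 journal before touching anything
    cephfs-journal-tool --rank=<fs_name>:1 journal export backup.rank1.bin

    # check whether the journal itself is damaged
    cephfs-journal-tool --rank=<fs_name>:1 journal inspect

    # recover dentries from the journal, then reset it (destructive)
    cephfs-journal-tool --rank=<fs_name>:1 event recover_dentries summary
    cephfs-journal-tool --rank=<fs_name>:1 journal reset

but we would rather avoid that if there is a safer workaround.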
Crash info:

{
     "assert_condition": "p->first <= start",
     "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h",      "assert_func": "void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]",
     "assert_line": 568,
     "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7fcdaaf8a640 time 2024-05-08T00:26:22.049974+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)\n",
     "assert_thread_name": "md_log_replay",
     "backtrace": [
         "/lib64/libc.so.6(+0x54db0) [0x7fcdb7a54db0]",
         "/lib64/libc.so.6(+0xa154c) [0x7fcdb7aa154c]",
         "raise()",
         "abort()",
         "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7fcdb83610ff]",          "/usr/lib64/ceph/libceph-common.so.2(+0x161263) [0x7fcdb8361263]",
         "/usr/bin/ceph-mds(+0x1f3b0e) [0x55a5904a9b0e]",
         "/usr/bin/ceph-mds(+0x1f3b55) [0x55a5904a9b55]",
         "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4b9d) [0x55a5906e1c8d]",
         "(EUpdate::replay(MDSRank*)+0x5d) [0x55a5906eacbd]",
         "(MDLog::_replay_thread()+0x7a1) [0x55a590694af1]",
         "/usr/bin/ceph-mds(+0x1460f1) [0x55a5903fc0f1]",
         "/lib64/libc.so.6(+0x9f802) [0x7fcdb7a9f802]",
         "/lib64/libc.so.6(+0x3f450) [0x7fcdb7a3f450]"
     ],
     "ceph_version": "18.2.2",
     "crash_id": "2024-05-07T22:26:22.050652Z_8be89ffb-bb87-4832-9339-57f8bd29f766",
     "entity_name": "mds.spod19",
     "os_id": "almalinux",
     "os_name": "AlmaLinux",
     "os_version": "9.3 (Shamrock Pampas Cat)",
     "os_version_id": "9.3",
     "process_name": "ceph-mds",
     "stack_sig": "3d0a2ca9b3c7678bf69efc20fff42b588c63f8be1832e1e0c28c99bafc082c15",
     "timestamp": "2024-05-07T22:26:22.050652Z",
     "utsname_hostname": "spod19.ijs.si",
     "utsname_machine": "x86_64",
     "utsname_release": "5.14.0-362.8.1.el9_3.x86_64",
     "utsname_sysname": "Linux",
     "utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023"
}


Cheers,
Dejan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

