Hi Venky,

thanks for your hint at https://tracker.ceph.com/issues/36349. We have finished the "scan_links" procedure, which produced thousands of lines in the console log.
<---- Example: Start ---->
#/:> cephfs-data-scan scan_links --filesystem cephfs
2025-05-23T03:36:05.228+0200 7f1c46321840 -1 datascan.scan_links: Remove duplicated ino 0x0x20009f7bcee from 0x100215e1933/tmpesktg21e
2025-05-23T03:36:05.228+0200 7f1c46321840 -1 datascan.scan_links: Remove duplicated ino 0x0x2000a0d086a from 0x2000a0d0869/part-00000-of-00001.data-00000-of-00001.tempstate181943161020189849
<---- Example: End ---->

This shows that a lot of metadata is broken; hopefully all of it can be repaired. We are not finished yet, but we are working on it.
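To get a feeling for the scale of the damage, we summarised the console output roughly like this (a sketch only; "scan_links.log" is just the file we captured the console output to, the name is our own):

```shell
# Count how many duplicated inodes scan_links removed
# (assumes the console output was saved to scan_links.log):
grep -c 'Remove duplicated ino' scan_links.log

# List the parent directory inodes with the most duplicates,
# to see whether the damage clusters in a few directories.
# The last field of each line is "<parent-ino>/<name>"; we keep
# only the parent inode before the slash:
grep 'Remove duplicated ino' scan_links.log \
    | awk '{print $NF}' | awk -F/ '{print $1}' \
    | sort | uniq -c | sort -rn | head
```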
Regarding the issue you pointed us to: the bug was first reported six years ago, and work on it stopped because it could not be reproduced ("can't reproduce"). Are there any hints as to what triggered this bug? Is there perhaps a cluster configuration that could trigger this CephFS behaviour (MDS in standby-replay, multi-MDS configuration, ...)?
Regards,
Michael

On 22.05.25 at 07:29, Venky Shankar wrote:
Hi Michael,

On Wed, May 21, 2025 at 10:09 PM Michael Götting <m...@techfak.uni-bielefeld.de> wrote:

Hi all,

we have the following problem with our CephFS setup (Ceph version 19.2.2). Today our two active MDS nodes failed; the nodes that were in standby-replay then took over and failed as well.

The CephFS system is set up as follows:
- 3 monitor nodes
- 4 MDS nodes
  - 2 active
  - 2 standby-replay
- CephFS pools
  - max_mds = 2
  - 1x metadata pool
  - 2x data pools (hdd_pool, ssd_pool)

<< ----------------- Ceph fs status output START ----------------- >>
cephfs - 0 clients
RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0    failed
 1    failed
<< ----------------- Ceph fs status output END ----------------- >>

To restore the service, we followed the documentation at https://docs.ceph.com/en/quincy/cephfs/disaster-recovery-experts/?highlight=mds+repair and carried out the steps up to and including "MDS table wipes". We did not carry out the "MDS MAP RESET" step, as we were not sure whether we would then lose all the data from rank 1. We also carried out the steps under "Avoiding recovery roadblocks": https://docs.ceph.com/en/quincy/cephfs/troubleshooting/#avoiding-recovery-roadblocks

Parameters set on the MDS nodes:
mds advanced mds_abort_on_newly_corrupt_dentry false
mds advanced mds_bal_interval 0
mds basic mds_cache_memory_limit 274877906944
mds advanced mds_cache_trim_threshold 524288
mds advanced mds_go_bad_corrupt_dentry false
mds advanced mds_heartbeat_grace 3600.000000
mds advanced mds_min_caps_working_set 60000
mds advanced mds_oft_prefetch_dirfrags false

* After trying the recovery steps (truncating the journal), the MDS daemons are in a crash -> restart loop.
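Concretely, the steps from that document up to and including the table wipes boil down to roughly the following (rank 0 shown; with max_mds = 2 the journal steps have to be repeated for rank 1, and "backup.bin" is just the example file name from the docs):

```shell
# All MDS daemons must be stopped/failed before running these;
# the tools operate directly on the metadata pool objects.

# 1. Back up the journal before touching anything:
cephfs-journal-tool --rank=cephfs:0 journal export backup.bin

# 2. Recover whatever dentries can be salvaged from the journal:
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary

# 3. Truncate the (possibly corrupt) journal:
cephfs-journal-tool --rank=cephfs:0 journal reset

# 4. "MDS table wipes": reset the session, snap and inode tables:
cephfs-table-tool all reset session
cephfs-table-tool all reset snap
cephfs-table-tool all reset inode
```

These cannot be run against a live cluster in a test environment, so treat the above as a paraphrase of the linked documentation rather than a verified transcript of our exact invocations.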
<< ----------------- Example log file mds-1: START ----------------- >>
-14> 2025-05-21T17:49:27.003+0200 7f36cfc7e640  1 mds.0.42052 active_start
-13> 2025-05-21T17:49:27.003+0200 7f36d2c84640 10 monclient: get_auth_request con 0x559a76e8a400 auth_method 0
-12> 2025-05-21T17:49:27.003+0200 7f36d2483640 10 monclient: get_auth_request con 0x559a7e041400 auth_method 0
-11> 2025-05-21T17:49:27.003+0200 7f36d3485640 10 monclient: get_auth_request con 0x559a76e8b000 auth_method 0
-10> 2025-05-21T17:49:27.003+0200 7f36d3485640 10 monclient: get_auth_request con 0x559ab89cb800 auth_method 0
-9> 2025-05-21T17:49:27.003+0200 7f36d2c84640 10 monclient: get_auth_request con 0x559a7bcc6400 auth_method 0
-8> 2025-05-21T17:49:27.015+0200 7f36cfc7e640  1 mds.0.42052 cluster recovered.
-7> 2025-05-21T17:49:27.015+0200 7f36cfc7e640  4 mds.0.42052 set_osd_epoch_barrier: epoch=492573
-6> 2025-05-21T17:49:27.015+0200 7f36cfc7e640  5 quiesce.mds.0 <quiesce_cluster_update> epoch:42055 me:7764062 leader:7764062 members:7764062
-5> 2025-05-21T17:49:27.015+0200 7f36cfc7e640  5 quiesce.mgr.0 <update_membership> starting the db mgr thread at epoch: 42055
-4> 2025-05-21T17:49:27.015+0200 7f36c5c6a640  5 quiesce.mgr.0 <quiesce_db_thread_main> Entering the main thread
-3> 2025-05-21T17:49:27.015+0200 7f36c5c6a640  5 quiesce.mgr.0 <membership_upkeep> a reset of the db has been requested
-2> 2025-05-21T17:49:27.015+0200 7f36c9471640 -1 mds.0.cache.den(0x1 techfak) newly corrupt dentry to be committed: [dentry #0x1/techfak [c,head] auth (dversion lock) pv=0 v=52947746 ino=0x1000a58d072 state=1073741824 | inodepin=1 0x559a755b2c80]
-1> 2025-05-21T17:49:27.015+0200 7f36c9471640 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.2/rpm/el9/BUILD/ceph-19.2.2/src/mds/MDCache.cc: In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7f36c9471640 time 2025-05-21T17:49:27.020101+0200
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.2/rpm/el9/BUILD/ceph-19.2.2/src/mds/MDCache.cc: 1687: FAILED ceph_assert(follows >= realm->get_newest_seq())

You are running into https://tracker.ceph.com/issues/36349 which got closed since it wasn't reproducible and there wasn't any more debug information to make progress. To recover from this situation, please refer here: https://tracker.ceph.com/issues/36349#note-5

ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x121) [0x7f36d5709cf9]
2: /usr/lib64/ceph/libceph-common.so.2(+0x182eb8) [0x7f36d5709eb8]
3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xac3) [0x559a51ca4583]
4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xbd) [0x559a51ca50cd]
5: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0xe71) [0x559a51cab8f1]
6: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x473) [0x559a51d55a33]
7: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x390) [0x559a51d2e750]
8: (MDSContext::complete(int)+0x5c) [0x559a51e4617c]
9: (MDSIOContextBase::complete(int)+0x34c) [0x559a51e4884c]
10: /usr/bin/ceph-mds(+0x4f5970) [0x559a51eed970]
11: /usr/bin/ceph-mds(+0x160f0d) [0x559a51b58f0d]
12: (Finisher::finisher_thread_entry()+0x17d) [0x7f36d57c885d]
13: /lib64/libc.so.6(+0x8a0ca) [0x7f36d50a30ca]
14: /lib64/libc.so.6(+0x10f150) [0x7f36d5128150]

0> 2025-05-21T17:49:27.019+0200 7f36c9471640 -1 *** Caught signal (Aborted) ** in thread 7f36c9471640 thread_name:

ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)
1: /lib64/libc.so.6(+0x3ebf0) [0x7f36d5057bf0]
2: /lib64/libc.so.6(+0x8be0c) [0x7f36d50a4e0c]
3: raise()
4: abort()
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17b) [0x7f36d5709d53]
6: /usr/lib64/ceph/libceph-common.so.2(+0x182eb8) [0x7f36d5709eb8]
7: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*, snapid_t, CInode**, CDentry::linkage_t*)+0xac3) [0x559a51ca4583]
8: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*, snapid_t)+0xbd) [0x559a51ca50cd]
9: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0xe71) [0x559a51cab8f1]
10: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x473) [0x559a51d55a33]
11: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x390) [0x559a51d2e750]
12: (MDSContext::complete(int)+0x5c) [0x559a51e4617c]
13: (MDSIOContextBase::complete(int)+0x34c) [0x559a51e4884c]
14: /usr/bin/ceph-mds(+0x4f5970) [0x559a51eed970]
15: /usr/bin/ceph-mds(+0x160f0d) [0x559a51b58f0d]
16: (Finisher::finisher_thread_entry()+0x17d) [0x7f36d57c885d]
17: /lib64/libc.so.6(+0x8a0ca) [0x7f36d50a30ca]
18: /lib64/libc.so.6(+0x10f150) [0x7f36d5128150]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
<< ----------------- Example log file mds-1: END ----------------- >>

<< ----------------- Ceph fs dump output START ----------------- >>
e41371
btime 2025-05-21T16:19:27.085643+0200
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 3

Filesystem 'cephfs' (3)
fs_name cephfs
epoch 41370
flags 73 allow_snaps allow_multimds_snaps allow_standby_replay refuse_client_session
created 2024-03-31T23:36:25.302389+0200
modified 2025-05-21T16:19:09.237977+0200
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
max_xattr_size 65536
required_client_features {}
last_failure 0
last_failure_osd_epoch 492429
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2,11=minor log segments,12=quiesce subvolumes}
max_mds 2
in 0,1
up {1=7761302}
failed 0
damaged
stopped
data_pools [5,3]
metadata_pool 4
inline_data disabled
balancer
bal_rank_mask -1
standby_count_wanted 2
qdb_cluster leader: 7761302 members: 7761302
[mds.mds-1{1:7761302} state up:active seq 5 addr [v2:[2001:638:504:2011:9:3:1:1]:6800/2463826788,v1:[2001:638:504:2011:9:3:1:1]:6801/2463826788] compat {c=[1],r=[1],i=[1fff]}]
Standby daemons:
[mds.mds-2{-1:7771506} state up:standby seq 1 addr [v2:[2001:638:504:2011:6:3:2:2]:6800/2809657192,v1:[2001:638:504:2011:6:3:2:2]:6801/2809657192] compat {c=[1],r=[1],i=[1fff]}]
dumped fsmap epoch 41371
<< ----------------- Ceph fs dump output END ----------------- >>

But to be honest, out of all the things we tried, I don't know what to provide exactly.
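One detail in the dump above: the flags line still shows refuse_client_session, which comes from the "Avoiding recovery roadblocks" steps. Once the ranks are stable again, that has to be undone before clients can reconnect; a rough sketch (fs name taken from the dump, and the joinable step only applies if the fs was marked not joinable during recovery):

```shell
# Allow client sessions again after recovery:
ceph fs set cephfs refuse_client_session false

# If the filesystem was made non-joinable during recovery,
# allow MDS daemons to take over ranks again:
ceph fs set cephfs joinable true
```

These act on a live cluster, so they are shown here only as a reminder of the cleanup step, not as verified output.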
We can provide much more, but ... we really need the service back online, so any help would be very much appreciated.

Regards,
Michael
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io