Hi, thanks for sharing the logs. I see lines:

Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.946+0000 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x100013897dd
Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.948+0000 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000139903b
Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.948+0000 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10001399315
This seems to hint that the metadata recovery left inconsistencies. That does not directly translate into the root cause, but it _may_ have led to the MDS sending wrong input (possibly the file layout) when calling probe() in RecoveryQueue::_start. That path uses the "projected inode", which passes the inode number, the file layout, and the max file size on to filer.probe(), which in turn calls Striper::file_to_extents to calculate the extents; their computed sizes should have been at least the object size returned by objecter->stat(), and the failed assertion shows they were not.

If you can set debug_mds to 20, the logs should reveal the object extent values calculated by the Striper code and the file size returned by the OSD.

*Dhairya Parmar*

Software Engineer, CephFS

On Sat, Jan 31, 2026 at 11:40 PM Андрей Муханов <[email protected]> wrote:

> Hi Dhairya,
> The debug logs were included in my original email, but they seem to have
> disappeared from the mailing list. Please find them here.
>
> To my understanding, yes, this appears to be the same issue described in
> the link you provided, although the root cause may be different.
>
> Jan 21 19:52:13 s11 systemd[1]: Started Ceph
> mds.production_cephfs.s11.srybkc for 486d3212-1558-11ec-913a-ac1f6b95d690.
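[Aside, interrupting the quoted logs.] To make the extent calculation described above concrete, here is a rough, illustrative sketch of the striping math and of the invariant the failed assert in Filer::_probed (`known_size[oid] <= shouldbe`) enforces. This is plain Python with invented layout values, a simplified model of the real C++ code, not Ceph itself:

```python
# Illustrative model (NOT Ceph source) of RAID0-style file striping and the
# probe consistency check. Layout values in the tests are made up.

def file_to_extents(offset, length, object_size, stripe_unit, stripe_count):
    """Map a byte range of a file onto per-object extents, mimicking the
    layout math of Striper::file_to_extents (simplified)."""
    su_per_obj = object_size // stripe_unit   # stripe units per object
    extents = {}                              # object no -> [(obj_off, len)]
    pos, end = offset, offset + length
    while pos < end:
        stripeno = pos // stripe_unit                 # global stripe-unit index
        stripepos = stripeno % stripe_count           # which object in the set
        blockno = stripeno // stripe_count            # row within that object column
        objectsetno = blockno // su_per_obj           # which object set
        objectno = objectsetno * stripe_count + stripepos
        obj_off = (blockno % su_per_obj) * stripe_unit + pos % stripe_unit
        take = min(stripe_unit - pos % stripe_unit, end - pos)
        extents.setdefault(objectno, []).append((obj_off, take))
        pos += take
    return extents

def probe_check(extents, osd_sizes):
    """Rough model of the invariant asserted in Filer::_probed: the object
    size the OSD reports must not exceed the maximum the layout allows for
    that object ('shouldbe'). A violation means the layout fed to the probe
    disagrees with the data actually on disk."""
    for objno, parts in extents.items():
        shouldbe = max(off + ln for off, ln in parts)
        assert osd_sizes.get(objno, 0) <= shouldbe, (
            f"object {objno}: OSD reports {osd_sizes[objno]}, "
            f"layout maximum is {shouldbe}")
```

With a 4 MiB object size and stripe_count=1, an OSD-reported object size larger than 4 MiB trips the check, which is why a wrong layout passed into probe() can produce exactly this assert.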
> Jan 21 19:52:17 s11 systemd-journald[1217]: Suppressed 705 messages from > [email protected]_cephfs.s11.srybkc.service > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.798+0000 > 7f6ce964ab00 0 set uid:gid to 167:167 (ceph:ceph) > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.798+0000 > 7f6ce964ab00 0 ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, > pid 6 > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.798+0000 > 7f6ce964ab00 1 main not setting numa affinity > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.798+0000 > 7f6ce964ab00 0 pidfile_write: ignore empty --pid-file > Jan 21 19:52:17 s11 bash[3097096]: starting > mds.production_cephfs.s11.srybkc at > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.930+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111785 from mon.2 > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.930+0000 > 7f6cdbe7c700 5 mds.beacon.production_cephfs.s11.srybkc Sending beacon > up:boot seq 1 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.327+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111786 from mon.2 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.327+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Monitors have assigned me > to become a standby. 
> Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.327+0000 > 7f6cdde80700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:boot -> up:standby > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.356+0000 > 7f6ce0e86700 5 mds.beacon.production_cephfs.s11.srybkc received beacon > reply up:boot seq 1 rtt 0.425989 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.510+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111787 from mon.2 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.510+0000 > 7f6cdde80700 4 mds.0.purge_queue operator(): data pool 20 not found in > OSDMap > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 4 mds.0.purge_queue operator(): data pool 20 not found in > OSDMap > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 4 mds.0.0 apply_blocklist: killed 0, blocklisted sessions (0 > blocklist entries, 0) > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map i am now mds.0.1111787 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map state change up:standby --> > up:replay > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:standby -> up:replay > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 1 mds.0.1111787 replay_start > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 1 mds.0.1111787 waiting for osdmap 5831472 (which blocklists > prior instance) > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 326984, rss 32376, heap > 182580, baseline 182580, 0 / 0 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:18 s11 
bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening inotable > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening sessionmap > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening mds log > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 5 mds.0.log open discovering log bounds > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening purge queue (async) > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 4 mds.0.purge_queue open: opening > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: loading open file table (async) > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.570+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening snap table > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.576+0000 > 7f6cd7673700 4 mds.0.log Waiting for journal 0x200 to recover... > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.598+0000 > 7f6cd7673700 4 mds.0.log Journal 0x200 recovered. 
> Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.598+0000 > 7f6cd7673700 4 mds.0.log Recovered journal 0x200 in format 1 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.599+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 1: loading/discovering base inodes > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.599+0000 > 7f6cd7e74700 0 mds.0.cache creating system inode with ino:0x100 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.599+0000 > 7f6cd7e74700 0 mds.0.cache creating system inode with ino:0x1 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 2: replaying mds log > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 2: waiting for purge queue recovered > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd6671700 1 mds.0.journal EResetJournal > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd6671700 1 mds.0.sessionmap wipe start > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd6671700 1 mds.0.sessionmap wipe result > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd6671700 1 mds.0.sessionmap wipe done > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd8e76700 4 mds.0.purge_queue operator(): open complete > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.1111787 Finished replaying journal > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.1111787 making mds journal writeable > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.1111787 wiping out client sessions > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.sessionmap wipe start > Jan 21 19:52:18 s11 bash[3097096]: debug 
2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.sessionmap wipe result > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.sessionmap wipe done > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 2 mds.0.1111787 i am alone, moving to state reconnect > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 3 mds.0.1111787 request_state up:reconnect > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:replay -> up:reconnect > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 5 mds.beacon.production_cephfs.s11.srybkc Sending beacon > up:reconnect seq 2 > Jan 21 19:52:19 s11 bash[3097096]: debug 2026-01-21T19:52:19.512+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 353748, rss 37372, heap > 207156, baseline 182580, 0 / 3 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.512+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 353748, rss 37372, heap > 207156, baseline 182580, 0 / 3 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111788 from mon.2 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map i am now mds.0.1111787 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map state change up:replay --> > up:reconnect > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 reconnect_start > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 reopen_log > Jan 21 19:52:20 s11 bash[3097096]: debug 
2026-01-21T19:52:20.833+0000 > 7f6cdde80700 4 mds.0.1111787 apply_blocklist: killed 0, blocklisted > sessions (42 blocklist entries, 0) > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 reconnect_done > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 3 mds.0.1111787 request_state up:rejoin > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:reconnect -> up:rejoin > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 5 mds.beacon.production_cephfs.s11.srybkc Sending beacon > up:rejoin seq 3 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.837+0000 > 7f6ce0e86700 5 mds.beacon.production_cephfs.s11.srybkc received beacon > reply up:reconnect seq 2 rtt 2.20094 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.852+0000 > 7f6cdde80700 0 mds.0.server ignoring msg from not-open > sessionclient_reconnect(0 caps 0 realms ) v3 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.852+0000 > 7f6cdde80700 0 mds.0.server ignoring msg from not-open > sessionclient_reconnect(0 caps 0 realms ) v3 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.852+0000 > 7f6ce1687700 0 --1- [v2: > 20.2.0.104:6800/4062624433,v1:20.2.0.104:6801/4062624433] >> v1: > 20.2.0.66:0/1664934024 conn(0x564cc1219c00 0x564cc0148000 :6801 s=OPENED > pgs=4344 cs=1 l=0).fault server, going to standby > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.852+0000 > 7f6ce0e86700 0 --1- [v2: > 20.2.0.104:6800/4062624433,v1:20.2.0.104:6801/4062624433] >> v1: > 20.2.0.65:0/3311648163 conn(0x564cc1200800 0x564cc0148800 :6801 s=OPENED > pgs=380 cs=1 l=0).fault server, going to standby > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.875+0000 > 7f6cdde80700 5 mds.0.server session is closed, dropping > client.192669065:14698917 > 
Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.875+0000 > 7f6cdde80700 0 mds.0.server ignoring msg from not-open > sessionclient_reconnect(0 caps 0 realms ) v3 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.512+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 353748, rss 37372, heap > 207156, baseline 182580, 0 / 3 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111789 from mon.2 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map i am now mds.0.1111787 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map state change up:reconnect --> > up:rejoin > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.0.1111787 rejoin_start > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.0.1111787 rejoin_joint_start > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.940+0000 > 7f6ce0e86700 5 mds.beacon.production_cephfs.s11.srybkc received beacon > reply up:rejoin seq 3 rtt 1.10697 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.946+0000 > 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory > inode 0x100013897dd > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.948+0000 > 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory > inode 0x1000139903b > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.948+0000 > 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory > inode 0x10001399315 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.953+0000 > 7f6cd9677700 1 mds.0.1111787 rejoin_done > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.956+0000 > 
7f6cd9677700 3 mds.0.1111787 request_state up:active > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.956+0000 > 7f6cd9677700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:rejoin -> up:active > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.956+0000 > 7f6cd9677700 5 mds.beacon.production_cephfs.s11.srybkc Sending beacon > up:active seq 4 > Jan 21 19:52:22 s11 bash[3097096]: debug 2026-01-21T19:52:22.513+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 355796, rss 42628, heap > 207156, baseline 182580, 0 / 445 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.130+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111790 from mon.2 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.130+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map i am now mds.0.1111787 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.130+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map state change up:rejoin --> > up:active > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.130+0000 > 7f6cdde80700 1 mds.0.1111787 recovery_done -- successful recovery! 
> Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.132+0000 > 7f6ce1687700 0 --1- [v2: > 20.2.0.104:6800/4062624433,v1:20.2.0.104:6801/4062624433] >> v1: > 20.2.0.65:0/3311648163 conn(0x564cc1275800 0x564cc0149000 :6801 > s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 > accept peer reset, then tried to connect to us, replacing > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.132+0000 > 7f6ce0685700 0 --1- [v2: > 20.2.0.104:6800/4062624433,v1:20.2.0.104:6801/4062624433] >> v1: > 20.2.0.66:0/1664934024 conn(0x564cc139b000 0x564cc0149800 :6801 > s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 > accept peer reset, then tried to connect to us, replacing > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 1 mds.0.1111787 active_start > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 1 mds.0.1111787 cluster recovered. > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 4 mds.0.1111787 set_osd_epoch_barrier: epoch=5831472 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_remote_reset on > v1:20.2.0.65:0/3311648163 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 3 mds.production_cephfs.s11.srybkc ms_handle_remote_reset > closing connection for session client.186227172 v1:20.2.0.65:0/3311648163 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_reset on v1: > 20.2.0.65:0/3311648163 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_remote_reset on > v1:20.2.0.66:0/1664934024 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 3 mds.production_cephfs.s11.srybkc 
ms_handle_remote_reset > closing connection for session client.186208578 v1:20.2.0.66:0/1664934024 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_reset on v1: > 20.2.0.66:0/1664934024 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.150+0000 > 7f6ce0e86700 5 mds.beacon.production_cephfs.s11.srybkc received beacon > reply up:active seq 4 rtt 1.19397 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.150+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_remote_reset on > 20.2.0.106:0/834757530 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.150+0000 > 7f6cdde80700 3 mds.production_cephfs.s11.srybkc ms_handle_remote_reset > closing connection for session client.192669065 20.2.0.106:0/834757530 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.151+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_reset on > 20.2.0.106:0/834757530 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.151+0000 > 7f6cd6e72700 5 mds.0.log _submit_thread 4195668~8352 : EUpdate > check_inode_max_size [metablob 0x1, 9 dirs] > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.151+0000 > 7f6cd6e72700 5 mds.0.log _submit_thread 4204040~1582 : EUpdate > check_inode_max_size [metablob 0x10001399315, 1 dirs] > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.151+0000 > 7f6cd6e72700 5 mds.0.log _submit_thread 4205642~1554 : EUpdate > check_inode_max_size [metablob 0x10001399315, 1 dirs] > Jan 21 19:52:23 s11 bash[3097096]: > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/osdc/Filer.cc: > In function 'bool Filer::_probed(Filer::Probe*, const object_t&, uint64_t, > ceph::real_time, Filer::Probe::unique_lock&)' thread 7f6cd7e74700 time > 2026-01-21T19:52:23.152720+0000 > 
Jan 21 19:52:23 s11 bash[3097096]: > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/osdc/Filer.cc: > 224: FAILED ceph_assert(probe->known_size[p->oid] <= shouldbe) > Jan 21 19:52:23 s11 bash[3097096]: ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) > Jan 21 19:52:23 s11 bash[3097096]: 1: (ceph::__ceph_assert_fail(char > const*, char const*, int, char const*)+0x135) [0x7f6ce8482e15] > Jan 21 19:52:23 s11 bash[3097096]: 2: > /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f6ce8482fdb] > Jan 21 19:52:23 s11 bash[3097096]: 3: (Filer::_probed(Filer::Probe*, > object_t const&, unsigned long, std::chrono::time_point<ceph::real_clock, > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, > std::unique_lock<std::mutex>&)+0x1781) [0x564cbe4c0f91] > Jan 21 19:52:23 s11 bash[3097096]: 4: (Filer::C_Probe::finish(int)+0x91) > [0x564cbe4c2c71] > Jan 21 19:52:23 s11 bash[3097096]: 5: (Context::complete(int)+0xd) > [0x564cbe09c5fd] > Jan 21 19:52:23 s11 bash[3097096]: 6: > (Finisher::finisher_thread_entry()+0x18d) [0x7f6ce8526abd] > Jan 21 19:52:23 s11 bash[3097096]: 7: /lib64/libpthread.so.0(+0x81ca) > [0x7f6ce72281ca] > Jan 21 19:52:23 s11 bash[3097096]: 8: clone() > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.152+0000 > 7f6cd7e74700 -1 > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/osdc/Filer.cc: > In function 'bool Filer::_probed(Filer::Probe*, const object_t&, uint64_t, > ceph::real_time, Filer::Probe::unique_lock&)' thread 7f6cd7e74700 time > 2026-01-21T19:52:23.152720+0000 > Jan 21 19:52:23 s11 bash[3097096]: > 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/osdc/Filer.cc: > 224: FAILED ceph_assert(probe->known_size[p->oid] <= shouldbe) > Jan 21 19:52:23 s11 bash[3097096]: ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) > Jan 21 19:52:23 s11 bash[3097096]: 1: (ceph::__ceph_assert_fail(char > const*, char const*, int, char const*)+0x135) [0x7f6ce8482e15] > Jan 21 19:52:23 s11 bash[3097096]: 2: > /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f6ce8482fdb] > Jan 21 19:52:23 s11 bash[3097096]: 3: (Filer::_probed(Filer::Probe*, > object_t const&, unsigned long, std::chrono::time_point<ceph::real_clock, > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, > std::unique_lock<std::mutex>&)+0x1781) [0x564cbe4c0f91] > Jan 21 19:52:23 s11 bash[3097096]: 4: (Filer::C_Probe::finish(int)+0x91) > [0x564cbe4c2c71] > Jan 21 19:52:23 s11 bash[3097096]: 5: (Context::complete(int)+0xd) > [0x564cbe09c5fd] > Jan 21 19:52:23 s11 bash[3097096]: 6: > (Finisher::finisher_thread_entry()+0x18d) [0x7f6ce8526abd] > Jan 21 19:52:23 s11 bash[3097096]: 7: /lib64/libpthread.so.0(+0x81ca) > [0x7f6ce72281ca] > Jan 21 19:52:23 s11 bash[3097096]: 8: clone() > Jan 21 19:52:23 s11 bash[3097096]: *** Caught signal (Aborted) ** > Jan 21 19:52:23 s11 bash[3097096]: in thread 7f6cd7e74700 > thread_name:MR_Finisher > Jan 21 19:52:23 s11 bash[3097096]: ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) > Jan 21 19:52:23 s11 bash[3097096]: 1: /lib64/libpthread.so.0(+0x12d20) > [0x7f6ce7232d20] > Jan 21 19:52:23 s11 bash[3097096]: 2: gsignal() > Jan 21 19:52:23 s11 bash[3097096]: 3: abort() > Jan 21 19:52:23 s11 bash[3097096]: 4: (ceph::__ceph_assert_fail(char > const*, char const*, int, char const*)+0x18f) [0x7f6ce8482e6f] > Jan 21 19:52:23 s11 bash[3097096]: 5: > 
/usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f6ce8482fdb] > Jan 21 19:52:23 s11 bash[3097096]: 6: (Filer::_probed(Filer::Probe*, > object_t const&, unsigned long, std::chrono::time_point<ceph::real_clock, > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, > std::unique_lock<std::mutex>&)+0x1781) [0x564cbe4c0f91] > Jan 21 19:52:23 s11 bash[3097096]: 7: (Filer::C_Probe::finish(int)+0x91) > [0x564cbe4c2c71] > Jan 21 19:52:23 s11 bash[3097096]: 8: (Context::complete(int)+0xd) > [0x564cbe09c5fd] > Jan 21 19:52:23 s11 bash[3097096]: 9: > (Finisher::finisher_thread_entry()+0x18d) [0x7f6ce8526abd] > Jan 21 19:52:23 s11 bash[3097096]: 10: /lib64/libpthread.so.0(+0x81ca) > [0x7f6ce72281ca] > Jan 21 19:52:23 s11 bash[3097096]: 11: clone() > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.154+0000 > 7f6cd7e74700 -1 *** Caught signal (Aborted) ** > Jan 21 19:52:23 s11 bash[3097096]: in thread 7f6cd7e74700 > thread_name:MR_Finisher > Jan 21 19:52:23 s11 bash[3097096]: ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) > Jan 21 19:52:23 s11 bash[3097096]: 1: /lib64/libpthread.so.0(+0x12d20) > [0x7f6ce7232d20] > Jan 21 19:52:23 s11 bash[3097096]: 2: gsignal() > Jan 21 19:52:23 s11 bash[3097096]: 3: abort() > Jan 21 19:52:23 s11 bash[3097096]: 4: (ceph::__ceph_assert_fail(char > const*, char const*, int, char const*)+0x18f) [0x7f6ce8482e6f] > Jan 21 19:52:23 s11 bash[3097096]: 5: > /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f6ce8482fdb] > Jan 21 19:52:23 s11 bash[3097096]: 6: (Filer::_probed(Filer::Probe*, > object_t const&, unsigned long, std::chrono::time_point<ceph::real_clock, > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, > std::unique_lock<std::mutex>&)+0x1781) [0x564cbe4c0f91] > Jan 21 19:52:23 s11 bash[3097096]: 7: (Filer::C_Probe::finish(int)+0x91) > [0x564cbe4c2c71] > Jan 21 19:52:23 s11 bash[3097096]: 8: (Context::complete(int)+0xd) > [0x564cbe09c5fd] > Jan 21 19:52:23 
s11 bash[3097096]: 9:
> (Finisher::finisher_thread_entry()+0x18d) [0x7f6ce8526abd]
> Jan 21 19:52:23 s11 bash[3097096]: 10: /lib64/libpthread.so.0(+0x81ca)
> [0x7f6ce72281ca]
> Jan 21 19:52:23 s11 bash[3097096]: 11: clone()
> Jan 21 19:52:23 s11 bash[3097096]: NOTE: a copy of the executable, or
> `objdump -rdS <executable>` is needed to interpret this.
>
> Kind regards,
> Andrei
>
> On Thu, 29 Jan 2026 at 16:02, Dhairya Parmar <[email protected]> wrote:
>
>> Hi, Can you share the crash backtrace and mds debug logs? Also is it
>> relevant to this issue https://tracker.ceph.com/issues/22550?
>>
>> *Dhairya Parmar*
>>
>> Software Engineer, CephFS
>>
>>
>> On Thu, Jan 29, 2026 at 1:05 PM Андрей Муханов via ceph-users <
>> [email protected]> wrote:
>>
>>> _______________________________________________
>>> ceph-users mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
>>>
