Hi, thanks for sharing the logs. I see lines:

Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.946+0000 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x100013897dd
Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.948+0000 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000139903b
Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.948+0000 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10001399315
This seems to hint that the metadata recovery left inconsistencies. That does not directly translate into the root cause, but it _may_ have led to the MDS sending wrong input (possibly the file layout) when calling probe() in RecoveryQueue::_start. That path uses the "projected inode", which passes the inode number, the file layout, and the max file size on to filer.probe(), which in turn calls Striper::file_to_extents to calculate the extents; their computed sizes should have been at least the object size returned by objecter->stat(), and the failed assertion shows they were not.

If you can set debug_mds to 20, the logs should reveal the object extent values calculated by the Striper code and the file size returned by the OSD.

*Dhairya Parmar*

Software Engineer, CephFS

On Sat, Jan 31, 2026 at 11:40 PM Андрей Муханов <[email protected]> wrote:

> Hi Dhairya,
> The debug logs were included in my original email, but they seem to have
> disappeared from the mailing list. Please find them here.
>
> To my understanding, yes, this appears to be the same issue described in
> the link you provided, although the root cause may be different.
>
> Jan 21 19:52:13 s11 systemd[1]: Started Ceph
> mds.production_cephfs.s11.srybkc for 486d3212-1558-11ec-913a-ac1f6b95d690.
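[Aside, interrupting the quoted logs.] To make the extent calculation described above concrete, here is a rough, illustrative sketch of the striping math and of the invariant the failed assert in Filer::_probed (`known_size[oid] <= shouldbe`) enforces. This is plain Python with invented layout values, a simplified model of the real C++ code, not Ceph itself:

```python
# Illustrative model (NOT Ceph source) of RAID0-style file striping and the
# probe consistency check. Layout values in the tests are made up.

def file_to_extents(offset, length, object_size, stripe_unit, stripe_count):
    """Map a byte range of a file onto per-object extents, mimicking the
    layout math of Striper::file_to_extents (simplified)."""
    su_per_obj = object_size // stripe_unit   # stripe units per object
    extents = {}                              # object no -> [(obj_off, len)]
    pos, end = offset, offset + length
    while pos < end:
        stripeno = pos // stripe_unit                 # global stripe-unit index
        stripepos = stripeno % stripe_count           # which object in the set
        blockno = stripeno // stripe_count            # row within that object column
        objectsetno = blockno // su_per_obj           # which object set
        objectno = objectsetno * stripe_count + stripepos
        obj_off = (blockno % su_per_obj) * stripe_unit + pos % stripe_unit
        take = min(stripe_unit - pos % stripe_unit, end - pos)
        extents.setdefault(objectno, []).append((obj_off, take))
        pos += take
    return extents

def probe_check(extents, osd_sizes):
    """Rough model of the invariant asserted in Filer::_probed: the object
    size the OSD reports must not exceed the maximum the layout allows for
    that object ('shouldbe'). A violation means the layout fed to the probe
    disagrees with the data actually on disk."""
    for objno, parts in extents.items():
        shouldbe = max(off + ln for off, ln in parts)
        assert osd_sizes.get(objno, 0) <= shouldbe, (
            f"object {objno}: OSD reports {osd_sizes[objno]}, "
            f"layout maximum is {shouldbe}")
```

With a 4 MiB object size and stripe_count=1, an OSD-reported object size larger than 4 MiB trips the check, which is why a wrong layout passed into probe() can produce exactly this assert.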
> Jan 21 19:52:17 s11 systemd-journald[1217]: Suppressed 705 messages from > [email protected]_cephfs.s11.srybkc.service > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.798+0000 > 7f6ce964ab00 0 set uid:gid to 167:167 (ceph:ceph) > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.798+0000 > 7f6ce964ab00 0 ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, > pid 6 > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.798+0000 > 7f6ce964ab00 1 main not setting numa affinity > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.798+0000 > 7f6ce964ab00 0 pidfile_write: ignore empty --pid-file > Jan 21 19:52:17 s11 bash[3097096]: starting > mds.production_cephfs.s11.srybkc at > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.930+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111785 from mon.2 > Jan 21 19:52:17 s11 bash[3097096]: debug 2026-01-21T19:52:17.930+0000 > 7f6cdbe7c700 5 mds.beacon.production_cephfs.s11.srybkc Sending beacon > up:boot seq 1 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.327+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111786 from mon.2 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.327+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Monitors have assigned me > to become a standby. 
> Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.327+0000 > 7f6cdde80700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:boot -> up:standby > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.356+0000 > 7f6ce0e86700 5 mds.beacon.production_cephfs.s11.srybkc received beacon > reply up:boot seq 1 rtt 0.425989 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.510+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111787 from mon.2 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.510+0000 > 7f6cdde80700 4 mds.0.purge_queue operator(): data pool 20 not found in > OSDMap > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 4 mds.0.purge_queue operator(): data pool 20 not found in > OSDMap > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 4 mds.0.0 apply_blocklist: killed 0, blocklisted sessions (0 > blocklist entries, 0) > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map i am now mds.0.1111787 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map state change up:standby --> > up:replay > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:standby -> up:replay > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 1 mds.0.1111787 replay_start > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cdde80700 1 mds.0.1111787 waiting for osdmap 5831472 (which blocklists > prior instance) > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.511+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 326984, rss 32376, heap > 182580, baseline 182580, 0 / 0 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:18 s11 
bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening inotable > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening sessionmap > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening mds log > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 5 mds.0.log open discovering log bounds > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening purge queue (async) > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 4 mds.0.purge_queue open: opening > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.569+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: loading open file table (async) > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.570+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 0: opening snap table > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.576+0000 > 7f6cd7673700 4 mds.0.log Waiting for journal 0x200 to recover... > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.598+0000 > 7f6cd7673700 4 mds.0.log Journal 0x200 recovered. 
> Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.598+0000 > 7f6cd7673700 4 mds.0.log Recovered journal 0x200 in format 1 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.599+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 1: loading/discovering base inodes > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.599+0000 > 7f6cd7e74700 0 mds.0.cache creating system inode with ino:0x100 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.599+0000 > 7f6cd7e74700 0 mds.0.cache creating system inode with ino:0x1 > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 2: replaying mds log > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd7e74700 2 mds.0.1111787 Booting: 2: waiting for purge queue recovered > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd6671700 1 mds.0.journal EResetJournal > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd6671700 1 mds.0.sessionmap wipe start > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd6671700 1 mds.0.sessionmap wipe result > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.607+0000 > 7f6cd6671700 1 mds.0.sessionmap wipe done > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd8e76700 4 mds.0.purge_queue operator(): open complete > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.1111787 Finished replaying journal > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.1111787 making mds journal writeable > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.1111787 wiping out client sessions > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.sessionmap wipe start > Jan 21 19:52:18 s11 bash[3097096]: debug 
2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.sessionmap wipe result > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 1 mds.0.sessionmap wipe done > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 2 mds.0.1111787 i am alone, moving to state reconnect > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 3 mds.0.1111787 request_state up:reconnect > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:replay -> up:reconnect > Jan 21 19:52:18 s11 bash[3097096]: debug 2026-01-21T19:52:18.636+0000 > 7f6cd7e74700 5 mds.beacon.production_cephfs.s11.srybkc Sending beacon > up:reconnect seq 2 > Jan 21 19:52:19 s11 bash[3097096]: debug 2026-01-21T19:52:19.512+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 353748, rss 37372, heap > 207156, baseline 182580, 0 / 3 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.512+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 353748, rss 37372, heap > 207156, baseline 182580, 0 / 3 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111788 from mon.2 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map i am now mds.0.1111787 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map state change up:replay --> > up:reconnect > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 reconnect_start > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 reopen_log > Jan 21 19:52:20 s11 bash[3097096]: debug 
2026-01-21T19:52:20.833+0000 > 7f6cdde80700 4 mds.0.1111787 apply_blocklist: killed 0, blocklisted > sessions (42 blocklist entries, 0) > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 1 mds.0.1111787 reconnect_done > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 3 mds.0.1111787 request_state up:rejoin > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:reconnect -> up:rejoin > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.833+0000 > 7f6cdde80700 5 mds.beacon.production_cephfs.s11.srybkc Sending beacon > up:rejoin seq 3 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.837+0000 > 7f6ce0e86700 5 mds.beacon.production_cephfs.s11.srybkc received beacon > reply up:reconnect seq 2 rtt 2.20094 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.852+0000 > 7f6cdde80700 0 mds.0.server ignoring msg from not-open > sessionclient_reconnect(0 caps 0 realms ) v3 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.852+0000 > 7f6cdde80700 0 mds.0.server ignoring msg from not-open > sessionclient_reconnect(0 caps 0 realms ) v3 > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.852+0000 > 7f6ce1687700 0 --1- [v2: > 20.2.0.104:6800/4062624433,v1:20.2.0.104:6801/4062624433] >> v1: > 20.2.0.66:0/1664934024 conn(0x564cc1219c00 0x564cc0148000 :6801 s=OPENED > pgs=4344 cs=1 l=0).fault server, going to standby > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.852+0000 > 7f6ce0e86700 0 --1- [v2: > 20.2.0.104:6800/4062624433,v1:20.2.0.104:6801/4062624433] >> v1: > 20.2.0.65:0/3311648163 conn(0x564cc1200800 0x564cc0148800 :6801 s=OPENED > pgs=380 cs=1 l=0).fault server, going to standby > Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.875+0000 > 7f6cdde80700 5 mds.0.server session is closed, dropping > client.192669065:14698917 > 
Jan 21 19:52:20 s11 bash[3097096]: debug 2026-01-21T19:52:20.875+0000 > 7f6cdde80700 0 mds.0.server ignoring msg from not-open > sessionclient_reconnect(0 caps 0 realms ) v3 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.512+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 353748, rss 37372, heap > 207156, baseline 182580, 0 / 3 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111789 from mon.2 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map i am now mds.0.1111787 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map state change up:reconnect --> > up:rejoin > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.0.1111787 rejoin_start > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.936+0000 > 7f6cdde80700 1 mds.0.1111787 rejoin_joint_start > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.940+0000 > 7f6ce0e86700 5 mds.beacon.production_cephfs.s11.srybkc received beacon > reply up:rejoin seq 3 rtt 1.10697 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.946+0000 > 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory > inode 0x100013897dd > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.948+0000 > 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory > inode 0x1000139903b > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.948+0000 > 7f6cd7e74700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory > inode 0x10001399315 > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.953+0000 > 7f6cd9677700 1 mds.0.1111787 rejoin_done > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.956+0000 > 
7f6cd9677700 3 mds.0.1111787 request_state up:active > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.956+0000 > 7f6cd9677700 5 mds.beacon.production_cephfs.s11.srybkc set_want_state: > up:rejoin -> up:active > Jan 21 19:52:21 s11 bash[3097096]: debug 2026-01-21T19:52:21.956+0000 > 7f6cd9677700 5 mds.beacon.production_cephfs.s11.srybkc Sending beacon > up:active seq 4 > Jan 21 19:52:22 s11 bash[3097096]: debug 2026-01-21T19:52:22.513+0000 > 7f6cd9e78700 2 mds.0.cache Memory usage: total 355796, rss 42628, heap > 207156, baseline 182580, 0 / 445 inodes have caps, 0 caps, 0 caps per inode > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.130+0000 > 7f6cdde80700 1 mds.production_cephfs.s11.srybkc Updating MDS map to > version 1111790 from mon.2 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.130+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map i am now mds.0.1111787 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.130+0000 > 7f6cdde80700 1 mds.0.1111787 handle_mds_map state change up:rejoin --> > up:active > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.130+0000 > 7f6cdde80700 1 mds.0.1111787 recovery_done -- successful recovery! 
> Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.132+0000 > 7f6ce1687700 0 --1- [v2: > 20.2.0.104:6800/4062624433,v1:20.2.0.104:6801/4062624433] >> v1: > 20.2.0.65:0/3311648163 conn(0x564cc1275800 0x564cc0149000 :6801 > s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 > accept peer reset, then tried to connect to us, replacing > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.132+0000 > 7f6ce0685700 0 --1- [v2: > 20.2.0.104:6800/4062624433,v1:20.2.0.104:6801/4062624433] >> v1: > 20.2.0.66:0/1664934024 conn(0x564cc139b000 0x564cc0149800 :6801 > s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 > accept peer reset, then tried to connect to us, replacing > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 1 mds.0.1111787 active_start > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 1 mds.0.1111787 cluster recovered. > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 4 mds.0.1111787 set_osd_epoch_barrier: epoch=5831472 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_remote_reset on > v1:20.2.0.65:0/3311648163 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 3 mds.production_cephfs.s11.srybkc ms_handle_remote_reset > closing connection for session client.186227172 v1:20.2.0.65:0/3311648163 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_reset on v1: > 20.2.0.65:0/3311648163 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_remote_reset on > v1:20.2.0.66:0/1664934024 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 3 mds.production_cephfs.s11.srybkc 
ms_handle_remote_reset > closing connection for session client.186208578 v1:20.2.0.66:0/1664934024 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.133+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_reset on v1: > 20.2.0.66:0/1664934024 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.150+0000 > 7f6ce0e86700 5 mds.beacon.production_cephfs.s11.srybkc received beacon > reply up:active seq 4 rtt 1.19397 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.150+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_remote_reset on > 20.2.0.106:0/834757530 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.150+0000 > 7f6cdde80700 3 mds.production_cephfs.s11.srybkc ms_handle_remote_reset > closing connection for session client.192669065 20.2.0.106:0/834757530 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.151+0000 > 7f6cdde80700 5 mds.production_cephfs.s11.srybkc ms_handle_reset on > 20.2.0.106:0/834757530 > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.151+0000 > 7f6cd6e72700 5 mds.0.log _submit_thread 4195668~8352 : EUpdate > check_inode_max_size [metablob 0x1, 9 dirs] > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.151+0000 > 7f6cd6e72700 5 mds.0.log _submit_thread 4204040~1582 : EUpdate > check_inode_max_size [metablob 0x10001399315, 1 dirs] > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.151+0000 > 7f6cd6e72700 5 mds.0.log _submit_thread 4205642~1554 : EUpdate > check_inode_max_size [metablob 0x10001399315, 1 dirs] > Jan 21 19:52:23 s11 bash[3097096]: > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/osdc/Filer.cc: > In function 'bool Filer::_probed(Filer::Probe*, const object_t&, uint64_t, > ceph::real_time, Filer::Probe::unique_lock&)' thread 7f6cd7e74700 time > 2026-01-21T19:52:23.152720+0000 > 
Jan 21 19:52:23 s11 bash[3097096]: > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/osdc/Filer.cc: > 224: FAILED ceph_assert(probe->known_size[p->oid] <= shouldbe) > Jan 21 19:52:23 s11 bash[3097096]: ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) > Jan 21 19:52:23 s11 bash[3097096]: 1: (ceph::__ceph_assert_fail(char > const*, char const*, int, char const*)+0x135) [0x7f6ce8482e15] > Jan 21 19:52:23 s11 bash[3097096]: 2: > /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f6ce8482fdb] > Jan 21 19:52:23 s11 bash[3097096]: 3: (Filer::_probed(Filer::Probe*, > object_t const&, unsigned long, std::chrono::time_point<ceph::real_clock, > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, > std::unique_lock<std::mutex>&)+0x1781) [0x564cbe4c0f91] > Jan 21 19:52:23 s11 bash[3097096]: 4: (Filer::C_Probe::finish(int)+0x91) > [0x564cbe4c2c71] > Jan 21 19:52:23 s11 bash[3097096]: 5: (Context::complete(int)+0xd) > [0x564cbe09c5fd] > Jan 21 19:52:23 s11 bash[3097096]: 6: > (Finisher::finisher_thread_entry()+0x18d) [0x7f6ce8526abd] > Jan 21 19:52:23 s11 bash[3097096]: 7: /lib64/libpthread.so.0(+0x81ca) > [0x7f6ce72281ca] > Jan 21 19:52:23 s11 bash[3097096]: 8: clone() > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.152+0000 > 7f6cd7e74700 -1 > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/osdc/Filer.cc: > In function 'bool Filer::_probed(Filer::Probe*, const object_t&, uint64_t, > ceph::real_time, Filer::Probe::unique_lock&)' thread 7f6cd7e74700 time > 2026-01-21T19:52:23.152720+0000 > Jan 21 19:52:23 s11 bash[3097096]: > 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/osdc/Filer.cc: > 224: FAILED ceph_assert(probe->known_size[p->oid] <= shouldbe) > Jan 21 19:52:23 s11 bash[3097096]: ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) > Jan 21 19:52:23 s11 bash[3097096]: 1: (ceph::__ceph_assert_fail(char > const*, char const*, int, char const*)+0x135) [0x7f6ce8482e15] > Jan 21 19:52:23 s11 bash[3097096]: 2: > /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f6ce8482fdb] > Jan 21 19:52:23 s11 bash[3097096]: 3: (Filer::_probed(Filer::Probe*, > object_t const&, unsigned long, std::chrono::time_point<ceph::real_clock, > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, > std::unique_lock<std::mutex>&)+0x1781) [0x564cbe4c0f91] > Jan 21 19:52:23 s11 bash[3097096]: 4: (Filer::C_Probe::finish(int)+0x91) > [0x564cbe4c2c71] > Jan 21 19:52:23 s11 bash[3097096]: 5: (Context::complete(int)+0xd) > [0x564cbe09c5fd] > Jan 21 19:52:23 s11 bash[3097096]: 6: > (Finisher::finisher_thread_entry()+0x18d) [0x7f6ce8526abd] > Jan 21 19:52:23 s11 bash[3097096]: 7: /lib64/libpthread.so.0(+0x81ca) > [0x7f6ce72281ca] > Jan 21 19:52:23 s11 bash[3097096]: 8: clone() > Jan 21 19:52:23 s11 bash[3097096]: *** Caught signal (Aborted) ** > Jan 21 19:52:23 s11 bash[3097096]: in thread 7f6cd7e74700 > thread_name:MR_Finisher > Jan 21 19:52:23 s11 bash[3097096]: ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) > Jan 21 19:52:23 s11 bash[3097096]: 1: /lib64/libpthread.so.0(+0x12d20) > [0x7f6ce7232d20] > Jan 21 19:52:23 s11 bash[3097096]: 2: gsignal() > Jan 21 19:52:23 s11 bash[3097096]: 3: abort() > Jan 21 19:52:23 s11 bash[3097096]: 4: (ceph::__ceph_assert_fail(char > const*, char const*, int, char const*)+0x18f) [0x7f6ce8482e6f] > Jan 21 19:52:23 s11 bash[3097096]: 5: > 
/usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f6ce8482fdb] > Jan 21 19:52:23 s11 bash[3097096]: 6: (Filer::_probed(Filer::Probe*, > object_t const&, unsigned long, std::chrono::time_point<ceph::real_clock, > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, > std::unique_lock<std::mutex>&)+0x1781) [0x564cbe4c0f91] > Jan 21 19:52:23 s11 bash[3097096]: 7: (Filer::C_Probe::finish(int)+0x91) > [0x564cbe4c2c71] > Jan 21 19:52:23 s11 bash[3097096]: 8: (Context::complete(int)+0xd) > [0x564cbe09c5fd] > Jan 21 19:52:23 s11 bash[3097096]: 9: > (Finisher::finisher_thread_entry()+0x18d) [0x7f6ce8526abd] > Jan 21 19:52:23 s11 bash[3097096]: 10: /lib64/libpthread.so.0(+0x81ca) > [0x7f6ce72281ca] > Jan 21 19:52:23 s11 bash[3097096]: 11: clone() > Jan 21 19:52:23 s11 bash[3097096]: debug 2026-01-21T19:52:23.154+0000 > 7f6cd7e74700 -1 *** Caught signal (Aborted) ** > Jan 21 19:52:23 s11 bash[3097096]: in thread 7f6cd7e74700 > thread_name:MR_Finisher > Jan 21 19:52:23 s11 bash[3097096]: ceph version 18.2.2 > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) > Jan 21 19:52:23 s11 bash[3097096]: 1: /lib64/libpthread.so.0(+0x12d20) > [0x7f6ce7232d20] > Jan 21 19:52:23 s11 bash[3097096]: 2: gsignal() > Jan 21 19:52:23 s11 bash[3097096]: 3: abort() > Jan 21 19:52:23 s11 bash[3097096]: 4: (ceph::__ceph_assert_fail(char > const*, char const*, int, char const*)+0x18f) [0x7f6ce8482e6f] > Jan 21 19:52:23 s11 bash[3097096]: 5: > /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f6ce8482fdb] > Jan 21 19:52:23 s11 bash[3097096]: 6: (Filer::_probed(Filer::Probe*, > object_t const&, unsigned long, std::chrono::time_point<ceph::real_clock, > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, > std::unique_lock<std::mutex>&)+0x1781) [0x564cbe4c0f91] > Jan 21 19:52:23 s11 bash[3097096]: 7: (Filer::C_Probe::finish(int)+0x91) > [0x564cbe4c2c71] > Jan 21 19:52:23 s11 bash[3097096]: 8: (Context::complete(int)+0xd) > [0x564cbe09c5fd] > Jan 21 19:52:23 
s11 bash[3097096]: 9:
> (Finisher::finisher_thread_entry()+0x18d) [0x7f6ce8526abd]
> Jan 21 19:52:23 s11 bash[3097096]: 10: /lib64/libpthread.so.0(+0x81ca)
> [0x7f6ce72281ca]
> Jan 21 19:52:23 s11 bash[3097096]: 11: clone()
> Jan 21 19:52:23 s11 bash[3097096]: NOTE: a copy of the executable, or
> `objdump -rdS <executable>` is needed to interpret this.
>
> Kind regards,
> Andrei
>
> On Thu, 29 Jan 2026 at 16:02, Dhairya Parmar <[email protected]> wrote:
>
>> Hi, Can you share the crash backtrace and mds debug logs? Also is it
>> relevant to this issue https://tracker.ceph.com/issues/22550?
>>
>> *Dhairya Parmar*
>>
>> Software Engineer, CephFS
>>
>>
>> On Thu, Jan 29, 2026 at 1:05 PM Андрей Муханов via ceph-users <
>> [email protected]> wrote:
>>
>>> _______________________________________________
>>> ceph-users mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
>>>
