Hi Kasper, I don't know whether it will chew up a disproportionate amount of memory right now, but it will do so next time. So I recommend connecting a large-enough SSD now and adding swap as soon as you see the excessive memory consumption.
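For example, something along these lines (just a rough sketch - /dev/sdb is only a placeholder, substitute the device name your SSD actually shows up as, and double-check that it is the right disk before wiping it):

    mkswap /dev/sdb     # format the whole SSD as swap
    swapon /dev/sdb     # activate it immediately
    swapon --show       # verify that the new swap space is listed
    free -h             # check the total swap capacity

Add a corresponding entry to /etc/fstab if you want the swap to survive a reboot.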
Regarding the ticket: yes, such tickets exist. I found https://tracker.ceph.com/issues/71167, https://tracker.ceph.com/issues/71136, and https://tracker.ceph.com/issues/52715, but I am sure more such tickets exist, and there is no single root cause.

On Wed, Jun 18, 2025 at 3:41 PM Kasper Rasmussen <kasper_steenga...@hotmail.com> wrote:

> Hi Alexander
>
> Thanks man
>
> Forgot to mention: the Ceph version is 18.2.7.
>
> Is this described anywhere - bug tracker / docs?
>
> Also, when you write: "The same huge-swap recommendation applies to the recovery operations."
>
> Should I - if I fail over the MDS in its current state - expect that it will chew through a huge amount of RAM, requiring me to add 1 TB of swap?
>
> BR. Kasper
>
> ------------------------------
> From: Alexander Patrakov <patra...@gmail.com>
> Sent: Wednesday, June 18, 2025 09:11
> To: Kasper Rasmussen <kasper_steenga...@hotmail.com>
> Cc: ceph-users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] CephFS scrub resulting in MDS_CACHE_OVERSIZED
>
> Hello Kasper,
>
> This is known. Next time, please add at least 1 TB of swap before the scrub, and ignore the warning while the MDS is chewing through all the directories and files.
>
> The same huge-swap recommendation applies to the recovery operations.
>
> On Wed, Jun 18, 2025 at 3:01 PM Kasper Rasmussen <kasper_steenga...@hotmail.com> wrote:
>
> After starting a recursive scrub on a CephFS with a lot of files, the MDS cache went oversized.
>
> Scrub command: ceph... scrub start / recursive,repair,force
>
> I kept an eye on the MDS memory usage - since I was warned that it might go crazy - and after 2-3 hours I started getting the warning
>
> [WRN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
>     mds.generic-mds.<host>.asddje(mds.0): MDS cache is too large (63GB/36GB); 1250394 inodes in use by clients, 28888 stray files
>
> I then paused the scrub, resulting in scrub status
>
> {
>     "status": "PAUSED (22837086 inodes in the stack)",
>     "scrubs": {
>         "27f0e32a-bc8c-443d-b1f0-534474798ddf": {
>             "path": "/",
>             "tag": "27f0e32a-bc8c-443d-b1f0-534474798ddf",
>             "options": "recursive,repair,force"
>         }
>     }
> }
>
> and expected the cache size to go down again - but it didn't.
> After 12+ hours with no change, I opted to abort the scrub - again expecting that the inodes in the stack would be offloaded from memory.
>
> The status after the abort command:
>
> {
>     "status": "PAUSED (0 inodes in the stack)",
>     "scrubs": {}
> }
>
> But still no change to the cache size.
>
> Since the status after the abort command still said "PAUSED", I resumed the scrub, resulting in status:
>
> {
>     "status": "no active scrubs running",
>     "scrubs": {}
> }
>
> Still no change to the cache size.
>
> The log from the MDS at the standard log level was:
>
> debug 2025-06-03T06:48:24.122+0000 7f319065d640 1 mds.generic-mds.<host>.asddje asok_command: scrub start {path=/,prefix=scrub start,scrubops=[recursive,repair,force]} (starting...)
> debug 2025-06-03T06:48:24.122+0000 7f318864d640 0 log_channel(cluster) log [INF] : scrub queued for path: /
> debug 2025-06-03T06:48:24.122+0000 7f318864d640 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [/]
> debug 2025-06-03T06:48:24.122+0000 7f318864d640 0 log_channel(cluster) log [INF] : scrub summary: active paths [/]
> debug 2025-06-03T06:48:24.126+0000 7f3189e50640 1 mds.0.cache.dir(0x10041e16a55) mismatch between head items and fnode.fragstat! printing dentries
> debug 2025-06-03T06:48:24.126+0000 7f3189e50640 1 mds.0.cache.dir(0x10041e16a55) get_num_head_items() = 38; fnode.fragstat.nfiles=28 fnode.fragstat.nsubdirs=11
> debug 2025-06-03T06:48:24.126+0000 7f3189e50640 1 mds.0.cache.dir(0x10041e16a55) mismatch between child accounted_rstats and my rstats!
> debug 2025-06-03T06:48:24.126+0000 7f3189e50640 1 mds.0.cache.dir(0x10041e16a55) total of child dentries: n(v0 rc2025-06-03T06:48:11.042059+0000 b1661845634 127=95+32)
> debug 2025-06-03T06:48:24.126+0000 7f3189e50640 1 mds.0.cache.dir(0x10041e16a55) my rstats: n(v544237 rc2025-06-03T06:48:11.042059+0000 b1661845650 128=96+32)
> debug 2025-06-03T06:49:38.689+0000 7f319065d640 1 mds.generic-mds.<host>.asddje asok_command: scrub status {prefix=scrub status} (starting...)
> debug 2025-06-03T06:51:49.782+0000 7f319065d640 1 mds.generic-mds.<host>.asddje asok_command: scrub status {prefix=scrub status} (starting...)
> debug 2025-06-03T06:55:39.654+0000 7f319065d640 1 mds.generic-mds.<host>.asddje asok_command: scrub status {prefix=scrub status} (starting...)
> debug 2025-06-03T07:00:56.205+0000 7f319065d640 1 mds.generic-mds.<host>.asddje asok_command: scrub status
> ..
> ..
>
> From here on it's either
> - asok_command: scrub status {prefix=scrub status} (starting...)
> - Updating MDS map to version xxxxxx from mon.3
> until I pause the scrub.
>
> Extracts from the perf dump from the MDS:
>
> "mds": {
>     ..
>     "inodes": 23121955,
>     "inodes_top": 3684,
>     "inodes_bottom": 1728,
>     "inodes_pin_tail": 23116543,
>     "inodes_pinned": 23116691,
>     "inodes_expired": 39049803601,
>     "inodes_with_caps": 84593,
>     ..
> }
> ..
> "mds_mem": {
>     "ino": 23114378,
>     "ino+": 38966647328,
>     "ino-": 38943532950,
>     "dir": 513065,
>     "dir+": 130921896,
>     "dir-": 130408831,
>     "dn": 23121954,
>     "dn+": 39349549680,
>     "dn-": 39326427726,
>     "cap": 87280,
>     "cap+": 6964477825,
>     "cap-": 6964390545,
>     "rss": 79730620,
>     "heap": 223508
> },
>
> I have been reluctant to just fail the MDS to clear the memory, but when I finally came around to doing so, I got the error
>
> "Error EPERM: MDS has one of two health warnings which could extend recovery: MDS_TRIM or MDS_CACHE_OVERSIZED. MDS failover is not recommended since it might cause unexpected file system unavailability. If you wish to proceed, pass --yes-i-really-mean-it"
>
> At this moment the number of strays reported in the MDS_CACHE_OVERSIZED warning is up by roughly a factor of 10 (approx. 280000), which made me pause.
>
> This seems like a bug. But to be honest, I don't quite know what to expect if I just execute with "--yes-i-really-mean-it".
> Will the MDS eat a huge amount of RAM during replay? (I've seen this before during a failover - the MDS ate almost 200 GB of RAM, even though the cache was not oversized.)
> Any advice on how to proceed?
>
> BR. Kasper
>
> --
> Alexander Patrakov

--
Alexander Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io