Hello Venky,

Thanks for your help in debugging this issue.
I am using the default value for mds_cache_memory_limit (4 GiB). Should I
increase this value, since the server hosting the MDS has much more memory?
Are there any guidelines on how to set this parameter with respect to the
physical memory available on the server hosting the MDS daemon? I can't
find this in: https://docs.ceph.com/en/reef/cephfs/cache-configuration/

Thanks, Massimo

On Thu, Oct 30, 2025 at 9:07 AM Venky Shankar <[email protected]> wrote:

> Hi Massimo,
>
> On Wed, Oct 29, 2025 at 8:19 PM Massimo Sgaravatto
> <[email protected]> wrote:
> >
> > Hi Venky
> > Thanks for your answer
> >
> > No: we are not using snapshots
>
> Was the MDS cache memory running close to mds_cache_memory_limit? This
> is available in perf dump via
>
> $ ceph tell mds.<id> perf dump
>
> and look for mds_co_bytes. Or if you can recreate the issue and
> capture these details.
>
> And what's mds_cache_memory_limit set to BTW?
>
> >
> > Regards, Massimo
> >
> > On Wed, Oct 29, 2025 at 3:05 PM Venky Shankar <[email protected]> wrote:
> >>
> >> Hi Massimo,
> >>
> >> On Tue, Oct 28, 2025 at 5:30 PM Massimo Sgaravatto
> >> <[email protected]> wrote:
> >> >
> >> > Dear all
> >> >
> >> > We have a portion of a CephFS file system that maps to a Ceph pool
> >> > called cephfs_data_ssd.
> >> >
> >> > If I perform a "du -sh" on this portion of the file system, I see
> >> > that the value matches the "STORED" field of the "ceph df" output
> >> > for the cephfs_data_ssd pool.
> >> >
> >> > So far so good.
> >> >
> >> > I set a quota of 4.5 TB for this file system area.
> >> >
> >> > During the weekend, this pool (and other pools of the same device
> >> > class) became nearfull.
> >> >
> >> > A "ceph df" showed that the problem was indeed in the cephfs_data_ssd
> >> > pool, with a reported usage of 7 TiB of data (21 TiB in replica 3):
> >> >
> >> > cephfs_data_ssd 62 32 7.1 TiB 2.02M 21 TiB 89.06 898 GiB
> >> >
> >> > This sounds strange to me because I set a quota of 4.5 TB in that
> >> > area, and because a "du -sh" of the relevant directory showed a
> >> > usage of 600 GB.
> >> >
> >> > When I lowered the disk quota from 4.5 TB to 600 GB, the jobs
> >> > writing in that area failed (because of "disk quota exceeded") and
> >> > after a while the space was released.
> >> >
> >> > The only explanation I can think of is that, as far as I understand,
> >> > CephFS can take a while to release the space for deleted files
> >> > (https://docs.ceph.com/en/reef/dev/delayed-delete/).
> >> >
> >> > This would also be consistent with the fact that it looks like some
> >> > jobs were performing a lot of writes and deletions (they kept
> >> > writing a ~5 GB checkpoint file, and deleting the previous one
> >> > after each iteration).
> >>
> >> That's likely what is causing the high pool usage -- the files are
> >> logically gone (du doesn't see them), but the objects are still lying
> >> in the data pool consuming space, and for some reason they aren't
> >> getting deleted by the purge queue in the MDS. Do you use snapshots?
> >>
> >> >
> >> > How can I understand from the log files whether this was indeed
> >> > the problem?
> >> >
> >> > Or do you have some other possible explanations for this problem?
> >> >
> >> > And, most important, how can I prevent scenarios such as this one?
> >> >
> >> > Thanks, Massimo
> >> > _______________________________________________
> >> > ceph-users mailing list -- [email protected]
> >> > To unsubscribe send an email to [email protected]
> >> >
> >>
> >> --
> >> Cheers,
> >> Venky
> >
>
> --
> Cheers,
> Venky
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
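As an aside for the archive, the check Venky suggests (comparing the cache
counters from perf dump against mds_cache_memory_limit) can be sketched in a
few lines of Python. The JSON below is invented sample data, not real perf
dump output -- the actual dump is much larger and its exact layout varies by
Ceph release -- and the purge_queue counter names are an assumption for
illustration; only mds_co_bytes comes from the thread above:

```python
import json

# Invented stand-in for a fragment of `ceph tell mds.<id> perf dump`
# output (illustrative only; real output layout may differ).
perf_dump = json.loads("""
{
  "mds_co": {"bytes": 3865470566, "items": 2100000},
  "purge_queue": {"pq_executing": 12, "pq_executed": 1048576}
}
""")

GiB = 1024 ** 3
mds_cache_memory_limit = 4 * GiB  # the default value mentioned above

# Compare cache memory in use (mds_co bytes) against the configured limit.
used = perf_dump["mds_co"]["bytes"]
print(f"mds_co bytes: {used / GiB:.2f} GiB "
      f"({used / mds_cache_memory_limit:.0%} of mds_cache_memory_limit)")

# A growing backlog here could hint at the purge queue falling behind.
pq = perf_dump["purge_queue"]
print(f"purge queue: {pq['pq_executing']} executing, "
      f"{pq['pq_executed']} executed")
```

With the sample numbers this reports the cache at 3.60 GiB, i.e. 90% of the
4 GiB limit, which would be "running close to the limit" in the sense asked
about above.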
