Hi Massimo,

On Thu, Oct 30, 2025 at 2:10 PM Massimo Sgaravatto <[email protected]> wrote:
>
> Hello Venky
>
> Thanks for your help in debugging this issue
>
> I am using the default value for mds_cache_memory_limit (4 GiB).
> Should I increase this value since the server hosting the MDS has much
> more memory ?

Yes, that's generally recommended. The defaults are pretty low for any
real production use. Note that the MDS can still use more than the
configured cache limit (up to 150% of it), but it should start trimming
its cache when it is nearing the cache memory limit.
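To make that concrete, here is a minimal sketch of the change, assuming
you manage options via the config database; 16 GiB is picked purely as
an example, and mds.<id> is a placeholder for your daemon. Leave enough
headroom for the possible ~150% overshoot plus everything else the MDS
keeps in memory:

  # raise the cache limit for all MDS daemons (value is in bytes)
  $ ceph config set mds mds_cache_memory_limit 17179869184

  # confirm what the running daemon sees
  $ ceph config show mds.<id> mds_cache_memory_limit

  # keep an eye on actual cache usage versus the limit
  $ ceph tell mds.<id> cache status

The option is changeable at runtime, so no MDS restart should be needed.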
> Are there some guidelines on how to set this parameter wrt the
> physical memory available in the server hosting the mds daemon ?
> I can't find this in:
>
> https://docs.ceph.com/en/reef/cephfs/cache-configuration/

Unfortunately, no. There have been requests to add some kind of a
performance tuning guide, which I think should be available in the
near future.
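And, coming back to the purge queue angle discussed further down in
this thread: if the nearfull situation shows up again, it is worth
checking whether deletions are simply queued up behind the MDS purge
throttles before suspecting anything else. A rough sketch, assuming a
reasonably recent release (the exact counter names can vary a bit
between versions, and the values below are only illustrative):

  # inspect the purge queue counters (pq_executing, pq_executed, ...)
  $ ceph tell mds.<id> perf dump purge_queue

  # if purging cannot keep up with a delete-heavy workload, the
  # throttles can be relaxed, e.g.:
  $ ceph config set mds mds_max_purge_files 256
  $ ceph config set mds mds_max_purge_ops_per_pg 1.0

How far you can safely push the purge throttles depends on how much
background delete traffic the OSDs backing the pool can absorb.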
>
> Thanks, Massimo
>
> On Thu, Oct 30, 2025 at 9:07 AM Venky Shankar <[email protected]> wrote:
>>
>> Hi Massimo,
>>
>> On Wed, Oct 29, 2025 at 8:19 PM Massimo Sgaravatto
>> <[email protected]> wrote:
>> >
>> > Hi Venky
>> > Thanks for your answer
>> >
>> > No: we are not using snapshots
>>
>> Was the MDS cache memory running close to mds_cache_memory_limit? This
>> is available in perf dump via
>>
>> $ ceph tell mds.<id> perf dump
>>
>> and look for mds_co_bytes. Or if you can recreate the issue and
>> capture these details.
>>
>> And what's mds_cache_memory_limit set to BTW?
>>
>> >
>> > Regards, Massimo
>> >
>> > On Wed, Oct 29, 2025 at 3:05 PM Venky Shankar <[email protected]> wrote:
>> >>
>> >> Hi Massimo,
>> >>
>> >> On Tue, Oct 28, 2025 at 5:30 PM Massimo Sgaravatto
>> >> <[email protected]> wrote:
>> >> >
>> >> > Dear all
>> >> >
>> >> > We have a portion of a CephFS file system that maps to a ceph pool
>> >> > called cephfs_data_ssd.
>> >> >
>> >> > If I perform a "du -sh" on this portion of the file system, I see
>> >> > that the value matches the "STORED" field of the "ceph df" output
>> >> > for the cephfs_data_ssd pool.
>> >> >
>> >> > So far so good.
>> >> >
>> >> > I set a quota of 4.5 TB for this file system area.
>> >> >
>> >> > During the weekend, this pool (and other pools of the same device
>> >> > class) became nearfull.
>> >> >
>> >> > A "ceph df" showed that the problem was indeed in the
>> >> > cephfs_data_ssd pool, with a reported usage of 7 TiB of data
>> >> > (21 TiB in replica 3):
>> >> >
>> >> > cephfs_data_ssd 62 32 7.1 TiB 2.02M 21 TiB 89.06 898 GiB
>> >> >
>> >> > This sounds strange to me because I set a quota of 4.5 TB in that
>> >> > area, and because a "du -sh" of the relevant directory showed a
>> >> > usage of 600 GB.
>> >> >
>> >> > When I lowered the disk quota from 4.5 TB to 600 GB, the jobs
>> >> > writing in that area failed (because of disk quota exceeded) and
>> >> > after a while the space was released.
>> >> >
>> >> > The only explanation I can think of is that, as far as I understand,
>> >> > cephfs can take a while to release the space for deleted files
>> >> > (https://docs.ceph.com/en/reef/dev/delayed-delete/).
>> >> >
>> >> > This would also be consistent with the fact that it looks like some
>> >> > jobs were performing a lot of writes and deletions (they kept
>> >> > writing a ~ 5GB checkpoint file, and deleting the previous one
>> >> > after each iteration).
>> >>
>> >> That's likely what is causing the high pool usage -- the files are
>> >> logically gone (du doesn't see them), but the objects are still lying
>> >> in the data pool consuming space which aren't getting deleted by the
>> >> purge queue in the MDS for some reason. Do you use snapshots?
>> >>
>> >> >
>> >> > How can I understand from the log files if this was indeed the
>> >> > problem ?
>> >> >
>> >> > Or do you have some other possible explanations for this problem ?
>> >> >
>> >> > And, most important, how can I prevent scenarios such as this one ?
>> >> >
>> >> > Thanks, Massimo
>> >> > _______________________________________________
>> >> > ceph-users mailing list -- [email protected]
>> >> > To unsubscribe send an email to [email protected]
>> >> >
>> >>
>> >>
>> >> --
>> >> Cheers,
>> >> Venky
>>
>>
>> --
>> Cheers,
>> Venky
>>

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
