I am seeing the issue again right now, i.e. the output of "ceph df" shows a
much bigger usage than the output of "du -sh".
I don't think it is a problem with cache memory:


[root@ceph-mds-01 ~]# ceph tell mds.ceph-mds-01 perf dump | grep -i mds_co
2025-11-12T13:45:08.131+0100 7f0819ffb640  0 client.1995938515
ms_handle_reset on v2:192.168.61.14:6800/2325284739
2025-11-12T13:45:08.205+0100 7f0819ffb640  0 client.1964256207
ms_handle_reset on v2:192.168.61.14:6800/2325284739
        "mds_co_bytes": 8237305458,
        "mds_co_items": 102318628,
[root@ceph-mds-01 ~]# ceph daemon mds.ceph-mds-01 config show | grep mds_cache_memory_limit
    "mds_cache_memory_limit": "34359738368",
[root@ceph-mds-01 ~]#
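
In case it helps, I can also grab the stray / purge-queue counters from the
same perf dump. If I have the counter names right, something like this should
show whether deleted files are piling up as strays or are still sitting in the
purge queue:

        ceph tell mds.ceph-mds-01 perf dump | grep -E '"num_strays|"pq_'

(num_strays should come from the mds_cache section and the pq_* counters from
the purge_queue section, if I am reading the perf dump output correctly.)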


Any hints on how to debug this issue would be really appreciated.

Thanks, Massimo

On Thu, Oct 30, 2025 at 9:48 AM Venky Shankar <[email protected]> wrote:

> Hi Massimo,
>
> On Thu, Oct 30, 2025 at 2:10 PM Massimo Sgaravatto
> <[email protected]> wrote:
> >
> > Hello Venky
> >
> > Thanks for your help in debugging this issue
> >
> > I am using the default value for mds_cache_memory_limit (4 GiB).
> > Should I increase this value since the server hosting the MDS has much
> > more memory?
>
> Yes, that's generally recommended. The defaults are pretty low for any
> real production use. Note that the MDS can still use more than the
> configured cache limit (up to ~150% of it), but it should start trimming
> its cache as it nears the cache memory limit.
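>
> If you do bump it, a centralized config change is usually all that's
> needed; for example (the value below is just an illustration, size it
> to the host's RAM while leaving headroom for the overshoot mentioned
> above):
>
>         $ ceph config set mds mds_cache_memory_limit 17179869184  # 16 GiB
>
> and as far as I remember the MDS picks this up at runtime, so no
> restart should be required.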
>
> > Are there any guidelines on how to set this parameter with respect to
> > the physical memory available on the server hosting the MDS daemon? I
> > can't find this in:
> >
> > https://docs.ceph.com/en/reef/cephfs/cache-configuration/
>
> Unfortunately, no. There have been requests to add some kind of
> performance tuning guide, which I think should be available in the near
> future.
>
> >
> >
> > Thanks, Massimo
> >
> >
> >
> > On Thu, Oct 30, 2025 at 9:07 AM Venky Shankar <[email protected]>
> wrote:
> >>
> >> Hi Massimo,
> >>
> >> On Wed, Oct 29, 2025 at 8:19 PM Massimo Sgaravatto
> >> <[email protected]> wrote:
> >> >
> >> > Hi Venky
> >> > Thanks for your answer
> >> >
> >> > No: we are not using snapshots
> >>
> >> Was the MDS cache memory running close to mds_cache_memory_limit? This
> >> is available in perf dump via
> >>
> >>         $ ceph tell mds.<id> perf dump
> >>
> >> and look for mds_co_bytes. Or if you can recreate the issue and
> >> capture these details.
> >>
> >> And what's mds_cache_memory_limit set to BTW?
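> >>
> >> Something along these lines should show both values (replace <id>
> >> with your MDS name; the second command assumes you are on the MDS
> >> host and can reach its admin socket):
> >>
> >>         $ ceph tell mds.<id> perf dump | grep mds_co_bytes
> >>         $ ceph daemon mds.<id> config get mds_cache_memory_limit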
> >>
> >> >
> >> > Regards, Massimo
> >> >
> >> > On Wed, Oct 29, 2025 at 3:05 PM Venky Shankar <[email protected]>
> wrote:
> >> >>
> >> >> Hi Massimo,
> >> >>
> >> >> On Tue, Oct 28, 2025 at 5:30 PM Massimo Sgaravatto
> >> >> <[email protected]> wrote:
> >> >> >
> >> >> > Dear all
> >> >> >
> >> >> > We have a portion of a Cephfs file system that maps to a ceph pool
> called
> >> >> > cephfs_data_ssd.
> >> >> >
> >> >> >
> >> >> > If I perform a "du -sh" on this portion of the file system, I see
> that the
> >> >> > value matches the "STORED" field of the "ceph df" output for the
> >> >> > cephfs_data_ssd pool.
> >> >> >
> >> >> > So far so good.
> >> >> >
> >> >> > I set a quota of 4.5 TB for this file system area.
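> >> >> >
> >> >> > (The quota here is the usual CephFS xattr one, i.e. something along
> >> >> > the lines of
> >> >> >
> >> >> >         setfattr -n ceph.quota.max_bytes -v 4500000000000 /path/to/dir
> >> >> >
> >> >> > where /path/to/dir is just a placeholder for the real directory.)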
> >> >> >
> >> >> >
> >> >> > During the weekend, this pool (and other pools of the same device
> class)
> >> >> > became nearfull.
> >> >> >
> >> >> > A "ceph df" showed that the problem was indeed in the the
> cephfs_data_ssd
> >> >> > pool, with a reported usage of 7 TiB of data (21 TiB in replica 3):
> >> >> >
> >> >> > POOL             ID  PGS  STORED   OBJECTS  USED    %USED  MAX AVAIL
> >> >> > cephfs_data_ssd  62   32  7.1 TiB    2.02M  21 TiB  89.06    898 GiB
> >> >> >
> >> >> >
> >> >> > This sounds strange to me because I set a quota of 4.5 TB in that
> area, and
> >> >> > because a "du -sh" of the relevant directory showed a usage of 600
> GB.
> >> >> >
> >> >> >
> >> >> > When I lowered the disk quota from 4.5 TB to 600 GB, the jobs
> writing in
> >> >> > that
> >> >> > area failed (because of disk quota exceeded) and after a while the
> space was
> >> >> > released.
> >> >> >
> >> >> >
> >> >> > The only explanation I can think of is that, as far as I
> understand,
> >> >> > cephfs can take a while to release the space for deleted files
> >> >> > (https://docs.ceph.com/en/reef/dev/delayed-delete/).
> >> >> >
> >> >> >
> >> >> > This would also be consistent with the fact that it looks like
> some jobs
> >> >> > were performing a lot of writes and deletions (they kept writing a
> ~ 5GB
> >> >> > checkpoint file, and deleting the previous one after each
> iteration).
> >> >>
> >> >> That's likely what is causing the high pool usage -- the files are
> >> >> logically gone (du doesn't see them), but the underlying objects are
> >> >> still lying in the data pool consuming space, and for some reason they
> >> >> are not being deleted by the purge queue in the MDS. Do you use
> >> >> snapshots?
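> >> >>
> >> >> (A quick way to check is to list the hidden snapshot directory of the
> >> >> affected path from a client, e.g.
> >> >>
> >> >>         $ ls /mnt/cephfs/<that-directory>/.snap
> >> >>
> >> >> assuming the default snapdir name ".snap" hasn't been overridden; the
> >> >> mount point and path are placeholders.)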
> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > How can I tell from the log files whether this was indeed the
> >> >> > problem?
> >> >> >
> >> >> > Or do you have some other possible explanations for this problem?
> >> >> >
> >> >> > And, most importantly, how can I prevent scenarios such as this one?
> >> >> >
> >> >> > Thanks, Massimo
> >> >> > _______________________________________________
> >> >> > ceph-users mailing list -- [email protected]
> >> >> > To unsubscribe send an email to [email protected]
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> Cheers,
> >> >> Venky
> >> >>
> >>
> >>
> >> --
> >> Cheers,
> >> Venky
> >>
>
>
> --
> Cheers,
> Venky
>
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
