Hi Massimo,

I remember this has come up a few times in the past, and I wrote a doc on the purge queue: https://docs.ceph.com/en/latest/cephfs/purge-queue. A section of it covers a few recommendations for overcoming cases like yours. I'd recommend starting by setting `filer_max_purge_ops` to 40, which should work fine on its own, but you can also go ahead and set the other configs recommended in the doc.
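For example, something along these lines (a rough sketch off the top of my head, not tested against your cluster; the daemon name is just taken from your output below, adjust as needed):

    # check what the MDS daemons are currently running with
    ceph config get mds filer_max_purge_ops

    # persist the new value for all MDS daemons in the mon config database
    ceph config set mds filer_max_purge_ops 40

    # or try it at runtime on a single daemon first, via its admin socket
    ceph daemon mds.ceph-mds-01 config set filer_max_purge_ops 40

The same `ceph config set mds ...` pattern applies to the other purge-related options from the doc (mds_max_purge_files, mds_max_purge_ops, mds_max_purge_ops_per_pg) if bumping filer_max_purge_ops alone isn't enough.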
On Wed, Nov 19, 2025 at 3:57 PM Massimo Sgaravatto <[email protected]> wrote:

> Dear all,
>
> In our CephFS installation I have seen, a couple of times, a discrepancy
> between the "ceph df" and the "du -sh" outputs (see also this thread:
> https://lists.ceph.io/hyperkitty/list/[email protected]/thread/BBED4ADXO3CE4FYLCNUWB4OML6N6CTZU/).
>
> I have the feeling that the problem is caused by a lot of deletions done
> by some users (I know that some of them, in their jobs, periodically write
> a checkpoint file and then delete the previous one).
>
> Trying to reproduce the problem, I am running a script with 30 parallel
> threads, where each thread:
> - writes a 40 GB file
> - sleeps 5 secs
> - deletes the file produced in the previous iteration
>
> After some hours I have been able to reproduce the issue. Right now
> "ceph df" shows a usage of ~11 TB (~33 TB considering the 3x replication),
> while "du -sh" shows a usage of about 2.7 TB.
>
> We have 2 active (and 1 standby) MDS instances.
>
> In one MDS I see:
>
> [root@ceph-mds-01 ~]# ceph --admin-daemon /run/ceph/ceph-mds.ceph-mds-01.asok perf dump | jq '.["purge_queue"]'
> {
>   "pq_executing_ops": 10240,
>   "pq_executing_ops_high_water": 13202,
>   "pq_executing": 1,
>   "pq_executing_high_water": 16,
>   "pq_executed_ops": 74781922,
>   "pq_executed": 924254,
>   "pq_item_in_journal": 26986
> }
>
> while in the second one:
>
> [root@ceph-mds-02 ~]# ceph --admin-daemon /run/ceph/ceph-mds.ceph-mds-02.asok perf dump | jq '.["purge_queue"]'
> {
>   "pq_executing_ops": 0,
>   "pq_executing_ops_high_water": 0,
>   "pq_executing": 0,
>   "pq_executing_high_water": 0,
>   "pq_executed_ops": 0,
>   "pq_executed": 0,
>   "pq_item_in_journal": 0
> }
>
> I am using the default values for filer_max_purge_ops and mds_max_purge_*:
>
> [root@ceph-mds-01 ~]# ceph daemon /run/ceph/ceph-mds.ceph-mds-01.asok config show | grep filer_max_purge
>     "filer_max_purge_ops": "10",
> [root@ceph-mds-01 ~]# ceph daemon /run/ceph/ceph-mds.ceph-mds-01.asok config show | grep mds_max_purge
>     "mds_max_purge_files": "64",
>     "mds_max_purge_ops": "8192",
>     "mds_max_purge_ops_per_pg": "0.500000",
>
> mds_cache_memory_limit is set to 32 GiB, and doing a:
>
> # ceph daemon mds.<mds> perf dump | grep mds_co_bytes
>
> I see, for the 2 MDS instances respectively:
>
> 8237145057
> 3479698
>
> We are running Ceph Reef (we will soon update to Squid).
>
> Should I try to increase filer_max_purge_ops?
>
> Thanks a lot
> Cheers, Massimo
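Once the new value is in place, you can keep an eye on the same counters you dumped above to confirm the queue is actually draining: pq_item_in_journal should trend down and pq_executed should keep climbing, and the "ceph df" vs "du -sh" gap should start to close. Just a loop around the command you already ran, e.g.:

    watch -n 10 "ceph --admin-daemon /run/ceph/ceph-mds.ceph-mds-01.asok perf dump | jq .purge_queue"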
