Thanks a lot! I am retrying my test right now after applying this change (filer_max_purge_ops 10 --> 40): let's see if things improve...
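
For anyone following the thread, this is roughly the kind of change meant here (a sketch only; the exact commands and the admin-socket path are assumptions based on the daemon names quoted below, and if the running MDS does not pick up the new value an MDS restart/failover may be needed):

  # persist the new value for all MDS daemons in the mon config store
  ceph config set mds filer_max_purge_ops 40

  # or inject it into the running MDS daemons without restarting them
  ceph tell mds.* injectargs '--filer_max_purge_ops=40'

  # verify on one MDS via its admin socket
  ceph daemon /run/ceph/ceph-mds.ceph-mds-01.asok config get filer_max_purge_ops
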
Thanks again!
Cheers, Massimo

On Wed, Nov 19, 2025 at 8:02 PM Dhairya Parmar <[email protected]> wrote:
> Hi Massimo,
>
> I remember this has occurred a few times in the past and I wrote a doc on
> purge-queue - https://docs.ceph.com/en/latest/cephfs/purge-queue - a
> section of which gives a few recommendations to overcome cases like
> yours. I'd recommend starting with setting `filer_max_purge_ops` to 40,
> which should just work fine, but you can also move ahead with setting the
> other configs recommended in the docs.
>
> On Wed, Nov 19, 2025 at 3:57 PM Massimo Sgaravatto <[email protected]> wrote:
>> Dear all,
>>
>> In our CephFS installation I have seen a couple of times a discrepancy
>> between the "ceph df" and the "du -sh" outputs (see also this thread:
>> https://lists.ceph.io/hyperkitty/list/[email protected]/thread/BBED4ADXO3CE4FYLCNUWB4OML6N6CTZU/
>> ).
>>
>> I have the feeling that the problem is caused by a lot of deletions done
>> by some users (I know that some of them, in their jobs, write a checkpoint
>> file from time to time and then delete the previous checkpoint file).
>>
>> Trying to reproduce the problem, I am running a script with 30 parallel
>> threads, where each thread:
>> - writes a 40 GB file
>> - sleeps 5 seconds
>> - deletes the file produced in the previous iteration
>>
>> After some hours I have been able to reproduce the issue. Right now
>> "ceph df" shows a usage of ~11 TB (~33 TB considering the replica 3),
>> while "du -sh" shows a usage of about 2.7 TB.
>>
>> We have 2 active (and 1 standby) MDS instances.
>>
>> On one MDS I see:
>>
>> [root@ceph-mds-01 ~]# ceph --admin-daemon /run/ceph/ceph-mds.ceph-mds-01.asok perf dump | jq '.["purge_queue"]'
>> {
>>   "pq_executing_ops": 10240,
>>   "pq_executing_ops_high_water": 13202,
>>   "pq_executing": 1,
>>   "pq_executing_high_water": 16,
>>   "pq_executed_ops": 74781922,
>>   "pq_executed": 924254,
>>   "pq_item_in_journal": 26986
>> }
>>
>> while on the second one:
>>
>> [root@ceph-mds-02 ~]# ceph --admin-daemon /run/ceph/ceph-mds.ceph-mds-02.asok perf dump | jq '.["purge_queue"]'
>> {
>>   "pq_executing_ops": 0,
>>   "pq_executing_ops_high_water": 0,
>>   "pq_executing": 0,
>>   "pq_executing_high_water": 0,
>>   "pq_executed_ops": 0,
>>   "pq_executed": 0,
>>   "pq_item_in_journal": 0
>> }
>>
>> I am using the default values for filer_max_purge_ops and mds_max_purge*:
>>
>> [root@ceph-mds-01 ~]# ceph daemon /run/ceph/ceph-mds.ceph-mds-01.asok config show | grep filer_max_purge
>>     "filer_max_purge_ops": "10",
>> [root@ceph-mds-01 ~]# ceph daemon /run/ceph/ceph-mds.ceph-mds-01.asok config show | grep mds_max_purge
>>     "mds_max_purge_files": "64",
>>     "mds_max_purge_ops": "8192",
>>     "mds_max_purge_ops_per_pg": "0.500000",
>>
>> mds_cache_memory_limit is set to 32 GiB, and doing a:
>>
>> # ceph daemon mds.<mds> perf dump | grep mds_co_bytes
>>
>> I see for the 2 MDS instances, respectively:
>>
>> 8237145057
>> 3479698
>>
>> We are running Ceph Reef (we will soon update to Squid).
>>
>> Should I try to increase filer_max_purge_ops?
>>
>> Thanks a lot
>> Cheers, Massimo
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
