Thanks a lot!

I am retrying my test right now after applying this change
(filer_max_purge_ops: 10 --> 40); let's see if things improve ...
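
(As a sanity check, the new value can be verified on the MDS admin socket with
something like:

# ceph daemon /run/ceph/ceph-mds.ceph-mds-01.asok config get filer_max_purge_ops

and the purge_queue perf counters, e.g. pq_item_in_journal, should show whether
the backlog is draining.)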

Thanks again !
Cheers, Massimo

On Wed, Nov 19, 2025 at 8:02 PM Dhairya Parmar <[email protected]> wrote:

> Hi Massimo,
>
> I remember this has come up a few times in the past, and I wrote a doc on
> the purge queue - https://docs.ceph.com/en/latest/cephfs/purge-queue - a
> section of which gives a few recommendations for overcoming cases like
> yours. I'd recommend starting by setting `filer_max_purge_ops` to 40, which
> should work fine, but you can also go ahead and apply the other configs
> recommended in the docs.
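>
> For example, something like this (assuming you use the centralized config
> store; adjust accordingly if you set options in ceph.conf instead):
>
> ceph config set mds filer_max_purge_ops 40
>
> Depending on the option, the MDS may need a restart to pick up the new
> value.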
>
>
> On Wed, Nov 19, 2025 at 3:57 PM Massimo Sgaravatto <
> [email protected]> wrote:
>
>> Dear all
>>
>> In our CephFS installation I have seen, a couple of times, a discrepancy
>> between the "ceph df" and "du -sh" outputs (see also this thread:
>> https://lists.ceph.io/hyperkitty/list/[email protected]/thread/BBED4ADXO3CE4FYLCNUWB4OML6N6CTZU/
>> ).
>>
>>
>> I have the feeling that the problem is caused by the large number of
>> deletions done by some users (I know that some of their jobs periodically
>> write a checkpoint file and then delete the previous checkpoint file).
>>
>>
>>
>> Trying to reproduce the problem, I am running a script with 30 parallel
>> threads (a rough sketch is below), where each thread:
>> - writes a 40 GB file
>> - sleeps 5 seconds
>> - deletes the file produced in the previous iteration
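>>
>> Roughly, the script looks like the following sketch (the mount point,
>> file names and dd invocation are just illustrative):
>>
>> #!/bin/bash
>> # Hypothetical reproducer sketch: 30 parallel writers on a CephFS mount
>> DIR=/mnt/cephfs/purge-test    # example path, adjust to the actual mount
>> for i in $(seq 1 30); do
>>   (
>>     prev=""
>>     while true; do
>>       f="$DIR/thread-$i-$(date +%s%N)"
>>       # write a ~40 GB file
>>       dd if=/dev/zero of="$f" bs=1M count=40960 status=none
>>       sleep 5
>>       # delete the file produced in the previous iteration
>>       [ -n "$prev" ] && rm -f "$prev"
>>       prev="$f"
>>     done
>>   ) &
>> done
>> wait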
>>
>> After a few hours I have been able to reproduce the issue. Right now
>> "ceph df" shows a usage of ~11 TB (~33 TB considering the 3x replication),
>> while "du -sh" shows a usage of about 2.7 TB.
>>
>>
>> We have 2 active (and 1 standby) MDS instances.
>>
>>
>> In one MDS I see:
>>
>> [root@ceph-mds-01 ~]# ceph --admin-daemon
>> /run/ceph/ceph-mds.ceph-mds-01.asok perf dump | jq '.["purge_queue"]'
>> {
>>   "pq_executing_ops": 10240,
>>   "pq_executing_ops_high_water": 13202,
>>   "pq_executing": 1,
>>   "pq_executing_high_water": 16,
>>   "pq_executed_ops": 74781922,
>>   "pq_executed": 924254,
>>   "pq_item_in_journal": 26986
>> }
>>
>> while in the second one:
>>
>> [root@ceph-mds-02 ~]# ceph --admin-daemon
>> /run/ceph/ceph-mds.ceph-mds-02.asok perf dump | jq '.["purge_queue"]'
>> {
>>   "pq_executing_ops": 0,
>>   "pq_executing_ops_high_water": 0,
>>   "pq_executing": 0,
>>   "pq_executing_high_water": 0,
>>   "pq_executed_ops": 0,
>>   "pq_executed": 0,
>>   "pq_item_in_journal": 0
>> }
>>
>> I am using the default values for filer_max_purge_ops and the
>> mds_max_purge_* options:
>>
>> [root@ceph-mds-01 ~]# ceph daemon /run/ceph/ceph-mds.ceph-mds-01.asok
>> config show | grep filer_max_purge
>>     "filer_max_purge_ops": "10",
>> [root@ceph-mds-01 ~]# ceph daemon /run/ceph/ceph-mds.ceph-mds-01.asok
>> config show | grep mds_max_purge
>>     "mds_max_purge_files": "64",
>>     "mds_max_purge_ops": "8192",
>>     "mds_max_purge_ops_per_pg": "0.500000",
>>
>> mds_cache_memory_limit is set to 32 GiB and doing a:
>>
>> # ceph daemon mds.<mds> perf dump | grep mds_co_bytes
>>
>> I see, for the two MDS instances respectively:
>>
>> 8237145057
>> 3479698
>>
>>
>> We are running Ceph Reef (we will soon update to Squid).
>>
>>
>> Should I try to increase filer_max_purge_ops?
>>
>>
>> Thanks a lot
>> Cheers, Massimo
>>
>>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
