[ceph-users] Snaptriming speed degrade with pg increase

Szabo, Istvan (Agoda) Thu, 28 Nov 2024 18:31:36 -0800

Hi,

When we increase the number of placement groups on a pool in an all-NVMe
cluster, the snaptrim speed degrades a lot.
We are currently running with the values below so that client ops are not
degraded while snaptrimming still makes some progress, but the trim speed is
terrible. (Octopus 15.2.17 on Ubuntu 20.04)


--osd_max_trimming_pgs=2
--osd_snap_trim_sleep=0.1
--osd_pg_max_concurrent_snap_trims=2
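
For reference, here is a minimal sketch of how these three values could be applied cluster-wide. It assumes the options are managed through the centralized config database with `ceph config set`; the message does not say how they were actually set. The helper name `config_set_commands` is hypothetical, and the script only prints the command lines without running them.

```python
# Values quoted in the message above.
SNAPTRIM_SETTINGS = {
    "osd_max_trimming_pgs": "2",
    "osd_snap_trim_sleep": "0.1",
    "osd_pg_max_concurrent_snap_trims": "2",
}


def config_set_commands(settings, target="osd"):
    """Build `ceph config set` command lines; nothing is executed here.

    The target "osd" (an assumption) applies each option to all OSDs.
    """
    return [
        f"ceph config set {target} {name} {value}"
        for name, value in settings.items()
    ]


commands = config_set_commands(SNAPTRIM_SETTINGS)
for command in commands:
    print(command)
```

Printing the commands first makes it easy to review them before pasting them into a shell on an admin node.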

We had a big pool that used to have 128 PGs, and snaptrimming took around
45-60 minutes.
With 600 GB PGs it was impossible to do maintenance on the cluster, because it
could easily max out the cluster (which we did), so we increased the pool to
1024 PGs, and the snaptrimming duration increased to 3.5 hours.
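
To put the numbers above side by side, here is a small arithmetic sketch. It assumes the 600 GB figure is the per-PG size at 128 PGs, that the pool's total data did not change during the split, and that data is spread evenly across PGs; none of these is stated explicitly in the message.

```python
# Figures taken from the message; even data distribution is assumed.
OLD_PG_COUNT = 128
NEW_PG_COUNT = 1024
OLD_PG_SIZE_GB = 600
OLD_TRIM_MINUTES = (45, 60)   # observed range at 128 PGs
NEW_TRIM_MINUTES = 3.5 * 60   # 3.5 hours at 1024 PGs


def pg_size_after_split(old_size_gb, old_pgs, new_pgs):
    """Per-PG size after a split, assuming total pool data is unchanged."""
    return old_size_gb * old_pgs / new_pgs


def slowdown_range(old_minutes, new_minutes):
    """Return (smallest, largest) slowdown factor for a range of old durations."""
    shortest, longest = old_minutes
    return new_minutes / longest, new_minutes / shortest


new_pg_size_gb = pg_size_after_split(OLD_PG_SIZE_GB, OLD_PG_COUNT, NEW_PG_COUNT)
pg_factor = NEW_PG_COUNT / OLD_PG_COUNT
slow_min, slow_max = slowdown_range(OLD_TRIM_MINUTES, NEW_TRIM_MINUTES)

print(f"PG size after split: {new_pg_size_gb:.0f} GB")          # 75 GB
print(f"PG count factor: {pg_factor:.0f}x")                      # 8x
print(f"Snaptrim slowdown: {slow_min:.1f}x to {slow_max:.1f}x")  # 3.5x to 4.7x
```

Under these assumptions, an 8x increase in PG count brings each PG down to about 75 GB, while the snaptrim run takes roughly 3.5 to 4.7 times longer.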

Is there any good solution that we are missing to fix this?

On the hardware level, I've changed the server profile to tune some NUMA
settings, but that doesn't seem to have helped.

Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
