Hi,

Quick update on this topic: the solution for us turned out to be offline
compacting all OSDs.
After that, all snaptrimming finishes in about an hour rather than a day.
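A minimal sketch of how that can be done, assuming systemd-managed OSDs and the default data path (`/var/lib/ceph/osd/ceph-<id>`); adjust for your deployment, only take down as many OSDs at once as your cluster can tolerate, and note it needs a live cluster to run:

```shell
#!/bin/sh
# Hypothetical helper: offline-compact the given OSD IDs one at a time.
# Run on the host that owns the OSDs.
set -e
ceph osd set noout   # avoid rebalancing while OSDs are briefly down
for id in "$@"; do
  systemctl stop "ceph-osd@${id}"
  # Compact the OSD's BlueStore/RocksDB while the daemon is offline.
  ceph-bluestore-tool --path "/var/lib/ceph/osd/ceph-${id}" compact
  systemctl start "ceph-osd@${id}"
done
ceph osd unset noout
```

Waiting for the cluster to return to HEALTH_OK between hosts keeps the impact contained.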

________________________________
From: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
Sent: Friday, November 29, 2024 2:31:33 PM
To: Bandelow, Gunnar <gunnar.bande...@uni-greifswald.de>; Ceph Users 
<ceph-users@ceph.io>
Subject: [ceph-users] Re: Snaptriming speed degrade with pg increase

Let's say yes if that is the issue.



Istvan Szabo
Staff Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---------------------------------------------------




________________________________
From: Bandelow, Gunnar
Sent: Friday, November 29, 2024 1:47 PM
To: Szabo, Istvan (Agoda); Ceph Users
Subject: Re: [ceph-users] Snaptriming speed degrade with pg increase

Dear Istvan,

The first thing that stands out:

Ubuntu 20.04  (EOL in April 2025)
and
Ceph v15 Octopus (EOL since 2022)

Is there a possibility to upgrade these things?

Best regards
Gunnar


--- Original Message ---
Subject: [ceph-users] Snaptriming speed degrade with pg increase
From: "Szabo, Istvan (Agoda)" 
<istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>>
To: "Ceph Users" <ceph-users@ceph.io<mailto:ceph-users@ceph.io>>
Date: 29-11-2024 3:30



Hi,

When we scale up the placement group count on a pool in an all-NVMe cluster, 
the snaptrimming speed degrades a lot.
We are currently running with the following values so as not to degrade client 
ops while still making some progress on snaptrimming, but it is terrible. 
(Octopus 15.2.17 on Ubuntu 20.04)

--osd_max_trimming_pgs=2
--osd_snap_trim_sleep=0.1
--osd_pg_max_concurrent_snap_trims=2
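For reference, the same throttles can be applied cluster-wide at runtime through the central config store (available since Nautilus) instead of per-daemon flags; a sketch using the values above, which needs a live cluster:

```shell
# Set the snaptrim throttles for all OSDs via the monitor config store.
ceph config set osd osd_max_trimming_pgs 2
ceph config set osd osd_snap_trim_sleep 0.1
ceph config set osd osd_pg_max_concurrent_snap_trims 2

# Verify what a running OSD actually sees (osd.0 as an example):
ceph config show osd.0 | grep snap_trim
```

This avoids restarting OSDs just to change the throttles.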

We had a big pool with 128 PGs, and a full snaptrimming pass took around 45-60 
minutes.
Because maintenance was impossible with ~600 GB PGs (backfill can easily max 
out the cluster, which happened to us), we increased the pool to 1024 PGs, and 
the snaptrimming duration grew to 3.5 hours.

Is there any good solution that we are missing to fix this?

At the hardware level I changed the server profile to tune some NUMA settings, 
but that did not seem to help either.

Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io<mailto:ceph-users@ceph.io>
To unsubscribe send an email to 
ceph-users-le...@ceph.io<mailto:ceph-users-le...@ceph.io>