Hi Istvan, 

Yeah, it's a known bug. Hence my previous recommendation to first upgrade 
your cluster to a more recent version of Ceph, and only then reshard the 
multi-site synced bucket. 
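
For reference, once you're on a recent enough release, the reshard itself is 
just a couple of radosgw-admin calls, something along these lines (bucket name 
and shard count below are only placeholders, adjust them to your bucket): 

  radosgw-admin bucket stats --bucket=mybucket      # check current shard count
  radosgw-admin bucket reshard --bucket=mybucket --num-shards=101
  radosgw-admin reshard status --bucket=mybucket    # follow progress

The important part remains doing it on a version where resharding is handled 
correctly in multi-site deployments. 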

Cheers, 
Frédéric. 

----- On 29 Nov 24, at 13:11, Istvan Szabo, Agoda <istvan.sz...@agoda.com> 
wrote: 

> The cluster from the reshard topic is running Quincy 17.2.7, but I tested the
> reshard today and the objects are gone.

> Istvan

> From: Frédéric Nass <frederic.n...@univ-lorraine.fr>
> Sent: Friday, November 29, 2024 5:17:27 PM
> To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> Cc: Ceph Users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Snaptriming speed degrade with pg increase

> ----- On 29 Nov 24, at 11:11, Istvan Szabo, Agoda <istvan.sz...@agoda.com>
> wrote:

>> We increased from 9 servers to 11, so let's say 20% capacity and performance
>> were added.

>> This is a different cluster, purely RBD.

> I see, so big objects. You might want to increase osd_max_trimming_pgs and
> possibly osd_pg_max_concurrent_snap_trims, and see how it goes.
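>
> Something like this, for example (the values are just a starting point to
> tune from, not a recommendation):
>
>   ceph config set osd osd_max_trimming_pgs 4
>   ceph config set osd osd_pg_max_concurrent_snap_trims 4
>
> You can check what a running OSD picked up with 'ceph config show osd.0 |
> grep trim' and revert with 'ceph config rm osd <option>'.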

>> (For the other topic: the bucket can't be resharded because in multi-site all
>> the data would disappear on the remote site; we need to create a new bucket
>> with a higher shard count and migrate the data to it first.)

> Hmm... You have fallen significantly behind on Ceph versions, which must be
> hindering many of your operational tasks today. Another option would be to
> catch up and then reshard on a recent version, in multi-site mode.

> Frédéric.

>> Istvan

>> From: Frédéric Nass <frederic.n...@univ-lorraine.fr>
>> Sent: Friday, November 29, 2024 4:58:52 PM
>> To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
>> Cc: Ceph Users <ceph-users@ceph.io>
>> Subject: Re: [ceph-users] Snaptriming speed degrade with pg increase

>> Hi Istvan,

>> Did the PG split involve using more OSDs than before? If so, then increasing
>> these values (apart from the sleep) should not have a negative impact on
>> client I/O compared to before the split, and should accelerate the whole
>> process.

>> Did you reshard the buckets as discussed in the other thread?

>> Regards,
>> Frédéric.

>> ----- On 29 Nov 24, at 3:30, Istvan Szabo, Agoda <istvan.sz...@agoda.com>
>> wrote:

>> > Hi,

>> > When we scale the placement groups on a pool located in a full-NVMe
>> > cluster, the snaptrimming speed degrades a lot.
>> > Currently we are running with these values so as not to degrade client ops
>> > while still making some progress on snaptrimming, but it is terrible
>> > (Octopus 15.2.17 on Ubuntu 20.04):

>> > --osd_max_trimming_pgs=2
>> > --osd_snap_trim_sleep=0.1
>> > --osd_pg_max_concurrent_snap_trims=2

>> > We had a big pool with 128 PGs, and at that size snaptrimming took around
>> > 45-60 minutes.
>> > Since it is impossible to do maintenance on the cluster with 600 GB PG
>> > sizes, because that can easily max out the cluster (which we did), we
>> > increased it to 1024 PGs and the snaptrimming duration increased to 3.5
>> > hours.

>> > Is there any good solution that we are missing to fix this?

>> > On the hardware level I've changed the server profile to tune some NUMA
>> > settings, but it doesn't seem to have helped.

>> > Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
