> This sounds interesting, because this way the pressure wouldn't be too big if
> we go like 0.1, 0.2, OSD by OSD.

I used to do this as well, back before pg-upmap was a thing and while I still 
had Jewel clients.  It is, however, less efficient, because some data ends up 
moving more than once.  Upweighting a handful of OSDs at the same time, say one 
per host or one per failure domain, may spread the load and allow faster 
progress than going one at a time.
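
Roughly like this, as a sketch only (the OSD IDs and step size below are 
placeholders, substitute your own):

   # raise one new OSD per host / failure domain in small increments
   ceph osd crush reweight osd.120 0.3
   ceph osd crush reweight osd.135 0.3
   ceph osd crush reweight osd.150 0.3
   # wait for backfill to settle, repeat with 0.6, 0.9, ... until each OSD
   # reaches its full CRUSH weight (roughly 1.7 for a 1.8T device)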

The PG remapping tools allow fine-grained control with more efficiency, though 
any clients that aren’t Luminous or later will have a really bad day.
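
If all your clients really are Luminous or newer, you can enforce that and then 
stage the change with upmaps, something along these lines (quoting from memory, 
so double-check the pgremapper README for the exact flags):

   ceph osd set-require-min-compat-client luminous
   ceph osd set norebalance                 # hold data still while staging
   # bring the new OSDs in at full CRUSH weight, then:
   pgremapper cancel-backfill --yes         # pin misplaced PGs in place via upmap
   ceph osd unset norebalance
   # then remove the upmap entries gradually (balancer, or pgremapper's
   # undo-upmaps) so data trickles onto the new OSDs at a tolerable pace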

> From what I can see of how Ceph did it: when adding the new OSDs, the complete
> host got the remapped PGs from other hosts as well, so the old OSDs' PG count
> increased by about +50% (and they were already overloaded) and only slowly
> rebalanced to the newly added OSDs on the same host. This initial pressure is
> too big.

I don’t follow; adding new OSDs should, on average, decrease the number of PG 
replicas on the existing OSDs.  But imbalances during topology changes are one 
reason I like to raise mon_max_pg_per_osd to 1000; otherwise you can end up 
with PGs that won’t activate.
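
For example (a runtime change; the default is 250 on recent releases):

   ceph config set global mon_max_pg_per_osd 1000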

> 
> This "misplaced ratio to 1%" I've never tried, let me read a bit, thank you.
> 
> Istvan
> ________________________________
> From: Eugen Block <ebl...@nde.ag>
> Sent: Saturday, September 7, 2024 4:55:40 AM
> To: ceph-users@ceph.io <ceph-users@ceph.io>
> Subject: [ceph-users] Re: Somehow throttle recovery even further than basic 
> options?
> 
> I can’t say anything about the pgremapper, but have you tried
> increasing the crush weight gradually? Add new OSDs with crush initial
> weight 0 and then increase it in small steps. I haven’t used that
> approach for years, but maybe that can help here. Or are all OSDs
> already up and in? Or you could reduce the max misplaced ratio to 1%
> or even lower (default is 5%)?
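
For reference, the knobs Eugen is describing should be these two, if memory 
serves (a sketch, the values are just examples):

   # have new OSDs come up with CRUSH weight 0 so nothing moves right away
   ceph config set osd osd_crush_initial_weight 0
   # let the balancer/autoscaler keep at most 1% of objects misplaced at a time
   ceph config set mgr target_max_misplaced_ratio 0.01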
> 
> Quoting "Szabo, Istvan (Agoda)" <istvan.sz...@agoda.com>:
> 
>> Forgot to paste: I want to somehow reduce this recovery rate:
>> recovery: 0 B/s, 941.90k keys/s, 188 objects/s
>> to around 200-300 keys/sec
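
If you want to cap the recovery rate itself, the only additional knob I know of 
for omap-heavy pools is the recovery sleep; a hedged sketch, and the exact 
value needed to land around 200-300 keys/s will be trial and error:

   # inject a pause between recovery ops on SSD/NVMe OSDs; raise it until keys/s drops
   ceph config set osd osd_recovery_sleep_ssd 0.5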
>> 
>> 
>> 
>> ________________________________
>> From: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
>> Sent: Friday, September 6, 2024 11:18 PM
>> To: Ceph Users <ceph-users@ceph.io>
>> Subject: [ceph-users] Somehow throttle recovery even further than
>> basic options?
>> 
>> Hi,
>> 
>> Four years ago we created our cluster with 4 OSDs per disk (SSDs and
>> NVMe drives) on Octopus.
>> The 15TB SSDs are still working properly with 4 OSDs each, but the small
>> 1.8TB NVMes holding the index pool are not.
>> Each new NVMe OSD added to the existing nodes generates slow ops, even
>> with scrub off, recovery_op_priority 1, and backfills and recovery at 1.
>> I even turned off all of the index pool's heavy sync mechanisms, but the
>> read latency is still high, which means recovery ops push it even higher.
>> 
>> I'm trying to add resources to the cluster to spread out the 2048 PGs of
>> the index pool (with replica 3 that means 6144 PG replicas), but I can't
>> make the rebalance more gentle.
>> 
>> The balancer is working in upmap with max deviation 1.
>> 
>> I have this tool from DigitalOcean,
>> https://github.com/digitalocean/pgremapper; has anybody tried it before,
>> and could it actually help here?
>> 
>> Thank you for the ideas.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
