Hi Sinan,

Agree, the safe approach is to use the upmap-remapped.py tool - it helps reduce 
unwanted data movement when new nodes are added.
However, since these are new nodes being added and no old ones are being removed 
or swapped out, I don't expect much data movement to push OSDs above the thresholds.
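
For reference, the sequence Frédéric describes below would look roughly like this 
on the CLI (the path to upmap-remapped.py and the node/rack names are placeholders, 
adapt them to your setup):

ceph osd set norebalance
ceph balancer off
ceph osd crush move new-node-1 rack=rack-1    # repeat for the other two nodes
./upmap-remapped.py | sh                      # brings remapped PGs back to active+clean
ceph osd unset norebalance
ceph balancer on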

In case you do reach those thresholds, you can adjust them with the following 
commands to give yourself some headroom (the values shown are the defaults):
ceph osd set-nearfull-ratio .85
ceph osd set-backfillfull-ratio .90
ceph osd set-full-ratio .95

Increase the values only by a small margin, and make sure you tune them back down 
once rebalancing completes. This will allow the cluster to continue backfilling 
and serving I/O.
The nearfull ratio only triggers a health warning (it does not block I/O or 
backfill), but the warning should not be ignored.
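
To keep an eye on where you stand before and during the backfill, for example:

ceph osd df tree             # per-OSD utilization
ceph osd dump | grep ratio   # currently configured full/backfillfull/nearfull ratios
ceph health detail           # lists any nearfull/backfillfull/full OSDs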

You can also disable the balancer and use this tool: 
https://github.com/laimis9133/plankton-swarm
Just move a few PGs from the most full OSDs to less full ones manually, as in the 
sketch below. This will also give you some headroom for action.
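
If you want to do the same by hand, the upmap interface is enough - for example 
(the PG id and OSD ids here are made up; pick a PG sitting on your fullest OSD and 
a less full OSD that is a valid placement for it, e.g. in the same rack, otherwise 
the monitors will clean the upmap up again):

ceph pg ls-by-osd osd.42                 # list PGs on the most full OSD
ceph osd pg-upmap-items 7.1a 42 57       # move PG 7.1a's copy from osd.42 to osd.57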


Best,
Laimis J.


> On 18 Mar 2025, at 16:59, Frédéric Nass <frederic.n...@univ-lorraine.fr> 
> wrote:
> 
> 
> Hi Sinan,
> 
> The safest approach would be to use the upmap-remapped.py tool developed by 
> Dan at CERN. See [1] for details.
> 
> The idea is to leverage the upmap load balancer to progressively migrate the 
> data to the new servers, minimizing performance impact on the cluster and 
> clients. I like to create the OSDs ahead of time on the nodes that I 
> initially place in a root directory called ‘closet’.
> 
> I then apply the norebalance flag (ceph osd set norebalance), disable the 
> balancer (ceph balancer off), move the new nodes with already provisioned 
> OSDs to their final location (rack), run ./upmap-remapped.py to bring all PGs 
> back to active+clean state, remove the norebalance flag (ceph osd unset 
> norebalance), re-enable the balancer (ceph balancer on) and watch data moving 
> progressively as the upmap balancer executes its plans.
> 
> Regards,
> Frédéric.
> 
> [1] https://docs.clyso.com/blog/adding-capacity-with-upmap-remapped/
> 
> ----- On 17 Mar 25, at 17:51, Sinan Polat sinan86po...@gmail.com wrote:
> 
>> Hello,
>> 
>> I am currently managing a Ceph cluster that consists of 3 racks, each with
>> 4 OSD nodes. Each node contains 24 OSDs. I plan to add three new nodes, one
>> to each rack, to help alleviate the high OSD utilization.
>> 
>> The current highest OSD utilization is 85%. I am concerned about the
>> possibility of any OSD reaching the osd_full_ratio threshold during the
>> rebalancing process. This would cause the cluster to enter a read-only
>> state, which I want to avoid at all costs.
>> 
>> I am planning to execute the following commands:
>> 
>> ceph orch host add new-node-1
>> ceph orch host add new-node-2
>> ceph orch host add new-node-3
>> 
>> ceph osd crush move new-node-1 rack=rack-1
>> ceph osd crush move new-node-2 rack=rack-2
>> ceph osd crush move new-node-3 rack=rack-3
>> 
>> ceph config set osd osd_max_backfills 1
>> ceph config set osd osd_recovery_max_active 1
>> ceph config set osd osd_recovery_sleep 0.1
>> 
>> ceph orch apply osd --all-available-devices
>> 
>> Before proceeding, I would like to ask if the above steps are safe to
>> execute in a cluster with such high utilization. My main concern is whether
>> the rebalancing could cause any OSD to exceed the osd_full_ratio and result
>> in unexpected failures.
>> 
>> Any insights or advice on how to safely add these nodes without impacting
>> cluster stability would be greatly appreciated.
>> 
>> Thanks!

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
