Hi Sinan,

Agreed on the safe approach of using the upmap-remapped.py tool - it helps reduce unwanted data movement when new nodes are added. And since these are new nodes being added rather than old ones being removed or swapped, I don't expect the data movement to push any OSD above the thresholds.
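In case it's useful, a few standard commands for keeping an eye on utilization while the backfill runs (nothing cluster-specific assumed here):

    ceph osd df tree              # per-OSD and per-host utilization along the CRUSH tree (watch the %USE column)
    ceph osd dump | grep ratio    # the currently configured full / backfillfull / nearfull ratios
    ceph pg ls remapped | head    # PGs that still have data to move
    ceph -s                       # overall recovery/backfill progress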
In case you reach those thresholds, you can modify them with the following commands and give yourself some headroom:

ceph osd set-nearfull-ratio .85
ceph osd set-backfillfull-ratio .90
ceph osd set-full-ratio .95

Increase the values only slightly, and make sure you tune them back down once rebalancing completes. This will allow the cluster to continue with backfilling and I/O requests. set-nearfull-ratio is purely cosmetic but should not be ignored.

You can also disable the balancer and use this tool: https://github.com/laimis9133/plankton-swarm
Just move a few PGs from the most full OSDs to less full ones manually (a rough example is at the bottom of this message). This will also give you some headroom for action.

Best,
Laimis J.

> On 18 Mar 2025, at 16:59, Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote:
>
> Hi Sinan,
>
> The safest approach would be to use the upmap-remapped.py tool developed by Dan at CERN. See [1] for details.
>
> The idea is to leverage the upmap load balancer to progressively migrate the data to the new servers, minimizing performance impact on the cluster and clients. I like to create the OSDs ahead of time on the nodes that I initially place in a root directory called ‘closet’.
>
> I then apply the norebalance flag (ceph osd set norebalance), disable the balancer (ceph balancer off), move the new nodes with already provisioned OSDs to their final location (rack), run ./upmap-remapped.py to bring all PGs back to active+clean state, remove the norebalance flag (ceph osd unset norebalance), re-enable the balancer (ceph balancer on) and watch data moving progressively as the upmap balancer executes its plans.
>
> Regards,
> Frédéric.
>
> [1] https://docs.clyso.com/blog/adding-capacity-with-upmap-remapped/
>
> ----- Le 17 Mar 25, à 17:51, Sinan Polat sinan86po...@gmail.com a écrit :
>
>> Hello,
>>
>> I am currently managing a Ceph cluster that consists of 3 racks, each with 4 OSD nodes. Each node contains 24 OSDs. I plan to add three new nodes, one to each rack, to help alleviate the high OSD utilization.
>>
>> The current highest OSD utilization is 85%. I am concerned about the possibility of any OSD reaching the osd_full_ratio threshold during the rebalancing process. This would cause the cluster to enter a read-only state, which I want to avoid at all costs.
>>
>> I am planning to execute the following commands:
>>
>> ceph orch host add new-node-1
>> ceph orch host add new-node-2
>> ceph orch host add new-node-3
>>
>> ceph osd crush move new-node-1 rack=rack-1
>> ceph osd crush move new-node-2 rack=rack-2
>> ceph osd crush move new-node-3 rack=rack-3
>>
>> ceph config set osd osd_max_backfills 1
>> ceph config set osd osd_recovery_max_active 1
>> ceph config set osd osd_recovery_sleep 0.1
>>
>> ceph orch apply osd --all-available-devices
>>
>> Before proceeding, I would like to ask if the above steps are safe to execute in a cluster with such high utilization. My main concern is whether the rebalancing could cause any OSD to exceed the osd_full_ratio and result in unexpected failures.
>>
>> Any insights or advice on how to safely add these nodes without impacting cluster stability would be greatly appreciated.
>>
>> Thanks!
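For the manual option mentioned above, a rough sketch of moving a single PG with upmap - the OSD and PG IDs are placeholders, so pick real ones from your own ceph osd df / ceph pg ls-by-osd output, and keep the balancer off while the mappings are in place:

    ceph osd df                          # find the most full OSDs (check the %USE column)
    ceph pg ls-by-osd osd.42 | head      # list PGs sitting on a full OSD (osd.42 is a placeholder)
    ceph osd pg-upmap-items 3.1f 42 57   # remap PG 3.1f from osd.42 to osd.57; the target OSD must still satisfy the pool's CRUSH rule
    ceph osd rm-pg-upmap-items 3.1f      # drop the mapping again once it is no longer needed

Note that upmap requires require-min-compat-client to be luminous or newer. As far as I know, these pg-upmap-items entries are the same mechanism upmap-remapped.py relies on, so anything you set by hand can be cleaned up the same way afterwards.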