On Pacific -
It seems that when data is marked as degraded, no PGs are remapped, and 
upmap-remapped.py consistently returns "There are no remapped PGs".
Also, nobackfill and norebalance have no effect in holding back any of the 
remapping (norecover does).

The recovery of the degraded data seems to be what is doing the remapping.

So, deploying a new CRUSH map on Pacific seems to be a big-bang operation with 
no control handles.
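
For reference, the sequence I was testing looked roughly like this (a sketch 
only; the filenames and the upmap-remapped.py path are placeholders):

  ceph osd set norecover       # the only flag that actually held data movement back for me
  ceph osd set nobackfill
  ceph osd set norebalance

  ceph osd getcrushmap -o crushmap.bin          # back up the current CRUSH map
  crushtool -d crushmap.bin -o crushmap.txt     # decompile, edit in the rack buckets
  crushtool -c crushmap-new.txt -o crushmap-new.bin
  ceph osd setcrushmap -i crushmap-new.bin

  ./upmap-remapped.py    # on Pacific the PGs show as degraded, so this prints "There are no remapped PGs"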


Balancing:
My cluster is at 55% RAW used.
The balancer was disabled before I took over the cluster; unfortunately I do 
not have the full history of that, but I believe it had something to do with it 
not working, or being far too ineffective.
Is your advice to revert the weights to 1.00000 meant to give the balancer a 
clean starting point, or is there another reason?
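
For context, this is how I am checking the current state (just the standard 
status commands):

  ceph df            # overall usage; ~55% RAW used here
  ceph osd df tree   # the REWEIGHT column shows which OSDs still carry an override from the old cron job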


My conclusion for now is that, since an upgrade to Quincy or Reef is already in 
the pipeline for the cluster, I'll go ahead and do that first, before adding 
racks to my CRUSH map.



________________________________
From: Anthony D'Atri <anthony.da...@gmail.com>
Sent: Friday, January 17, 2025 16:06
To: Kasper Rasmussen <kasper_steenga...@hotmail.com>
Cc: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] Adding Rack to crushmap - Rebalancing multiple PB of 
data - advice/experience



On Jan 17, 2025, at 6:02 AM, Kasper Rasmussen <kasper_steenga...@hotmail.com> 
wrote:

However I'm concerned with the amount of data that needs to be rebalanced, 
since the cluster holds multiple PB, and I'm looking for review of/input for my 
plan, as well as words of advice/experience from someone who has been in a 
similar situation.

Yep, that’s why you want to use upmap-remapped.  Otherwise the thundering herd 
of data shuffling will DoS your client traffic, esp. since you’re using 
spinners.  Count on pretty much all data moving in the process, and the 
convergence taking …. maybe a week?
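
The general shape of that approach is roughly the following (the script lives 
in CERN's ceph-scripts repository; the exact invocation and prerequisites, e.g. 
require-min-compat-client for upmap, may differ on your cluster):

  ceph osd set norebalance    # hold backfill while PGs are pinned back in place
  # ... apply the CRUSH topology change here ...
  ./upmap-remapped.py | sh    # emits pg-upmap-items entries mapping remapped PGs back to their current OSDs
  ceph osd unset norebalance
  ceph balancer mode upmap
  ceph balancer on            # the balancer then removes the upmaps gradually, moving data a little at a time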

On Pacific: data is marked as "degraded" rather than "misplaced", which is not 
what I expected. I also see over 2000% degraded data (but that might be a 
separate issue).

On Quincy: Data is marked as misplaced - which seems correct.

I’m not specifically familiar with such a change, but that could be mainly 
cosmetic, a function of how the percentage is calculated for objects / PGs that 
are multiply remapped.

In the depths of time I had clusters that would sometimes show a negative 
number of RADOS objects to recover; it would bounce above and below zero a few 
times as it converged to 0.


Instead balancing has been done by a cron job executing - ceph osd 
reweight-by-utilization 112 0.05 30

I used a similar strategy with older releases.  Note that this will complicate 
your transition, as those relative weights are a function of the CRUSH 
topology, so when the topology changes, likely some reweighted OSDs will get 
much less than their fair share, and some will get much more.  How full is your 
cluster (ceph df)?  It might not be a bad idea to incrementally revert those 
all to 1.00000 if you have the capacity, and disable the cron job.
You’ll also likely want to switch to the balancer module for the upmap-remapped 
strategy to incrementally move your data around.  Did you have it disabled for 
a specific reason?
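
As a sketch, the incremental revert plus balancer hand-off could look something 
like this (the OSD ID and the misplaced-ratio value are only illustrative):

  ceph osd reweight 17 1.0    # one OSD (or a small batch) at a time, letting recovery settle in between
  ceph config set mgr target_max_misplaced_ratio 0.05   # cap how much data is misplaced at any one time
  ceph balancer on            # (re)enable the module once the overrides are back to 1.00000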

Updating to Reef before migrating might be to your advantage so that you can 
benefit from performance and efficiency improvements since Pacific.


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
