[ ed: snag during moderation (somehow a newline was interpolated in the 
Subject), so I’m sending this on behalf of kasper_steenga...@hotmail.com, 
to whom replies should be sent]


I'm managing a Ceph cluster with more than 1,000 OSDs distributed across 56 hosts.
Until now the cluster has used the default replicated CRUSH rule, but I want to
change that to implement a rack-level failure domain.

Current plan is to:
- Disable rebalancing: ceph osd set norebalance
- Add rack buckets to the CRUSH map and distribute the hosts accordingly (8 in 
each) using the built-in commands (the full loop over all racks/hosts is 
sketched right after this list):
      - ceph osd crush add-bucket rack1 rack root=default
      - ceph osd crush move osd-host1 rack=rack1
- Create the new rack-split rule:
      - ceph osd crush rule create-replicated rack_split default rack
- Set the rule on all my pools:
      - for p in $(ceph osd lspools | cut -d' ' -f 2) ; do echo $p $(ceph osd 
pool set $p crush_rule rack_split) ; done
- Finally, re-enable rebalancing: ceph osd unset norebalance
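
Expanded, the rack and host moves would look roughly like this (rack and host 
names below are just the pattern; 56 hosts at 8 per rack gives 7 racks):

      # create the rack buckets under the default root
      for r in $(seq 1 7); do
          ceph osd crush add-bucket rack${r} rack root=default
      done
      # move 8 consecutive hosts into each rack
      for h in $(seq 1 56); do
          r=$(( (h - 1) / 8 + 1 ))
          ceph osd crush move osd-host${h} rack=rack${r}
      done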

However, I'm concerned about the amount of data that will need to be rebalanced,
since the cluster holds multiple PB. I'm looking for a review of / input on this
plan, as well as words of advice from anyone who has been in a similar
situation.
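
[ ed: one way to gauge the scale in advance is an offline dry run: export the 
osdmap, edit a copy of the CRUSH map, and compare PG mappings before and after. 
Roughly, treating the flags and the diff as a sketch (this captures the 
movement caused by the hierarchy change alone; switching the pools' rule will 
add more on top):

      ceph osd getmap -o osdmap.bin
      osdmaptool osdmap.bin --export-crush crush.bin
      crushtool -d crush.bin -o crush.txt
      # edit crush.txt: add the rack buckets and the rack_split rule
      crushtool -c crush.txt -o crush.new
      cp osdmap.bin osdmap.new
      osdmaptool osdmap.new --import-crush crush.new
      osdmaptool osdmap.bin --test-map-pgs-dump > before.txt
      osdmaptool osdmap.new --test-map-pgs-dump > after.txt
      diff before.txt after.txt | grep -c '^>'   # rough count of changed PG mappings

]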


——

[
 ed: You only include commands for one CRUSH `rack` — would you create multiple 
`rack` CRUSH buckets, at least three of them?

Are all of your pools replicated?  No EC pools for RGW buckets, CephFS data, 
etc?
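
A quick way to check is ceph osd pool ls detail, which shows replicated vs. 
erasure for each pool, e.g.:

      ceph osd pool ls detail | grep -c erasure

Note that rack_split as created above is a replicated rule; any EC pools would 
likely need their own erasure rules instead (ceph osd crush rule create-erasure 
with a profile whose crush-failure-domain is rack) rather than the loop in your 
plan.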

What OSD media and networking does this cluster have? HDDs will be much slower 
and much more impacted during the process than SSDs. Is your client workload 
24x7? Which Ceph release? These factors inform how impactful the grand shuffle 
will be.  Are your mon DBs on SSDs?
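
If you're not sure offhand, these are quick to pull:

      ceph versions              # Ceph release running on mons / mgrs / OSDs
      ceph osd crush class ls    # device classes present (hdd / ssd / nvme)
      ceph osd df tree           # per-OSD utilization and current topology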

A popular strategy is to use upmap-remapped.py to freeze all of the PG mappings 
before unsetting the norebalance flag; the balancer will then gradually undo 
the mappings as it moves data to where it now belongs.  This process has 
built-in throttling.
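
For reference, the sequence would look roughly like this (upmap-remapped.py 
lives in the CERN ceph-scripts repo, https://github.com/cernceph/ceph-scripts, 
under tools/upmap/; the exact usage and the misplaced-ratio value below are 
illustrative, so check the script's own docs):

      # with norebalance still set, after the CRUSH changes are applied:
      ceph osd set-require-min-compat-client luminous   # pg-upmap needs luminous+ clients
      ./upmap-remapped.py | sh                           # pin remapped PGs back to their current OSDs
      ceph osd unset norebalance
      ceph balancer mode upmap
      ceph balancer on                                   # gradually removes the upmaps, moving data to the new layout
      ceph config set mgr target_max_misplaced_ratio 0.05   # fraction of PGs allowed to be misplaced at once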

]
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io