Note that 'norebalance' disables the balancer but doesn't prevent
backfill; you'll want to set 'nobackfill' as well.
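
Something like this around the CRUSH changes, for example (a rough sketch;
adjust to your workflow):

    ceph osd set norebalance
    ceph osd set nobackfill
    # ... make the CRUSH changes here ...
    ceph osd unset nobackfill
    ceph osd unset norebalance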

Josh

On Sun, Jan 12, 2025 at 1:49 PM Anthony D'Atri <anthony.da...@gmail.com> wrote:
>
> [ ed: snag during moderation (somehow a newline was interpolated in the 
> Subject), so I’m sending this on behalf of kasper_steenga...@hotmail.com, 
> to whom replies should be sent]
>
>
> I'm managing a Ceph cluster with 1K+ OSDs distributed across 56 hosts.
> Until now the CRUSH rule used has been the default replicated rule, but I want 
> to change that in order to implement a rack-level failure domain.
>
> The current plan is to:
> - Disable rebalancing: ceph osd set norebalance
> - Add rack buckets to the CRUSH map and distribute the hosts accordingly 
> (8 in each rack) using the built-in commands; a sketch covering all the 
> racks follows after this list:
>       - ceph osd crush add-bucket rack1 rack root=default
>       - ceph osd crush move osd-host1 rack=rack1
> - Create the new rack-split rule:
>       - ceph osd crush rule create-replicated rack_split default rack
> - Set the rule on all my pools:
>       - for p in $(ceph osd lspools | cut -d' ' -f 2) ; do echo $p $(ceph osd pool set $p crush_rule rack_split) ; done
> - Finally, re-enable rebalancing: ceph osd unset norebalance
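>
> For completeness, roughly what I have in mind for all seven racks (the rack 
> and host names here are placeholders for my real naming scheme):
>
>       for r in $(seq 1 7) ; do ceph osd crush add-bucket rack$r rack root=default ; done
>       # then move 8 hosts into each rack, e.g.
>       ceph osd crush move osd-host1 rack=rack1
>       ceph osd crush move osd-host2 rack=rack1
>       ...
>       ceph osd crush move osd-host56 rack=rack7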
>
> However, I'm concerned about the amount of data that will need to be 
> rebalanced, since the cluster holds multiple PB. I'm looking for a review 
> of / input on my plan, as well as words of advice and experience from anyone 
> who has been in a similar situation.
>
>
> ——
>
> [
>  ed: You only include commands for one CRUSH `rack` — would you create 
> multiple `rack` CRUSH buckets, at least three of them?
>
> Are all of your pools replicated?  No EC pools for RGW buckets, CephFS data, 
> etc?
>
> What OSD media and networking does this cluster have? HDDs will be much 
> slower and much more impacted during the process than SSDs. Is your client 
> workload 24x7? Which Ceph release? These factors inform how impactful the 
> grand shuffle will be.  Are your mon DBs on SSDs?
>
> A popular strategy is to use upmap-remapped.py to freeze all of the PG 
> mappings before unsetting the norebalance flag; the balancer will then 
> gradually undo the mappings as it moves data to where it now belongs.  This 
> process has built-in throttling.
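>
> A rough outline of that sequence (the script lives in the ceph-scripts 
> repository; the exact invocation may differ, but it prints pg-upmap-items 
> commands that can be piped to a shell):
>
>     ceph osd set norebalance
>     ceph osd set nobackfill
>     # apply the CRUSH changes: add racks, move hosts, switch pool rules
>     ./upmap-remapped.py | sh     # pin remapped PGs back to their current OSDs
>     ceph osd unset nobackfill
>     ceph osd unset norebalance
>     ceph balancer on             # the balancer then removes the upmaps gradually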
>
> ]
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
