Hi Niklas,

To explain the 33% of misplaced objects after you move a host to another DC, one 
would have to check the current crush rule (ceph osd getcrushmap | crushtool -d 
-) and which OSDs the PGs are mapped to before and after the move operation 
(ceph pg dump).
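
For instance (a rough sketch, file names are just examples), you could capture 
the PG mappings before and after the move and diff them to see exactly which 
PGs were remapped:

ceph pg dump pgs_brief > pg_before.txt
# ... ceph osd crush move machine-1 datacenter=FSN1-DC2 ...
ceph pg dump pgs_brief > pg_after.txt
diff pg_before.txt pg_after.txt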

Regarding the replicated crush rule that Ceph creates by default, the rule 
places each replica on a different host under the 'default' root (the failure 
domain being the host).

# rules
rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

If you're using size 3 and min_size 2, and want to make sure your cluster 
continues to serve I/Os with those 2 datacenters being down, then you need to 
make sure that these 2 datacenters host only 1 replica between them, so the 2 
remaining replicas still satisfy min_size.
You could group these 2 datacenters in 1 zone and all other datacenters in 
another zone, then place 1 replica in zone 1 and the 2 other replicas in any 
of the datacenters of the other zone.

For example:

root default
    region FSN
        zone FSN1
            datacenter FSN1-DC1
                host machine-1
                    osd.0
                    ... 10 OSDs per datacenter
                ... currently 1 machine per datacenter
            datacenter FSN1-DC2
                host machine-2
                    ...
        zone FSN2        
            ... other 8 datacenters
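
If you go that route, the zone and the moves could be set up with something 
like the commands below (just a sketch; FSN1-DC3 and the other bucket names 
are assumed to follow the naming in the tree above, so adjust them to your 
actual datacenter names):

ceph osd crush add-bucket FSN2 zone
ceph osd crush move FSN2 region=FSN
ceph osd crush move FSN1-DC3 zone=FSN2
# ... repeat the move for every datacenter that should sit under FSN2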

Then create a new rule as per below:

# rules
rule replicated_zone {
        id 1
        type replicated
        step take FSN1
        step chooseleaf firstn 1 type datacenter
        step emit
        step take FSN2
        step chooseleaf firstn 2 type datacenter
        step emit
}
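
Since this rule uses two take/emit steps, it can't be created with 'ceph osd 
crush rule create-replicated'; you'd typically add it by editing the 
decompiled crushmap and injecting it back, roughly like this (file names are 
just examples):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# paste the replicated_zone rule into the rules section of crushmap.txt
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin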

Then you'd just have to change the crush rule of the pool(s) and wait for the 
data movement to complete.

ceph osd pool set rbd_zone crush_rule replicated_zone
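
You can then verify the rule is in use and follow the rebalance with, for 
example:

ceph osd pool get rbd_zone crush_rule
ceph -s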

Note that you can use crushtool [1] to simulate PG mappings and check the new 
crush rule before applying it to any pools.
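
For instance, assuming the edited map from above (crushmap-new.bin), you could 
check how rule id 1 maps 3 replicas and look for bad mappings:

crushtool -i crushmap-new.bin --test --rule 1 --num-rep 3 --show-mappings
crushtool -i crushmap-new.bin --test --rule 1 --num-rep 3 --show-bad-mappings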

Regards,
Frédéric.

[1] https://docs.ceph.com/en/reef/man/8/crushtool/

----- On 4 Nov 24, at 17:01, Niklas Hambüchen m...@nh2.me wrote:

> Hi Joachim,
> 
> I'm currently looking for the general methodology and if it's possible without
> rebalancing everything.
> 
> But of course I'd also appreciate tips directly for my deployment; here is the
> info:
> 
> Ceph 18, Simple 3-replication (osd_pool_default_size = 3, default CRUSH rules
> Ceph creates for that).
> 
> Failure domains from `ceph osd tree`:
> 
> root default
>    region FSN
>        zone FSN1
>            datacenter FSN1-DC1
>                host machine-1
>                    osd.0
>                    ... 10 OSDs per datacenter
>                ... currently 1 machine per datacenter
>            datacenter FSN1-DC2
>                host machine-2
>                    ...
>            ... currently 8 datacenters
> 
> I already tried simply
> 
>    ceph osd crush move machine-1 datacenter=FSN1-DC2
> 
> to "simulate" that DC1 and DC2 are temporarily the same failure domain
> (machine-1 is the only machine in DC1 currently), but that immediately causes
> 33% of objects to be misplaced -- much more movement than I'd hope for and 
> more
> than would be needed (I'd expect 12.5% would need to be moved given that 1 out
> of 8 DCs needs to be moved).
> 
> Thanks!
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
