[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-04 Thread Frédéric Nass
Hello Michel, what you need is: step choose indep 0 type datacenter, step chooseleaf indep 2 type host, step emit. I think you're right about the need to tweak the crush rule by editing the crushmap directly. Regards, Frédéric.
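As a minimal sketch, here is how those three steps could sit in a complete rule for a k+m=6 profile spread two shards per datacenter across three datacenters; the rule name and id are illustrative, not taken from the original message:

    rule ec_3dc_4p2 {
        id 5
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        # 0 = let CRUSH derive the datacenter count from the pool size
        step choose indep 0 type datacenter
        # two OSDs on two distinct hosts within each selected datacenter
        step chooseleaf indep 2 type host
        step emit
    }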

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-04 Thread Michel Jouvin
Hi Frank, thanks for this additional information. Currently I'd like to experiment with LRC, which provides a "natural" way to implement the multistep OSD allocation and ensure the distribution across datacenters without tweaking the crushmap rule. Configuration of the LRC plugin is far from obvious...
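For what it's worth, a hedged sketch of what such a profile could look like with the LRC plugin's simplified k/m/l syntax; the values and names below are illustrative, not Michel's actual settings:

    # 4 data + 2 global coding chunks, grouped in sets of l=2 with one extra
    # local coding chunk per set -> 3 groups of 3 chunks, one group per datacenter
    ceph osd erasure-code-profile set lrc_3dc plugin=lrc \
         k=4 m=2 l=2 \
         crush-locality=datacenter crush-failure-domain=host
    ceph osd pool create testpool-lrc 64 64 erasure lrc_3dc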

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-04 Thread Frank Schilder
Hi Michel, I don't have experience with LRC profiles. They may reduce cross-site traffic at the expense of extra overhead, but this might actually be unproblematic with EC profiles that have a large m anyway. If you do experiments with this, please let the list know. I would like to add here...

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Anthony D'Atri
Mark Nelson's space amplification sheet visualizes this really well. A nuance here is that Ceph always writes a full stripe, so with a 9,6 profile, on conventional media, a minimum of 15 x 4 KB = 60 KB of underlying storage will be consumed, even for a 1 KB object. A 22 KB object would similarly tie up ~18 KB of...
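Spelled out, assuming a 4 KB allocation unit on conventional (HDD) media, which is what the 15 x 4 KB figure implies:

    1 KB object, 9+6 profile:
      data shards  :  9 x 4 KB = 36 KB  (only 1 KB of it is payload)
      coding shards:  6 x 4 KB = 24 KB
      total on disk: 15 x 4 KB = 60 KB  -> roughly 60x amplification for that object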

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Michel Jouvin
Hi Frank, thanks for this detailed answer. About your point that 4+2 or similar schemes defeat the purpose of a 3-datacenter configuration, you're right in principle. In our case the goal is to avoid any impact on replicated pools (in particular RBD for the cloud), but it may be acceptable f...

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Frank Schilder
Hi Michel, failure domain = datacenter doesn't work, because CRUSH wants to put one shard per failure domain and you have 3 datacenters, not 6. The modified crush rule you wrote should work, I believe equally well with x=0 or 2, but try it out before doing anything to your cluster. The easiest...
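One way to try the rule out without touching the live cluster is to run the edited map through crushtool; the file names and rule id here are illustrative:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt     # decompile, then edit the rule
    crushtool -c crushmap.txt -o crushmap.new     # recompile the edited map
    # simulate placements for a 6-shard pool and inspect the resulting mappings
    crushtool -i crushmap.new --test --rule 5 --num-rep 6 --show-mappings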