> On May 8, 2025, at 3:14 PM, Peter Linder <peter.lin...@fiberdirekt.se> wrote:
> 
> There is also the issue that if you have a 4+8 EC pool, you ideally need at 
> least 4+8 of whatever your failure domain is, in this case DCs.

Well, in terms of a formal stretch cluster, EC isn’t actually supported today. 
If the two DCs are VERY close to each other, with a small RTT, one could fake it 
with something like 4+8, but without the mon quorum and min_size advantages of 
stretch mode.
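To spell out the "fake it" math: with 4+8 you can put six shards in each DC, and 
losing a whole DC still leaves six, which is at or above the usual default EC 
min_size of k+1 = 5. A rough sketch of what that takes, with profile, rule, pool 
and bucket names as placeholders and the rule untested against any real map 
(crushtool --test is your friend):

# EC profile, 4 data + 8 coding shards
ceph osd erasure-code-profile set ec48 k=4 m=8 crush-failure-domain=host

# Custom rule splitting the 12 shards 6/6 across two datacenter buckets;
# needs a "datacenter" bucket type and at least 6 hosts per DC.
rule ec48_split {
        id 2
        type erasure
        step take default
        step choose indep 2 type datacenter
        step chooseleaf indep 6 type host
        step emit
}

ceph osd pool create ecpool 256 256 erasure ec48 ec48_split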

With two DCs, formal stretch mode needs replicated size=4 pools, i.e. two copies 
in each DC.
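The two-site replicated rule for that looks roughly like the below; dc1/dc2 and 
the pool name are placeholders, and the authoritative version is in the stretch 
mode documentation for your release:

rule stretch_rule {
        id 1
        type replicated
        step take dc1
        step chooseleaf firstn 2 type host
        step emit
        step take dc2
        step chooseleaf firstn 2 type host
        step emit
}

# size=4 pool on that rule; stretch mode manages min_size for you
# when a site goes down.
ceph osd pool create mypool 128 128 replicated stretch_rule
ceph osd pool set mypool size 4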


> This is more than most people have.
> 
> Is this k=4, m=8? What is the benefit of this compared to an ordinary 
> replicated pool with 3 copies?
> 
> Even if you set the failure domain to, say, rack, there is no guarantee that 
> there is no PG with more than 8 parts in a single DC without some crushmap 
> trickery.
> 
> If this is k=8, m=4, then only 4 failures can be handled and there is no way 
> to split 12 parts so that both DCs contain 4 or fewer at the same time.
> 
> You really need 3 DCs and a fast, highly available network in between.

With stretch mode it’s typically 2 DCs plus a tiebreaker mon elsewhere; the 
tiebreaker can tolerate higher latency and can even be a cloud VM.
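Rough shape of that setup, assuming mons a-d split across the two DCs and a 
fifth mon e as the tiebreaker; names are placeholders and the exact procedure 
is in the stretch mode docs:

ceph mon set election_strategy connectivity
ceph mon set_location a datacenter=dc1
ceph mon set_location b datacenter=dc1
ceph mon set_location c datacenter=dc2
ceph mon set_location d datacenter=dc2
ceph mon set_location e datacenter=dc3    # tiebreaker, can be a cloud VM
ceph mon enable_stretch_mode e stretch_rule datacenter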

> 
> /Peter
> 
> 
> 
> On 2025-05-08 at 17:45, Anthony D'Atri wrote:
>> To be pedantic … backfill usually means copying data in toto, so like normal 
>> write replication it necessarily has to traverse the WAN.
>> 
>> Recovery of just a lost shard/replica could in theory stay local with the LRC 
>> plugin, but as noted that doesn’t seem like a good choice.  With the default 
>> EC plugin there *may* be some read-locality preference, but it’s not something 
>> I would bank on.
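(For the record, a locality-aware profile would look something like the below: 
l=3 groups the chunks and adds a local parity chunk per group, and 
crush-locality=datacenter keeps each group within one DC so a single lost shard 
can be rebuilt locally. Values are purely illustrative, not a recommendation.)

ceph osd erasure-code-profile set lrc42 plugin=lrc k=4 m=2 l=3 \
    crush-locality=datacenter crush-failure-domain=host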
>> 
>> Stretch clusters are great when you need zero RPO, really need a single 
>> cluster, and can manage client endpoint use accordingly.  But there are 
>> tradeoffs; in many cases two clusters with async replication are a better 
>> solution, depending on your needs and what you’re solving for.
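(For RBD, that async option is rbd-mirror between two otherwise independent 
clusters; RGW multisite and cephfs-mirror are the rough equivalents for object 
and file. A minimal snapshot-based sketch, pool/image/site names being 
placeholders and the rbd-mirror daemon assumed to be running on the receiving 
side:)

# on the primary cluster
rbd mirror pool enable rbd image
rbd mirror pool peer bootstrap create --site-name dc1 rbd > peer-token
# copy peer-token to the other cluster, then there:
rbd mirror pool peer bootstrap import --site-name dc2 rbd peer-token
# per image, snapshot-based mirroring:
rbd mirror image enable rbd/myimage snapshot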
>> 
>>> On May 7, 2025, at 5:06 AM, Janne Johansson <icepic...@gmail.com> wrote:
>>> 
>>> On Wed, 7 May 2025 at 10:59, Torkil Svensgaard <tor...@drcmr.dk> wrote:
>>>> We are looking at a cluster split between two DCs with the DCs as
>>>> failure domains.
>>>> 
>>>> Am I right in assuming that any recovery or backfill taking place should
>>>> largely happen inside each DC and not between them? Or can no such
>>>> assumptions be made?
>>>> Pools would be EC 4+8, if that matters.
>>> Unless I am mistaken, the first/primary of each PG is the one "doing"
>>> the backfills, so if the primaries are evenly distributed between the
>>> sites, the source of all backfills would be in the remote DC in 50% of
>>> the cases.
>>> I do not think the backfills are going to work out how to use
>>> only "local" pieces to rebuild a missing/degraded PG piece without
>>> going over the DC-DC link, even if that is theoretically possible.
>>> 
>>> -- 
>>> May the most significant bit of your life be positive.
>> It’s good to be 8-bit-clean; if you aren’t, then Kermit can compensate.
>> 
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
