> On May 8, 2025, at 3:14 PM, Peter Linder <peter.lin...@fiberdirekt.se> wrote:
>
> There is also the issue that if you have a 4+8 EC pool, you ideally need at
> least 4+8 of whatever your failure domain is, in this case DCs.
Well, EC isn't actually supported in a formal stretch cluster today. If those DCs are VERY close to each other, with a small RTT, one could fake it with something like 4+8 and a custom CRUSH rule (rough sketch at the bottom of this mail), though without the mon quorum and min_size advantages of stretch mode. With two DCs, formal stretch mode needs replicated size=4 (R4) pools.

> This is more than most people have.
>
> Is this k=4, m=8? What is the benefit of this compared to an ordinary
> replicated pool with 3 copies?
>
> Even if you set the failure domain to, say, rack, there is no guarantee that
> there is no PG with more than 8 parts in a single DC without some crushmap
> trickery.
>
> If this is k=8, m=4, then only 4 failures can be handled and there is no way
> to split 12 parts so that both DCs contain 4 or less at the same time.
>
> You really need 3 DCs and a fast, highly available network in between.

With stretch mode you typically have 2 DCs plus a tiebreaker mon elsewhere; the tiebreaker link can tolerate higher latency, and the mon can even be a cloud VM (the enabling commands are also sketched below).

> /Peter
>
>
> On 2025-05-08 at 17:45, Anthony D'Atri wrote:
>> To be pedantic … backfill usually means copying data in toto, so like normal
>> write replication it necessarily has to traverse the WAN.
>>
>> Recovery of just a lost shard/replica could in theory stay local with the LRC
>> plugin, but as noted that doesn't seem like a good choice. With the default EC
>> plugin, there *may* be some read locality preference, but it's not something I
>> would bank on.
>>
>> Stretch clusters are great when you need zero RPO, really need a single
>> cluster, and can manage client endpoint use accordingly. But there are
>> tradeoffs; in many cases two clusters with async replication are a better
>> solution. It depends on needs and what you're solving for.
>>
>>> On May 7, 2025, at 5:06 AM, Janne Johansson <icepic...@gmail.com> wrote:
>>>
>>> On Wed, May 7, 2025 at 10:59, Torkil Svensgaard <tor...@drcmr.dk> wrote:
>>>> We are looking at a cluster split between two DCs with the DCs as
>>>> failure domains.
>>>>
>>>> Am I right in assuming that any recovery or backfill taking place should
>>>> largely happen inside each DC and not between them? Or can no such
>>>> assumptions be made?
>>>> Pools would be EC 4+8, if that matters.
>>>
>>> Unless I am mistaken, the first/primary of each PG is the one "doing"
>>> the backfills, so if the primaries are evenly distributed between the
>>> sites, the source of a backfill would be in the remote DC in 50% of
>>> the cases.
>>> I do not think backfill is going to work out how to use only "local"
>>> pieces to rebuild a missing/degraded PG piece without going over the
>>> DC-DC link, even if that is theoretically possible.
>>>
>>> --
>>> May the most significant bit of your life be positive.
>>
>> It's good to be 8-bit-clean; if you aren't, Kermit can compensate.
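
For the "fake it with 4+8" idea, a minimal sketch of the crushmap trickery Peter mentions: place exactly 6 of the 12 shards in each DC, so losing a whole DC loses at most 6 shards, which k=4, m=8 survives. The profile name, rule name and rule id are made up for illustration, and it assumes the hosts already sit under two datacenter buckets in the CRUSH map; treat it as untested, not a recommendation.

    # EC profile: 4 data + 8 coding chunks (k=4, m=8)
    ceph osd erasure-code-profile set ec48 k=4 m=8 crush-failure-domain=host

    # CRUSH rule (decompiled crushmap syntax): choose both datacenters, then
    # 6 hosts in each, so every PG ends up with 6 of its 12 shards per DC.
    rule ec48_two_dc {
        id 10
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 2 type datacenter
        step chooseleaf indep 6 type host
        step emit
    }

With min_size at the usual k+1 = 5, the surviving site's 6 shards keep the pool serving I/O after a full-DC outage, but the mon quorum problem remains: with mons in only two sites, losing the wrong one loses quorum, which is exactly what the tiebreaker in stretch mode addresses.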
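
And for the 2-DC-plus-tiebreaker layout, roughly what enabling stretch mode looks like, following the upstream docs' example. The mon names (a through e), the site names, and the "stretch_rule" CRUSH rule (a replicated rule placing two copies in each datacenter, which must already exist before the last command) are placeholders, not anything from this thread.

    # Use the connectivity election strategy and tell each mon where it lives
    ceph mon set election_strategy connectivity
    ceph mon set_location a datacenter=site1
    ceph mon set_location b datacenter=site1
    ceph mon set_location c datacenter=site2
    ceph mon set_location d datacenter=site2
    ceph mon set_location e datacenter=site3   # tiebreaker, can be a cloud VM

    # Enter stretch mode: "e" is the tiebreaker mon, "stretch_rule" is the
    # replicated CRUSH rule spanning both sites, "datacenter" is the dividing
    # bucket type. Replicated pools then run with size=4, two copies per site.
    ceph mon enable_stretch_mode e stretch_rule datacenter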