On May 9, 2025, at 5:13 PM, Torkil Svensgaard <tor...@drcmr.dk> wrote:
On 08-05-2025 21:14, Peter Linder wrote:
There is also the issue that if you have a 4+8 EC pool, you ideally need at
least 4+8 = 12 instances of whatever your failure domain is, in this case DCs.
That is more than most people have.
Is this k=4, m=8? What is the benefit of this compared to an ordinary
replicated pool with 3 copies?
My bad, I think I've misunderstood the definition of a failure domain, it would
actually be host.
We are going to have 2 DCs, each with 7+ hosts, and a tiebreaker MON in a third
DC. That should allow us to lose one DC and an additional host and still be
online.
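Quick arithmetic on that claim, as a minimal Python sketch (assuming k=4, m=8
and the 6+6 shard split per DC that the CRUSH rule below enforces):

# Minimal sketch: EC k=4, m=8 with 6 shards pinned to each of two DCs.
# Data stays readable as long as at least k shards survive.
k, m = 4, 8
shards_per_dc = (k + m) // 2              # 6 shards in each DC
surviving = (k + m) - shards_per_dc - 1   # lose a whole DC plus one more host
print(surviving, "shards left; need", k)  # 5 >= 4, pool stays up (min_size permitting)

Raw-space overhead is (k+m)/k = 3x, the same as a 3-copy replicated pool, but
up to 8 lost shards can be tolerated instead of 2.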
Even if you set the failure domain to, say, rack, without some crushmap
trickery there is no guarantee that a PG won't end up with more than 8 of its
parts in a single DC.
We would use a CRUSH rule to ensure the placement we want, something like this:
rule EC_4_8 {
    id ZYX
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class nvmebulk
    step choose indep 0 type datacenter   # spread across the datacenters
    step chooseleaf indep 6 type host     # 6 hosts (one shard each) per DC
    step emit
}
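To illustrate what that rule is meant to produce, a minimal Python sketch (host
names are made up; the real mapping is of course done by the crushmap):

import random

# Two DCs with 7 hosts each (made-up names); pick 6 distinct hosts per DC,
# mirroring "choose ... type datacenter" + "chooseleaf indep 6 type host".
datacenters = {"dc1": [f"dc1-host{i}" for i in range(1, 8)],
               "dc2": [f"dc2-host{i}" for i in range(1, 8)]}

placement = []
for dc, hosts in datacenters.items():
    placement += random.sample(hosts, 6)   # 6 shards per DC, one per host

assert len(placement) == 12                # k+m shards in total
assert all(sum(h.startswith(dc) for h in placement) == 6 for dc in datacenters)
print(placement)

The real rule can be checked against a compiled crushmap with crushtool --test
(something like crushtool -i <map> --test --rule <id> --num-rep 12
--show-mappings), which should show 6 OSDs from each DC for every PG.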
If this is k=8, m=4, then only 4 lost shards can be handled, and there is no way
to split 12 parts so that both DCs contain 4 or fewer at the same time: one DC
must always hold at least 6 of the 12, and losing that DC loses more shards than
m=4 can cover.
You really need 3 DCs and a fast, highly available network in between.
/Peter
On 2025-05-08 at 17:45, Anthony D'Atri wrote:
To be pedantic … backfill usually means copying data in toto, so like normal
write replication it necessarily has to traverse the WAN.
Recovery of just a lost shard/replica could in theory stay local with the LRC
plugin, but as noted that doesn’t seem like a good choice. With the default EC
plugin there *may* be some read locality preference, but it’s not something I
would bank on.
We looked at the LRC plugin and don't think it would be worth the risk, since it
seems somewhat abandoned and not really used by anyone.
Stretch clusters are great when you need zero RPO, really need a single cluster,
and can manage client endpoint use accordingly. But they come with tradeoffs; in
many cases two clusters with async replication can be a better solution. It
depends on needs and what you’re solving for.
We did consider two clusters + replication but then we would need more hardware
to get the same usable space, and money is scarce.
The WAN would probably be 2x10G at a distance of less than 10 km. The pools
would mainly be bulk storage, so I think that should work OK.
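Rough numbers for that link, as a small sketch (the ~5 µs/km fibre propagation
figure is a textbook approximation, not a measurement):

# 2x10 Gbit/s WAN at <10 km: best-case throughput and added round-trip time.
link_gbit = 2 * 10
throughput_gb_s = link_gbit / 8          # ~2.5 GB/s aggregate, best case
km = 10
rtt_ms = 2 * km * 0.005                  # ~5 us/km one way -> ~0.1 ms RTT
print(f"~{throughput_gb_s} GB/s, ~{rtt_ms:.1f} ms added RTT")

At that distance the added latency is negligible for bulk workloads; the
20 Gbit/s ceiling is more likely to be the limit during a full-DC recovery.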
Thanks all.
Best regards,
Torkil
On May 7, 2025, at 5:06 AM, Janne Johansson <icepic...@gmail.com> wrote:
On Wed, 7 May 2025 at 10:59, Torkil Svensgaard <tor...@drcmr.dk> wrote:
We are looking at a cluster split between two DCs with the DCs as
failure domains.
Am I right in assuming that any recovery or backfill taking place should
largely happen inside each DC and not between them? Or can no such
assumptions be made?
Pools would be EC 4+8, if that matters.
Unless I am mistaken, the first/primary of each PG is the one "doing"
the backfills, so if the primaries are evenly distributed between the
sites, the source of all backfills would be in the remote DC in 50% of
the cases.
I do not think backfill is going to work out how to use only "local" pieces to
rebuild a missing/degraded PG shard without going over the DC-DC link, even if
that is theoretically possible.
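That 50% figure falls straight out of an even primary distribution; a trivial
sketch just to show the expectation (DC names are made up):

import random

# An OSD in dc1 is being backfilled; the primary feeding each PG sits in
# either DC with equal probability, so about half the traffic crosses the WAN.
trials = 100_000
cross = sum(random.choice(["dc1", "dc2"]) != "dc1" for _ in range(trials))
print(f"~{100 * cross / trials:.0f}% of backfills sourced from the remote DC")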
--
May the most significant bit of your life be positive.
It’s good to be 8-bit-clean, if you aren’t, then Kermit can compensate.
--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: tor...@drcmr.dk
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io