Hi,

the distance between the datacenters does not exceed 25 km (15 miles). The current 2-DC setup is with a different datacenter provider, but the same dark fibre provider.
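As a rough sanity check on what that distance means for latency (assuming roughly 200,000 km/s signal propagation in fibre and ignoring switching and serialization overhead):

    25 km one way  ->  25 / 200,000 s  ≈ 0.125 ms
    round trip     ->                   ≈ 0.25 ms RTT

so pure propagation should stay well under 1 ms round trip, far inside the < 10 ms guidance mentioned below.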
And yes, the clusters are hyperconverged, they are running Proxmox, and the Kubernetes nodes will run as VMs on top of Proxmox/KVM. Our experience is that 2 Kubernetes control planes are not a good choice; you should have 3 or 5 to get a proper quorum. I read up on the quorum for the mon nodes in stretched mode.

In regards to unavailability vs. zero data loss, we definitely prefer zero data loss, but depending on the setup we are willing to make compromises and risk having to restore machines from backup.

I will come up with a test plan (including simulating load), try to test different scenarios, and share the outcome.

Cheers
Soeren

________________________________
From: Anthony D'Atri <anthony.da...@gmail.com>
Sent: Monday, April 28, 2025 3:09 PM
To: Joachim Kraftmayer <joachim.kraftma...@clyso.com>
Cc: Soeren Malchow <soeren.malc...@convotis.com>; ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] Stretched pool or not ?

> Understood. So stretched pools also need a stretched Ceph cluster. The docs
> are a bit confusing; they refer to a stretched pool in a cluster that is not
> explicitly in stretch mode.

We should probably not use "stretch" to describe anything that isn't in a formal stretch mode cluster, as setting stretch mode affects behavior in certain ways.

> So a simple setup would be with replication size 3 for replicated pools and
> 3 or more Ceph monitors, ...

We want at least 2x mons per site + tiebreaker, so that not only can we form quorum, but the cluster can keep operating if one of them crashes.

> The reason behind having 3 datacenters is that we have a lot of k8s clusters
> which also need to have quorum. If I distribute the etcd nodes across 3
> datacenters, the outage of one datacenter will keep the k8s cluster
> operational.

I think with K8s you could employ a strategy similar to Ceph's stretch mode:

* K8s workers and OSDs at *2* sites
* 2x K8s control nodes + 1x Ceph mon at a tiebreaker site, which could even be just cloud VMs

That way the Ceph pools would only need R4 instead of R6.

> The latency between the datacenters is most likely very low (we cannot
> measure it, since I am in the planning stages).

I know of one commercial Ceph support organization that dictates < 10 ms RTT between OSD sites and < 100 ms RTT to a tiebreaker mon. These thresholds might inform decisions and predictions. A quick web search asserts:

> A rule of thumb is that RTT increases by approximately 1 millisecond (ms)
> for every 60 miles of distance.

The nuance of the formal stretch mode is the difference in how mon quorum is managed using reachability scores, and the automatic management of pools' min_size in order to maintain an operable cluster in the face of an entire DC going down. With a conventional cluster, if you have, say, 2 mons in one DC and 3 in another, loss of the second DC will result in an inoperable cluster unless one takes drastic manual action.

> The connections between the datacenters are on dark fibres connected through
> modules directly in the top-of-rack switches; compared to the local
> connectivity it will be almost the same. We have an existing similar setup
> between 2 datacenters where the WAN connection adds below 1 ms latency.

The same two DCs as would be in operation here?

> On "exceptionally large nodes": those are all identical servers, 3 per
> datacenter with 16 x 3.84 TB NVMe drives, 128 AMD Epyc cores (on 2 sockets)
> and 1.5 TB memory. I would not count them as "exceptionally large".

Gotcha. Is this a converged cluster? That's an excess of cores and RAM just for Ceph if not.
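Going back to the formal stretch mode above, a minimal sketch of what enabling it looks like. This is a sketch only: the mon names a-e, the CRUSH bucket names site1/site2/site3 and the rule name stretch_rule are placeholders, the CRUSH rule has to exist in the CRUSH map beforehand, and the exact procedure should be checked against the stretch mode documentation for the release in use:

    # connectivity-based mon elections and mon locations (2 per site + tiebreaker)
    ceph mon set election_strategy connectivity
    ceph mon set_location a datacenter=site1
    ceph mon set_location b datacenter=site1
    ceph mon set_location c datacenter=site2
    ceph mon set_location d datacenter=site2
    ceph mon set_location e datacenter=site3

    # enable stretch mode with mon e as the tiebreaker
    ceph mon enable_stretch_mode e stretch_rule datacenter

With stretch mode enabled, replicated pools run at size 4 with two copies per data site (the R4 mentioned above), and min_size is adjusted automatically if one of the sites goes down.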
> I will read up a little more on async replication.

RGW: multisite
RBD: rbd-mirror
CephFS: mirroring is fairly recent

Part of the equation is having the clients be able to access the data, including whether you're solving for zero data *unavailability* vs zero data *loss*. The latter is much easier than the former.
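For the RBD side (which is what the Proxmox VM disks would use), a minimal sketch of snapshot-based mirroring between two clusters. The pool name vms, the site names site-a/site-b, the image name and the 15m interval are placeholders, and an rbd-mirror daemon has to be running on the receiving cluster(s):

    # on site-a: enable per-image mirroring and create a bootstrap token
    rbd mirror pool enable vms image
    rbd mirror pool peer bootstrap create --site-name site-a vms > token

    # on site-b: enable mirroring and import the token (copied over out of band)
    rbd mirror pool enable vms image
    rbd mirror pool peer bootstrap import --site-name site-b --direction rx-tx vms token

    # per image on the primary side, plus a schedule for mirror snapshots
    rbd mirror image enable vms/vm-100-disk-0 snapshot
    rbd mirror snapshot schedule add --pool vms 15m

Being asynchronous, this accepts a bounded window of potential data loss in exchange for keeping the two clusters independent of each other's latency, which is exactly the *unavailability* vs. *loss* trade-off above.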