Hi Joachim and Anthony, first: thanks for taking the time to answer. (Now in plain text; sorry, I did not think about that.)
The sentence I am referring to is about "stretched pools" without a tiebreaker, as opposed to "stretch mode", if I understood the documentation correctly. I read this in the "Limitations" section on exactly the page your link refers to as well.

The reason for having 3 datacenters is that we run a lot of k8s clusters which also need quorum; if I distribute the etcd nodes across 3 datacenters, the outage of one datacenter keeps the k8s clusters operational. That is why I was explicitly referring to stretched pools, not stretch mode (I still hope I understand everything correctly).

We do not have a single point of failure in the setup; all connections and devices are redundant. The latency between the datacenters is most likely very low (we cannot measure it yet, since I am in the planning stages). The connections between the datacenters run over dark fibres connected through modules directly in the top-of-rack switches, so compared to the local connectivity it will be almost the same. We have an existing, similar setup between two datacenters where the WAN connection adds below 1 ms of latency.

On "exceptionally large nodes": those are all identical servers, 3 per datacenter, each with 16 x 3.84 TB NVMe disks, 128 AMD Epyc cores (on 2 sockets), and 1.5 TB of memory. I would not count them as "exceptionally large".

I will read up a little more on async replication.

Cheers
Soeren

________________________________________
From: Joachim Kraftmayer <joachim.kraftma...@clyso.com>
Sent: Monday, April 28, 2025 8:23 AM
To: Anthony D'Atri <anthony.da...@gmail.com>
Cc: Soeren Malchow <soeren.malc...@convotis.com>; ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: Stretched pool or not ?

Hi Soeren.

First, I would like to clarify something. There are two options: stretched cluster and stretch mode. Sometimes this distinction cannot be relied upon.
"If you have a 'stretched-cluster' deployment in which much of your cluster is behind a single network component, you might need to use stretch mode to ensure data integrity."

Source: https://docs.ceph.com/en/latest/rados/operations/stretch-mode/#id1

The focus in this sentence is on "single network component". I hope you don't have a single point of failure in your setup. Which option is best for your requirements?

Regards,
Joachim

joachim.kraftma...@clyso.com
www.clyso.com
Hohenzollernstr. 27, 80801 Munich
Utting | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE275430677

On Mon, 28 Apr 2025 at 02:50, Anthony D'Atri <anthony.da...@gmail.com> wrote:

Please make list posts in plain text.

> I am working on the plan for a 3-datacenter setup using Ceph (in Proxmox nodes).
>
> Each datacenter has 3 physical nodes to start with and 100 Gbit switches. I will also have 2 x 100 Gbit/s connectivity between the datacenters (each datacenter to each other).
> The physical nodes have 2 x 100 Gbit/s for the public network and 2 x 100 Gbit/s for the cluster network.

You almost certainly don't need a cluster / replication network unless these are exceptionally large nodes.

> About this setup I have 2 questions.
>
> Is it even necessary to evaluate a stretched cluster, since the WAN connections are as fast as the local ones (including the latency, since it is only 25 km)?

There's more to latency than just distance. What is the measured latency? A:B, B:C, C:A?

> If using a stretched pool across all 3 datacenters, what happens if one datacenter fails? I did read the documentation, and the question came up because I do not understand the sentence "Individual Stretch Pools do not support I/O operations during a netsplit scenario between two or more zones" completely. Does it mean there is no I/O already if one datacenter fails?

That sentence refers to a non-stretch cluster.
Tell us why you're spreading across three DCs, what you're trying to accomplish, and what your performance requirements are.

AIUI a stretch 3-site cluster requires all pools to be replicated, size=6. Explicit stretch mode treats the mon quorum in a different way. With two OSD sites you deploy a tiebreaker at a third site, which is possibly just a cloud VM. With three OSD sites, I might speculate that one would deploy 7 mons: 2 at each OSD site + tiebreaker.

Operations on a stretch cluster can be slow. Sometimes separate clusters with asynchronous replication make more sense.

> If I am on the wrong path, maybe someone has a link for me where I can find information on this setup?
>
> Cheers
> Soeren

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
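[Editor's aside: the monitor layout Anthony speculates about above (7 mons: 2 at each of the three OSD sites plus a tiebreaker) can be sanity-checked with a few lines of arithmetic. This is an illustrative sketch, not a Ceph tool; the site names are placeholders.]

```python
# Sketch: check that a 7-monitor layout (2 per OSD site + 1 tiebreaker)
# keeps Paxos quorum through the loss of any single site.
# Illustrative arithmetic only; "dc1"/"dc2"/"dc3" are placeholder names.

mons_per_site = {"dc1": 2, "dc2": 2, "dc3": 2, "tiebreaker": 1}
total = sum(mons_per_site.values())   # 7 monitors in total
majority = total // 2 + 1             # 4 mons needed for quorum

for failed_site, lost in mons_per_site.items():
    surviving = total - lost
    status = "kept" if surviving >= majority else "LOST"
    print(f"{failed_site} down: {surviving}/{total} mons up, quorum {status}")

# Any single-site failure leaves at least 5 of 7 mons, above the
# majority of 4, so the cluster stays quorate.
```

The same arithmetic explains why a netsplit isolating two OSD sites at once (4 mons lost) would break quorum, which is where the stretch-mode rules differ from a plain stretched cluster.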