Hi Soeren,

understood. So stretched pools also need a stretched Ceph cluster. A simple setup would be replication size 3 for replicated pools and 3 or more Ceph monitors, ... The failure domain is the datacenter, which means that if you lose one DC, the Ceph cluster stays online.
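For illustration, a minimal sketch of what that could look like (the pool name "mypool" and rule name "replicated_dc" are only placeholders, and it assumes your CRUSH map already has three buckets of type "datacenter" under the default root):

# replicated rule that puts each copy in a different datacenter
ceph osd crush rule create-replicated replicated_dc default datacenter
# apply it to a pool; with min_size 2 it keeps serving I/O on 2 of 3 copies
ceph osd pool set mypool crush_rule replicated_dc
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2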
Back to your original question:

> If using a stretched pool across all 3 datacenters, what happens if one datacenter fails? I did read the documentation, and the question came up because I do not understand the sentence "Individual Stretch Pools do not support I/O operations during a netsplit scenario between two or more zones" completely. Does it mean there is no I/O already if one datacenter fails?

If you have changed your failure domain to 'datacenter', at least two DCs must be available to handle I/O operations.
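For illustration only, that is roughly what the per-pool stretch settings in Squid express (argument order is from memory of the docs, so please double-check; "mypool" and "replicated_dc" are the placeholders from above):

# the pool peers and serves I/O only while its PGs span at least 2 of the
# 3 datacenter buckets: <count> <target> <barrier> <crush_rule> <size> <min_size>
ceph osd pool stretch set mypool 2 3 datacenter replicated_dc 3 2
ceph osd pool stretch show mypool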
Do you want to use Ceph for the etcd database?

Hope it helps,
Joachim

joachim.kraftma...@clyso.com
www.clyso.com
Hohenzollernstr. 27, 80801 Munich
Utting | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE275430677

On Mon, 28 Apr 2025 at 12:15, Soeren Malchow <soeren.malc...@convotis.com> wrote:

> Hi Joachim and Anthony,
>
> First: thanks for taking the time to answer. (Now in plain text, sorry, I did not think about that.)
>
> The sentence I am referring to is about "stretched pools" without a tiebreaker, as opposed to "stretch mode", if I understood the documentation correctly.
> I read it in the "Limitations" section, on exactly the page your link refers to as well.
>
> The reason behind having 3 datacenters is that we have a lot of k8s clusters which also need quorum; if I distribute the etcd nodes across 3 datacenters, the outage of one datacenter keeps the k8s clusters operational.
> That is why I was explicitly referring to stretched pools, not stretch mode (still hoping I understand everything correctly).
>
> We do not have a single point of failure in the setup; all connections and devices are redundant.
> The latency between the datacenters is most likely very low (we cannot measure it yet, since I am in the planning stages).
> The connections between the datacenters run over dark fibres connected through modules directly in the top-of-rack switches, so compared to the local connectivity it will be almost the same.
> We have an existing, similar setup between 2 datacenters where the WAN connection adds below 1 ms of latency.
>
> On "exceptionally large nodes": those are all identical servers, 3 per datacenter, each with 16 x 3.84 TB NVMe disks, 128 AMD Epyc cores (on 2 sockets) and 1.5 TB memory. I would not count them as "exceptionally large".
>
> I will read up a little more on asynchronous replication.
>
> Cheers
> Soeren
>
> ________________________________________
> From: Joachim Kraftmayer <joachim.kraftma...@clyso.com>
> Sent: Monday, April 28, 2025 8:23 AM
> To: Anthony D'Atri <anthony.da...@gmail.com>
> Cc: Soeren Malchow <soeren.malc...@convotis.com>; ceph-users@ceph.io <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Re: Stretched pool or not ?
>
> Hi Soeren,
>
> First, I would like to clarify something. There are two options: a stretched cluster and stretch mode.
>
> "Sometimes this cannot be relied upon. If you have a 'stretched-cluster' deployment in which much of your cluster is behind a single network component, you might need to use stretch mode to ensure data integrity."
> Source: https://docs.ceph.com/en/latest/rados/operations/stretch-mode/#id1
>
> The focus in this sentence is on 'single network component'. I hope you don't have a single point of failure in your setup.
>
> Which option is best for your requirements?
>
> Regards, Joachim
>
> joachim.kraftma...@clyso.com
> www.clyso.com
> Hohenzollernstr. 27, 80801 Munich
> Utting | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE275430677
>
> On Mon, 28 Apr 2025 at 02:50, Anthony D'Atri <anthony.da...@gmail.com> wrote:
>
> > Please make list posts in plain text.
> >
> > > I am working on the plan for a 3-datacenter setup using Ceph (in Proxmox nodes).
> > >
> > > Each datacenter has 3 physical nodes to start with and 100 Gbit switches. I will also have 2 x 100 Gbit/s connectivity between the datacenters (each datacenter to each other).
> > > The physical nodes have 2 x 100 Gbit/s for the public network and 2 x 100 Gbit/s for the cluster network.
> >
> > You almost certainly don't need a cluster / replication network unless these are exceptionally large nodes.
> >
> > > About this setup I have 2 questions.
> > >
> > > Is it even necessary to evaluate a stretched cluster, since the WAN connections are as fast as the local ones (including the latency, since it is only 25 km)?
> >
> > There's more to latency than just distance. What is the measured latency? A:B, B:C, C:A?
> >
> > > If using a stretched pool across all 3 datacenters, what happens if one datacenter fails? I did read the documentation, and the question came up because I do not understand the sentence "Individual Stretch Pools do not support I/O operations during a netsplit scenario between two or more zones" completely. Does it mean there is no I/O already if one datacenter fails?
> >
> > That sentence refers to a non-stretch cluster.
> >
> > Tell us why you're spreading across three DCs, what you're trying to accomplish, and what your performance requirements are.
> >
> > AIUI a stretch 3-site cluster requires all pools to be replicated, size=6.
> >
> > Explicit stretch mode treats the mon quorum in a different way. With two OSD sites you deploy a tiebreaker at a third site, which is possibly just a cloud VM. With three OSD sites, I might speculate that one would deploy 7 mons: 2 at each OSD site + a tiebreaker.
> >
> > Operations on a stretch cluster can be slow. Sometimes separate clusters with asynchronous replication make more sense.
> >
> > > If I am on the wrong path, maybe someone has a link for me where I can find information on this setup?
> > >
> > > Cheers
> > > Soeren
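One more note on the asynchronous replication Anthony mentioned: for RBD (which Proxmox uses), that would typically mean two independent clusters connected with rbd-mirror. A rough sketch in snapshot mode, assuming cephadm, a pool "rbd" and site names "site-a"/"site-b" (all placeholders):

# on both clusters: deploy the mirror daemon
ceph orch apply rbd-mirror --placement=1
# on both clusters: enable per-image mirroring on the pool
rbd mirror pool enable rbd image
# on site-a: create a bootstrap token ...
rbd mirror pool peer bootstrap create --site-name site-a rbd > token
# ... and import it on site-b
rbd mirror pool peer bootstrap import --site-name site-b --direction rx-tx rbd token
# per image: snapshot-based mirroring ("vm-disk-1" is a placeholder)
rbd mirror image enable rbd/vm-disk-1 snapshot

The replication is asynchronous, so the second cluster lags by the snapshot interval; that is the trade-off against a stretched cluster's synchronous writes.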