Hi Soeren,

Understood. So stretched pools also need a stretched Ceph cluster.
A simple setup would use replication size 3 for replicated pools and
3 or more Ceph monitors, ...
The failure domain is the datacenter, which means that if you lose one DC
the Ceph cluster is still online.
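
As a rough sketch (bucket, rule and pool names are just placeholders; on
Proxmox you may prefer to create the pool through the GUI and only adjust
the CRUSH rule on the CLI):

  # create datacenter buckets and move the hosts under them (repeat per DC/host)
  ceph osd crush add-bucket dc1 datacenter
  ceph osd crush move dc1 root=default
  ceph osd crush move node-a1 datacenter=dc1

  # CRUSH rule that places one replica in each datacenter
  ceph osd crush rule create-replicated rep_dc default datacenter

  # replicated pool with size 3 = one copy per DC; min_size 2 keeps the pool
  # serving I/O while 2 of the 3 DCs are up
  ceph osd pool create rbd_pool 128 128 replicated rep_dc
  ceph osd pool set rbd_pool size 3
  ceph osd pool set rbd_pool min_size 2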

Back to your original question:

> If using a stretched pool across all 3 datacenters, what happens if one
datacenter fails? I did read the documentation and the question came up
because I do not understand the sentence "Individual Stretch Pools do not
support I/O operations during a netsplit scenario between two or more
zones" completely. Does it mean there is no I/O as soon as one datacenter
fails?

If you have set your failure domain to ‘datacenter’, at least two DCs
must be available to handle I/O operations.
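
With the newer per-pool stretch settings (Squid and later) this is expressed
roughly like below; pool and rule names are placeholders, and please
double-check the exact syntax against the documentation for your release:

  # require peering across 2 of the 3 datacenter buckets for this pool
  # arguments: pool, peering_crush_bucket_count, peering_crush_bucket_target,
  #            peering_crush_bucket_barrier, crush_rule, size, min_size
  ceph osd pool stretch set rbd_pool 2 3 datacenter rep_dc 3 2
  ceph osd pool stretch show rbd_pool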

Do you want to use Ceph for the etcd database?

Hope it helps, Joachim


  joachim.kraftma...@clyso.com

  www.clyso.com

  Hohenzollernstr. 27, 80801 Munich

Utting | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE275430677



On Mon, 28 Apr 2025 at 12:15, Soeren Malchow <
soeren.malc...@convotis.com> wrote:

> Hi Joachim and Anthony,
>
> First: thanks for taking the time to answer. (Now in plain text, sorry, I
> did not think about that.)
>
> The sentence I am referring to is about "stretched pools" without a
> tiebreaker, as opposed to "stretch mode", if I understood the documentation
> correctly.
> I read this in the "Limitations" section on exactly the page your link
> refers to as well.
>
> The reason behind having 3 datacenters is that we have a lot of k8s
> clusters which also need quorum; if I distribute the etcd nodes across
> 3 datacenters, the outage of one datacenter will keep the k8s clusters
> operational.
> That's why I was explicitly referring to stretched pools, not stretch mode
> (I still hope I understand everything correctly).
>
> We do not have a single point of failure in the setup; all connections and
> devices are redundant.
> The latency between the datacenters is most likely very low (we cannot
> measure it yet since I am still in the planning stage).
> The connections between the datacenters run over dark fibre connected
> through modules directly in the top-of-rack switches, so compared to the
> local connectivity it will be almost the same.
> We have an existing similar setup between 2 datacenters where the WAN
> connection adds below 1 ms of latency.
>
> On "exceptionally large nodes", those are all identical servers, 3 per
> datacenter with 16 x 3.84 TB nvme disks, 128 AMD Epyc cores (on 2 sockets)
> and 1.5 TB memory, i would not count them as "exceptionally large".
>
> I will read up a little more on asynchronous replication.
>
> Cheers
> Soeren
>
>
>
> ________________________________________
> From: Joachim Kraftmayer <joachim.kraftma...@clyso.com>
> Sent: Monday, April 28, 2025 8:23 AM
> To: Anthony D'Atri <anthony.da...@gmail.com>
> Cc: Soeren Malchow <soeren.malc...@convotis.com>; ceph-users@ceph.io <
> ceph-users@ceph.io>
> Subject: Re: [ceph-users] Re: Stretched pool or not ?
>
> Hi Soeren.
> First, I would like to clarify something.
> There are two options: a stretched cluster and stretch mode.
> Sometimes this cannot be relied upon. If you have a “stretched-cluster”
> deployment in which much of your cluster is behind a single network
> component, you might need to use stretch mode to ensure data integrity.
> source: https://docs.ceph.com/en/latest/rados/operations/stretch-mode/#id1
> The focus in this sentence is on ‘single network component’. I hope you
> don't have a single point of failure in your setup.
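>
> To make the distinction concrete, enabling stretch mode (the tiebreaker
> variant) roughly looks like this; mon names, datacenter names and the rule
> name are placeholders:
>
>   # connectivity-based mon elections and mon locations
>   ceph mon set election_strategy connectivity
>   ceph mon set_location a datacenter=dc1
>   ceph mon set_location b datacenter=dc1
>   ceph mon set_location c datacenter=dc2
>   ceph mon set_location d datacenter=dc2
>   ceph mon set_location e datacenter=dc3   # tiebreaker site
>
>   # 'stretch_rule' is a CRUSH rule defined beforehand that puts two copies
>   # into each of the two OSD datacenters
>   ceph mon enable_stretch_mode e stretch_rule datacenter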
>
> Which option is best for your requirements?
> Regards, Joachim
>
>   joachim.kraftma...@clyso.com
>   www.clyso.com
>   Hohenzollernstr. 27, 80801 Munich
> Utting | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE275430677
>
>
>
> On Mon, 28 Apr 2025 at 02:50, Anthony D'Atri <
> anthony.da...@gmail.com> wrote:
>
> Please make list posts in plain text.
>
>
> > I am working on the plan for a 3-datacenter setup using Ceph (on Proxmox
> > nodes).
> >
> > Each datacenter has 3 physical nodes to start with and 100 Gbit switches.
> I will also have 2 x 100 Gbit/s connectivity between the datacenters (each
> datacenter to each of the others).
> > The physical nodes have 2 x 100 Gbit/s for the public network and 2 x
> 100 Gbit/s for the cluster network.
>
> You almost certainly don’t need a cluster / replication network unless
> these are exceptionally large nodes.
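>
> In ceph.conf terms that would simply mean defining only a public network;
> the subnet here is of course just an example:
>
>   [global]
>   public_network = 192.168.100.0/24
>   # no cluster_network set: replication traffic shares the public bond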
>
> >
> > About this setup i have 2 questions.
> >
> > Is it even necessary to evaluate a stretched cluster since the WAN
> connections are as fast as the local ones (including the latency, since it
> is only 25 km)?
>
> There’s more to latency than just distance.  What is the measured
> latency?  A:B, B:C, C:A?
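>
> Once the links are up, even a simple sketch like this per direction gives
> a first number (host names are placeholders; the admin-socket command only
> works once OSDs are running):
>
>   # raw RTT, run from a node in DC A against nodes in B and C, and so on
>   ping -c 100 -i 0.2 node-b1
>   ping -c 100 -i 0.2 node-c1
>
>   # later, Ceph's own heartbeat view of the network
>   ceph daemon osd.0 dump_osd_network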
>
>
>
> >
> > If using a stretched pool across all 3 datacenters, what happens if one
> datacenter fails? I did read the documentation and the question came up
> because I do not understand the sentence "Individual Stretch Pools do not
> support I/O operations during a netsplit scenario between two or more
> zones" completely. Does it mean there is no I/O as soon as one datacenter
> fails?
>
> That sentence refers to a non-stretch cluster.
>
> Tell us why you’re spreading across three DCs, what you’re trying to
> accomplish, and what your performance requirements are.
>
> AIUI a stretch 3-site cluster requires all pools to be replicated, size=6.
>
> Explicit stretch mode treats the mon quorum in a different way. With two
> OSD sites you deploy a tiebreaker at a third site, which is possibly just a
> cloud VM. With three OSD sites, I might speculate that one would deploy 7
> mons, 2 at each OSD site + tiebreaker.
>
> Operations on a stretch cluster can be slow.   Sometimes separate clusters
> with asynchronous replication make more sense.
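>
> For RBD under Proxmox that would typically mean rbd-mirror between two
> independent clusters; a rough sketch with snapshot-based mirroring (pool,
> image and site names are placeholders):
>
>   # on both clusters: per-image mirroring for the pool
>   rbd mirror pool enable rbd_pool image
>
>   # on cluster A: create a bootstrap token, copy it to cluster B, import it
>   rbd mirror pool peer bootstrap create --site-name site-a rbd_pool > peer-token
>   rbd mirror pool peer bootstrap import --site-name site-b rbd_pool peer-token
>
>   # per image: snapshot-based mirroring (the rbd-mirror daemon runs on the peer)
>   rbd mirror image enable rbd_pool/vm-disk-1 snapshot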
>
> >
> >
> > If I am on the wrong path, maybe someone has a link for me where I can
> find information on this setup?
> >
> > Cheers
> > Soeren
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
