Hi Ryanne,

> I frequently demo this stuff, where I pull the plug on entire DCs and
> apps keep running like nothing happened.

Is there any public recording or documentation of these demos? It would be
very useful to see how it works.

Thanks,
Peter

On Thu, 13 Feb 2020 at 00:42, Ryanne Dolan <ryannedo...@gmail.com> wrote:
>
> > elaborate a bit more about the active-active
>
> Active/active in this context just means that both (or multiple) clusters
> are used under normal operation, not just during an outage. For this to
> work, you basically have isolated instances of your application stack
> running in each DC, with MM2 keeping each DC in sync. If one DC is
> unavailable, traffic is shifted to another DC. It's possible to set this
> up such that failover/failback between DCs happens automatically and
> seamlessly, e.g. with load balancers and health checks. It's more
> complicated to set up than the active/standby approach, but DR sorta
> takes care of itself from then on. I frequently demo this stuff, where I
> pull the plug on entire DCs and apps keep running like nothing happened.
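>
> A minimal sketch of the sort of MM2 config I mean, with two made-up
> cluster aliases (see KIP-382 for the full set of properties):
>
>   # connect-mirror-maker.properties
>   clusters = dc1, dc2
>   dc1.bootstrap.servers = dc1-kafka:9092
>   dc2.bootstrap.servers = dc2-kafka:9092
>
>   # replicate in both directions so each DC has the other's topics
>   dc1->dc2.enabled = true
>   dc1->dc2.topics = .*
>   dc2->dc1.enabled = true
>   dc2->dc1.topics = .*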
>
> On Wed, Feb 12, 2020 at 12:05 AM benitocm <benit...@gmail.com> wrote:
>
> > Hi Ryanne,
> >
> > Could you please elaborate a bit more on the active-active
> > recommendation?
> >
> > Thanks in advance
> >
> > On Mon, Feb 10, 2020 at 10:21 PM benitocm <benit...@gmail.com> wrote:
> >
> > > Thanks very much for the response.
> > >
> > > Could you please elaborate a bit more on "I'd arc in that direction.
> > > Instead of migrating A->B->C->D..., active/active is more like having
> > > one big cluster"?
> > >
> > > Another thing I would like to share is that currently my consumers
> > > only consume from one topic, so introducing MM2 will impact them.
> > > Any suggestion in this regard would be greatly appreciated.
> > >
> > > Thanks in advance again!
> > >
> > > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan <ryannedo...@gmail.com>
> > > wrote:
> > >
> > >> Hello, sounds like you have this all figured out actually. A couple
> > >> notes:
> > >>
> > >> > For now, we just need to handle DR requirements, i.e., we would
> > >> > not need active-active
> > >>
> > >> If your infrastructure is sufficiently advanced, active/active can
> > >> be a lot easier to manage than active/standby. If you are starting
> > >> from scratch, I'd arc in that direction. Instead of migrating
> > >> A->B->C->D..., active/active is more like having one big cluster.
> > >>
> > >> > secondary.primary.topic1
> > >>
> > >> I'd recommend using regex subscriptions where possible, so that apps
> > >> don't need to worry about these potentially complex topic names.
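> > >> For example, something like this subscribes to topic1 from your
> > >> example plus any replicated variant of it (primary.topic1,
> > >> secondary.primary.topic1, ...):
> > >>
> > >>   import java.util.Properties;
> > >>   import java.util.regex.Pattern;
> > >>   import org.apache.kafka.clients.consumer.KafkaConsumer;
> > >>
> > >>   Properties props = new Properties();
> > >>   // ...usual bootstrap.servers, group.id, deserializer settings...
> > >>   KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
> > >>   // matches "topic1" with zero or more "<cluster>." prefixes
> > >>   consumer.subscribe(Pattern.compile("([^.]+\\.)*topic1"));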
> > >>
> > >> > An additional question: if the topic is compacted, i.e., the topic
> > >> > keeps data forever, would switchover operations imply adding an
> > >> > additional path in the topic name?
> > >>
> > >> I think that's right. You could always clean things up manually, but
> > >> migrating between clusters a bunch of times would leave a trail of
> > >> replication hops.
> > >>
> > >> Also, you might look into implementing a custom ReplicationPolicy.
> > >> For example, you could squash "secondary.primary.topic1" into
> > >> something shorter if you like.
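> > >>
> > >> A very rough sketch of the idea (class name made up, and you'd want
> > >> to verify MM2's cycle detection still behaves before trusting
> > >> anything like this):
> > >>
> > >>   import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;
> > >>
> > >>   // Keep only the original source cluster in the topic name, so a
> > >>   // second hop stays "primary.topic1" instead of growing into
> > >>   // "secondary.primary.topic1".
> > >>   public class SquashingReplicationPolicy extends DefaultReplicationPolicy {
> > >>       @Override
> > >>       public String formatRemoteTopic(String sourceClusterAlias, String topic) {
> > >>           // topicSource() is non-null when the topic already carries
> > >>           // a cluster prefix, e.g. "primary" for "primary.topic1"
> > >>           if (topicSource(topic) != null) {
> > >>               return topic;  // already prefixed; don't add another hop
> > >>           }
> > >>           return super.formatRemoteTopic(sourceClusterAlias, topic);
> > >>       }
> > >>   }
> > >>
> > >> You'd plug it in with replication.policy.class, and every cluster's
> > >> MM2 config needs to use the same policy so the names stay consistent.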
> > >>
> > >> Ryanne
> > >>
> > >> On Mon, Feb 10, 2020 at 1:24 PM benitocm <benit...@gmail.com> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > After having a look at the talk
> > >> > https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> > >> > and at
> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> > >> > I am trying to understand how I would use it in the setup that I
> > >> > have. For now, we just need to handle DR requirements, i.e., we
> > >> > would not need active-active.
> > >> >
> > >> > My requirements, more or less, are the following:
> > >> >
> > >> > 1) Currently, we have just one Kafka cluster, "primary", where all
> > >> > the producers produce to and all the consumers consume from.
> > >> > 2) In case "primary" crashes, we would need another Kafka cluster,
> > >> > "secondary", to which we would move all the producers and
> > >> > consumers and keep working.
> > >> > 3) Once "primary" is recovered, we would need to move back to it
> > >> > (as in #1).
> > >> >
> > >> > To fulfill #2, I have thought of setting up a new Kafka cluster
> > >> > "secondary" and a replication procedure using MM2. However, it is
> > >> > not clear to me how to proceed.
> > >> >
> > >> > I will describe the high-level details so you guys can point out
> > >> > my misconceptions:
> > >> >
> > >> > A) Initial situation. As in the example in KIP-382, in the primary
> > >> > cluster we will have a local topic "topic1" that the producers
> > >> > produce to and the consumers consume from. MM2 will create in the
> > >> > secondary cluster the remote topic "primary.topic1", into which
> > >> > the primary's local topic is replicated. In addition, the consumer
> > >> > group information of the primary will also be replicated.
> > >> >
> > >> > B) The primary Kafka cluster becomes unavailable. Producers are
> > >> > moved to produce into the local topic "topic1" on the secondary
> > >> > (created manually). In addition, consumers need to connect to the
> > >> > secondary to consume both from the local topic "topic1", where the
> > >> > producers are now producing, and from the remote topic
> > >> > "primary.topic1", where the producers were producing before; i.e.,
> > >> > consumers will need to aggregate. This is because some consumers
> > >> > could have lag, so they will need to consume from both. In this
> > >> > situation, the local topic "topic1" in the secondary will receive
> > >> > new messages and will be consumed (its consumption information
> > >> > will also change), while the remote topic "primary.topic1" will
> > >> > not receive new messages but will still be consumed (its
> > >> > consumption information will change).
> > >> >
> > >> > At this point, my conclusion is that consumers need to consume
> > >> > from both topics (the new messages produced in the local topic,
> > >> > and the old messages for consumers that had lag).
> > >> >
> > >> > C) The primary cluster is recovered (here is where things get
> > >> > complicated for me). In the talk, the new primary is renamed
> > >> > primary-2 and MM2 is configured for active-active replication. The
> > >> > result is the following. The secondary cluster will end up with a
> > >> > new remote topic (primary-2.topic1) that will contain a replica of
> > >> > the new topic1 created in the primary-2 cluster. The primary-2
> > >> > cluster will have 3 topics: "topic1", a new topic that producers
> > >> > will produce to in the near future; "secondary.topic1", the
> > >> > replica of the local topic "topic1" in the secondary; and
> > >> > "secondary.primary.topic1", which is "topic1" of the old primary
> > >> > (obtained through the secondary).
> > >> >
> > >> > D) Once all the replicas are in sync, producers and consumers will
> > >> > be moved to primary-2. Producers will produce to the local topic
> > >> > "topic1" of the primary-2 cluster. The consumers will connect to
> > >> > primary-2 to consume from "topic1" (new messages that come in),
> > >> > "secondary.topic1" (messages produced during the outage) and
> > >> > "secondary.primary.topic1" (old messages).
> > >> >
> > >> > If topics have a retention time, e.g. 7 days, we could remove
> > >> > "secondary.primary.topic1" after a few days, leaving the situation
> > >> > as at the beginning. However, if another problem happens in the
> > >> > middle, the number of topics could become a little difficult to
> > >> > handle.
> > >> >
> > >> > An additional question: if the topic is compacted, i.e., the topic
> > >> > keeps data forever, would switchover operations imply adding an
> > >> > additional path in the topic name?
> > >> >
> > >> > I would appreciate some guidance with this.
> > >> >
> > >> > Regards