> elaborate a bit more about the active-active

Active/active in this context just means that both (or multiple)
clusters are used under normal operation, not just during an outage.
For this to work, you basically have isolated instances of your application
stack running in each DC, with MM2 keeping each DC in sync. If one DC is
unavailable, traffic is shifted to another DC. It's possible to set this up
s.t. failover/failback between DCs happens automatically and seamlessly,
e.g. with load balancers and health checks. It's more complicated to set up
than the active/standby approach, but DR sorta takes care of itself from
then on. I frequently demo this stuff, where I pull the plug on entire DCs
and apps keep running like nothing happened.
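
To make that concrete, the MM2 side of an active/active pair can be a
single config along these lines (cluster aliases, addresses and the topic
filter here are made up; you'd tune them to your environment):

  clusters = dc1, dc2
  dc1.bootstrap.servers = dc1-kafka:9092
  dc2.bootstrap.servers = dc2-kafka:9092

  # replicate in both directions so each DC has a copy of the other's topics
  dc1->dc2.enabled = true
  dc1->dc2.topics = .*
  dc2->dc1.enabled = true
  dc2->dc1.topics = .*

Each DC then has its own local topics plus remote copies like "dc2.topic1",
so consumers in either DC can see everything no matter where it was
produced.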

On Wed, Feb 12, 2020 at 12:05 AM benitocm <benit...@gmail.com> wrote:

> Hi Ryanne,
>
> Please could you elaborate a bit more about the active-active
> recommendation?
>
> Thanks in advance
>
> On Mon, Feb 10, 2020 at 10:21 PM benitocm <benit...@gmail.com> wrote:
>
> > Thanks very much for the response.
> >
> > Please could you elaborate a bit more about "I'd
> > arc in that direction. Instead of migrating A->B->C->D..., active/active
> > is more like having one big cluster".
> >
> > Another thing that I would like to share is that currently my consumers
> > only consume from one topic, so introducing MM2 will impact them.
> > Any suggestion in this regard would be greatly appreciated.
> >
> > Thanks in advance again!
> >
> >
> > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan <ryannedo...@gmail.com>
> > wrote:
> >
> >> Hello, sounds like you have this all figured out actually. A couple
> >> notes:
> >>
> >> > For now, we just need to handle DR requirements, i.e., we would not
> >> > need active-active
> >>
> >> If your infrastructure is sufficiently advanced, active/active can be a
> >> lot easier to manage than active/standby. If you are starting from
> >> scratch I'd arc in that direction. Instead of migrating A->B->C->D...,
> >> active/active is more like having one big cluster.
> >>
> >> > secondary.primary.topic1
> >>
> >> I'd recommend using regex subscriptions where possible, so that apps
> >> don't need to worry about these potentially complex topic names.
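> >>
> >> For example, something along these lines (a minimal sketch; the broker
> >> address, group id and topic pattern are made up for illustration):
> >>
> >>   import java.time.Duration;
> >>   import java.util.Properties;
> >>   import java.util.regex.Pattern;
> >>   import org.apache.kafka.clients.consumer.ConsumerConfig;
> >>   import org.apache.kafka.clients.consumer.ConsumerRecord;
> >>   import org.apache.kafka.clients.consumer.KafkaConsumer;
> >>   import org.apache.kafka.common.serialization.StringDeserializer;
> >>
> >>   public class RegexConsumer {
> >>       public static void main(String[] args) {
> >>           Properties props = new Properties();
> >>           props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "secondary-kafka:9092");
> >>           props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
> >>           props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
> >>               StringDeserializer.class.getName());
> >>           props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
> >>               StringDeserializer.class.getName());
> >>           try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
> >>               // Matches "topic1" plus replicated copies such as
> >>               // "primary.topic1" or "secondary.primary.topic1".
> >>               consumer.subscribe(Pattern.compile(".*topic1"));
> >>               while (true) {
> >>                   for (ConsumerRecord<String, String> record :
> >>                           consumer.poll(Duration.ofSeconds(1))) {
> >>                       // process the record ...
> >>                   }
> >>               }
> >>           }
> >>       }
> >>   }
> >>
> >> The subscription picks up matching topics as they appear and disappear,
> >> so the app doesn't have to change when replication adds prefixed topics.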
> >>
> >> > An additional question. If the topic is compacted, i.e., the topic
> >> > keeps messages forever, would switchover operations imply adding an
> >> > additional prefix to the topic name?
> >>
> >> I think that's right. You could always clean things up manually, but
> >> migrating between clusters a bunch of times would leave a trail of
> >> replication hops.
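> >>
> >> E.g. after a couple of switchovers you can end up with names like
> >>
> >>   topic1 -> primary.topic1 -> secondary.primary.topic1
> >>     -> primary-2.secondary.primary.topic1 -> ...
> >>
> >> since each hop prepends another source-cluster alias.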
> >>
> >> Also, you might look into implementing a custom ReplicationPolicy. For
> >> example, you could squash "secondary.primary.topic1" into something
> >> shorter
> >> if you like.
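> >>
> >> A rough sketch of that idea (the interface and the config key come from
> >> KIP-382; double-check the exact method set against your Kafka version):
> >>
> >>   import org.apache.kafka.connect.mirror.ReplicationPolicy;
> >>
> >>   // Keeps the original source prefix instead of stacking a new one per
> >>   // hop, so "primary.topic1" replicated through "secondary" stays
> >>   // "primary.topic1" rather than becoming "secondary.primary.topic1".
> >>   // Assumes topic names themselves don't contain dots.
> >>   public class SquashingReplicationPolicy implements ReplicationPolicy {
> >>       @Override
> >>       public String formatRemoteTopic(String sourceClusterAlias, String topic) {
> >>           return topicSource(topic) == null
> >>               ? sourceClusterAlias + "." + topic
> >>               : topic;
> >>       }
> >>       @Override
> >>       public String topicSource(String topic) {
> >>           int dot = topic.indexOf('.');
> >>           return dot < 0 ? null : topic.substring(0, dot);
> >>       }
> >>       @Override
> >>       public String upstreamTopic(String topic) {
> >>           int dot = topic.indexOf('.');
> >>           return dot < 0 ? null : topic.substring(dot + 1);
> >>       }
> >>   }
> >>
> >> You'd then point MM2 at it via "replication.policy.class".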
> >>
> >> Ryanne
> >>
> >> On Mon, Feb 10, 2020 at 1:24 PM benitocm <benit...@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > After having a look at the talk
> >> > https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> >> > and the KIP
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> >> > I am trying to understand how I would use it in the setup that I have.
> >> > For now, we just need to handle DR requirements, i.e., we would not
> >> > need active-active.
> >> >
> >> > My requirements, more or less, are the following:
> >> >
> >> > 1) Currently, we have just one Kafka cluster, "primary", that all the
> >> > producers produce to and all the consumers consume from.
> >> > 2) In case "primary" crashes, we would need another Kafka cluster,
> >> > "secondary", to which we would move all the producers and consumers
> >> > and keep working.
> >> > 3) Once "primary" is recovered, we would need to move back to it (as
> >> > we were in #1).
> >> >
> >> > To fulfill #2, I have thought of adding a new Kafka cluster,
> >> > "secondary", and setting up a replication procedure using MM2.
> >> > However, it is not clear to me how to proceed.
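> >> >
> >> > Roughly, I was picturing an MM2 configuration along these lines
> >> > (addresses are placeholders):
> >> >
> >> >   clusters = primary, secondary
> >> >   primary.bootstrap.servers = primary-kafka:9092
> >> >   secondary.bootstrap.servers = secondary-kafka:9092
> >> >
> >> >   # one-way replication, for DR only
> >> >   primary->secondary.enabled = true
> >> >   primary->secondary.topics = topic1
> >> >   # checkpoints for consumer-offset translation (I believe this is on
> >> >   # by default anyway)
> >> >   primary->secondary.emit.checkpoints.enabled = true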
> >> >
> >> > I'll describe the high-level details so you guys can point out my
> >> > misconceptions:
> >> >
> >> > A) Initial situation. As in the example in KIP-382, in the primary
> >> > cluster we will have a local topic "topic1" that the producers produce
> >> > to and the consumers consume from. MM2 will create in the secondary
> >> > cluster the remote topic "primary.topic1", where the local topic from
> >> > the primary will be replicated. In addition, the consumer group
> >> > information of the primary will also be replicated.
> >> >
> >> > B) The Kafka primary cluster is not available. Producers are moved to
> >> > produce into the topic "topic1" that was manually created on the
> >> > secondary. In addition, consumers need to connect to the secondary to
> >> > consume from the local topic "topic1", where the producers are now
> >> > producing, and from the remote topic "primary.topic1", where the
> >> > producers were producing before, i.e., consumers will need to
> >> > aggregate. This is so because some consumers could have lag, so they
> >> > will need to consume from both. In this situation, the local topic
> >> > "topic1" in the secondary will receive new messages and will be
> >> > consumed (its consumption information will also change), while the
> >> > remote topic "primary.topic1" will not receive new messages but will
> >> > still be consumed (its consumption information will change).
> >> >
> >> > At this point, my conclusion is that consumers need to consume from
> >> > both topics (the new messages produced in the local topic, plus the
> >> > old messages for consumers that had lag).
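> >> >
> >> > (For the lagging consumers, my understanding is that MM2's checkpoints
> >> > let me translate the group's committed offsets from the primary into
> >> > the secondary's "primary.topic1" partitions, roughly as below.
> >> > RemoteClusterUtils comes from KIP-382 and the exact signature should
> >> > be double-checked; the address and group id are placeholders,
> >> > "consumer" is a KafkaConsumer already connected to the secondary, and
> >> > exception handling is omitted.)
> >> >
> >> >   import java.time.Duration;
> >> >   import java.util.HashMap;
> >> >   import java.util.Map;
> >> >   import org.apache.kafka.clients.consumer.OffsetAndMetadata;
> >> >   import org.apache.kafka.common.TopicPartition;
> >> >   import org.apache.kafka.connect.mirror.RemoteClusterUtils;
> >> >
> >> >   // Client config pointing at the secondary cluster.
> >> >   Map<String, Object> secondaryConfig = new HashMap<>();
> >> >   secondaryConfig.put("bootstrap.servers", "secondary-kafka:9092");
> >> >
> >> >   // Translate group "my-group"'s position on the old primary into
> >> >   // offsets on the secondary's replicated partitions.
> >> >   Map<TopicPartition, OffsetAndMetadata> translated =
> >> >       RemoteClusterUtils.translateOffsets(secondaryConfig, "primary",
> >> >           "my-group", Duration.ofSeconds(30));
> >> >
> >> >   consumer.assign(translated.keySet());
> >> >   translated.forEach((tp, om) -> consumer.seek(tp, om.offset()));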
> >> >
> >> > C) The primary cluster is recovered (here is where things get
> >> > complicated for me). In the talk, the new primary is renamed primary-2
> >> > and MM2 is configured for active/active replication.
> >> > The result is the following. The secondary cluster will end up with a
> >> > new remote topic ("primary-2.topic1") that will contain a replica of
> >> > the new "topic1" created in the primary-2 cluster. The primary-2
> >> > cluster will have 3 topics: "topic1", a new topic that producers will
> >> > produce to in the near future; "secondary.topic1", which contains the
> >> > replica of the local topic "topic1" in the secondary; and
> >> > "secondary.primary.topic1", which is "topic1" of the old primary
> >> > (received through the secondary).
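> >> >
> >> > (On the MM2 side I imagine that step looks something like the
> >> > following, assuming the alias can simply be "primary-2":
> >> >
> >> >   clusters = primary-2, secondary
> >> >   primary-2->secondary.enabled = true
> >> >   primary-2->secondary.topics = .*
> >> >   secondary->primary-2.enabled = true
> >> >   secondary->primary-2.topics = .*
> >> >
> >> > i.e. replication now runs in both directions between the two
> >> > clusters.)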
> >> >
> >> > D) Once all the replicas are in sync, producers and consumers will be
> >> > moved to primary-2. Producers will produce to the local topic "topic1"
> >> > of the primary-2 cluster. The consumers will connect to primary-2 to
> >> > consume from "topic1" (new messages that come in), "secondary.topic1"
> >> > (messages produced during the outage) and from
> >> > "secondary.primary.topic1" (old messages).
> >> >
> >> > If topics have a retention time, e.g. 7 days, we could remove
> >> > "secondary.primary.topic1" after a few days, leaving the situation as
> >> > at the beginning. However, if another problem happens in the middle,
> >> > the number of topics could be a little difficult to handle.
> >> >
> >> > An additional question. If the topic is compacted, i.e., the topic
> >> > keeps messages forever, would switchover operations imply adding an
> >> > additional prefix to the topic name?
> >> >
> >> > I would appreciate some guidance with this.
> >> >
> >> > Regards
> >> >
> >>
> >
>
