Re: kafka mirrormaker cross datacenter replication

Guozhang Wang Fri, 20 Mar 2015 16:46:07 -0700

I think 1) will work, but I am not sure about 2), since messages replicated
to two clusters may be out of order as well, hence you may get messages
1,2,3,4 in one cluster and 1,3,4,2 in another. If you remember that your
latest message processed in the first cluster is 2, when you fail over to
the other cluster you may resume after 2 there and miss messages 3 and 4.
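
To make the scenario concrete, here is a minimal sketch in Python. It is only
an illustration, not Kafka or MirrorMaker code: the two lists stand in for the
same partition in each cluster, and the helper names are hypothetical.

```python
# Hypothetical logs for one partition; the values are unique message IDs.
primary = [1, 2, 3, 4]   # order seen in the first cluster
mirror = [1, 3, 4, 2]    # same messages, different order in the second cluster


def resume_after_last_id(log, last_id):
    """Option 2: remember only the last processed ID and resume right after it."""
    return log[log.index(last_id) + 1:]


def resume_with_dedup(log, seen_ids):
    """Option 1: replay the log and drop every ID that was already processed."""
    return [msg for msg in log if msg not in seen_ids]


processed = primary[:2]  # the consumer handled 1 and 2 before the failover

# Resuming after ID 2 in the mirror finds nothing left, so 3 and 4 are missed.
skipped = resume_after_last_id(mirror, processed[-1])

# Replaying with deduplication still delivers 3 and 4.
recovered = resume_with_dedup(mirror, set(processed))
```

The trade-off is that option 1 has to keep the set of processed IDs for as long
as a replay is possible, while option 2 stores a single value per partition.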


Guozhang

On Fri, Mar 20, 2015 at 1:07 PM, Kane Kim <[email protected]> wrote:

> Also, as I understand it, we either have to mark all messages with unique IDs
> and then deduplicate them, or, if we just want to store the last message
> processed per partition, we will need exactly the same number of partitions
> in both clusters?
>
> On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang <[email protected]>
> wrote:
>
> > Not sure if transactional messaging will help in this case, as at least for
> > now it is still targeted within a single DC, i.e. a "transaction" is only
> > defined within a Kafka cluster, not across clusters.
> >
> > Guozhang
> >
> > On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst <
> > [email protected]> wrote:
> >
> > > Hey Kane,
> > >
> > > When mirrormakers lose offsets on catastrophic failure, you generally
> > > have two options. You can keep auto.offset.reset set to "latest" and handle
> > > the loss of messages, or you can have it set to "earliest" and handle the
> > > duplication of messages.
> > >
> > > Although we try to avoid duplicate messages overall, when failure happens,
> > > we (mostly) take the "earliest" path and deal with the duplication of
> > > messages.
> > >
> > > If your application can't process messages idempotently, you might be
> > > able to get away with something like couchbase or memcached with a TTL
> > > slightly higher than your Kafka retention time and use that to filter
> > > duplicates. Another pattern may be to deduplicate messages in Hadoop before
> > > taking action on them.
> > >
> > > -Jon
> > >
> > > P.S. An option in the future might be
> > > https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> > >
> > > On Mar 19, 2015, at 5:32 PM, Kane Kim <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > What's the best strategy for failover when using mirror-maker to replicate
> > > > across datacenters? As I understand it, offsets in the two datacenters will
> > > > be different, so how should consumers be reconfigured to continue reading
> > > > from the same point where they stopped, without data loss and/or duplication?
> > > >
> > > > Thanks.
> > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang
