Hey Kane,

When MirrorMaker loses offsets after a catastrophic failure, you generally have two options: keep auto.offset.reset set to "latest" and accept the loss of messages, or set it to "earliest" and handle the duplication of messages.
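For reference, a minimal sketch of setting that property with the newer Java consumer (where the values are "earliest"/"latest"); the broker address, group id, and topic below are placeholders, not anything from your setup:

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class EarliestConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // placeholder broker
            props.put("group.id", "mirror-consumers");       // placeholder group
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            // With no valid committed offset, start from the oldest retained
            // message: this favors duplicates over message loss. Use "latest"
            // for the opposite trade-off.
            props.put("auto.offset.reset", "earliest");
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("my-topic"));   // placeholder topic
        }
    }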
Although we try to avoid duplicate messages overall, when failure happens we (mostly) take the "earliest" path and deal with the duplication. If your application can't process messages idempotently, you might be able to get away with something like Couchbase or memcached with a TTL slightly longer than your Kafka retention time and use that to filter duplicates (rough sketch below the quoted message). Another pattern is to deduplicate messages in Hadoop before taking action on them.

-Jon

P.S. An option in the future might be https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka

On Mar 19, 2015, at 5:32 PM, Kane Kim <kane.ist...@gmail.com> wrote:

> Hello,
>
> What's the best strategy for failover when using mirror-maker to replicate
> across datacenters? As I understand it, offsets in both datacenters will be
> different, so how should consumers be reconfigured to continue reading from
> the point where they stopped without data loss and/or duplication?
>
> Thanks.
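To make the cache idea concrete, here's a rough sketch of a TTL-based duplicate filter, assuming the spymemcached client and that each message carries a unique id; the class and method names are mine for illustration, not an established API:

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class DuplicateFilter {
        private final MemcachedClient cache;
        private final int ttlSeconds;

        public DuplicateFilter(String host, int port, int ttlSeconds) throws Exception {
            this.cache = new MemcachedClient(new InetSocketAddress(host, port));
            // Keep the TTL a bit above the topic's retention, but under
            // 30 days (memcached treats larger values as a unix timestamp).
            this.ttlSeconds = ttlSeconds;
        }

        // Returns true only the first time an id is seen within the TTL window.
        // memcached's add() stores the key only if it is absent, so a
        // successful add means no consumer has claimed this message id yet.
        public boolean firstTime(String messageId) throws Exception {
            return cache.add("dedup:" + messageId, ttlSeconds, "1").get();
        }
    }

The trick is that add() succeeds exactly once per key, so firstTime() returns true once per id per TTL window; with the TTL slightly above retention (e.g. ~8 days for 7-day retention), Kafka can no longer replay an id by the time the cache has forgotten it.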