Hey Kane,

When MirrorMaker loses offsets after a catastrophic failure, you generally have two options: keep auto.offset.reset set to "latest" and accept the loss of messages, or set it to "earliest" and handle the duplication of messages.
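For reference, a minimal sketch of setting that property with the newer Java consumer (where the values are "earliest"/"latest"); the broker address, group id, and topic below are placeholders, not anything from your setup:

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class EarliestConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // placeholder broker
            props.put("group.id", "mirror-consumers");       // placeholder group
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            // With no valid committed offset, start from the oldest retained
            // message: this favors duplicates over message loss. Use "latest"
            // for the opposite trade-off.
            props.put("auto.offset.reset", "earliest");
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("my-topic"));   // placeholder topic
        }
    }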
Although we try to avoid duplicate messages overall, when failure happens we (mostly) take the "earliest" path and deal with the duplication. If your application can't process messages idempotently, you might be able to get away with something like Couchbase or memcached with a TTL slightly longer than your Kafka retention time and use that to filter duplicates (rough sketch below the quoted message). Another pattern is to deduplicate messages in Hadoop before taking action on them.

-Jon

P.S. An option in the future might be https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka

On Mar 19, 2015, at 5:32 PM, Kane Kim <kane.ist...@gmail.com> wrote:

> Hello,
>
> What's the best strategy for failover when using mirror-maker to replicate
> across datacenters? As I understand it, offsets in both datacenters will be
> different, so how should consumers be reconfigured to continue reading from
> the point where they stopped without data loss and/or duplication?
>
> Thanks.
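To make the cache idea concrete, here's a rough sketch of a TTL-based duplicate filter, assuming the spymemcached client and that each message carries a unique id; the class and method names are mine for illustration, not an established API:

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class DuplicateFilter {
        private final MemcachedClient cache;
        private final int ttlSeconds;

        public DuplicateFilter(String host, int port, int ttlSeconds) throws Exception {
            this.cache = new MemcachedClient(new InetSocketAddress(host, port));
            // Keep the TTL a bit above the topic's retention, but under
            // 30 days (memcached treats larger values as a unix timestamp).
            this.ttlSeconds = ttlSeconds;
        }

        // Returns true only the first time an id is seen within the TTL window.
        // memcached's add() stores the key only if it is absent, so a
        // successful add means no consumer has claimed this message id yet.
        public boolean firstTime(String messageId) throws Exception {
            return cache.add("dedup:" + messageId, ttlSeconds, "1").get();
        }
    }

The trick is that add() succeeds exactly once per key, so firstTime() returns true once per id per TTL window; with the TTL slightly above retention (e.g. ~8 days for 7-day retention), Kafka can no longer replay an id by the time the cache has forgotten it.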