I think 1) will work, but I am not sure about 2), since messages replicated to two clusters may arrive in a different order in each: you may get messages 1,2,3,4 in one cluster and 1,3,4,2 in the other. If you remember that the latest message processed in the first cluster is 2, then when you fail over to the other cluster and resume after message 2, you will skip messages 3 and 4.
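To make the ordering problem concrete, here is a minimal sketch in plain Python (no Kafka client involved). The message IDs, the two lists and both helper functions are hypothetical, made up only for illustration: one resumes after the last processed ID, as in 2), and the other filters on the set of IDs already seen, as in 1).

```python
# Hypothetical message IDs as they might land in each cluster.
primary = [1, 2, 3, 4]
mirror = [1, 3, 4, 2]  # same messages, different order after mirroring


def resume_after_last_id(stream, last_id):
    """Approach 2): continue right after the position of the last processed ID."""
    return stream[stream.index(last_id) + 1:]


def resume_with_seen_ids(stream, seen):
    """Approach 1): replay the stream and drop every ID already processed."""
    return [m for m in stream if m not in seen]


# The consumer handled messages 1 and 2 on the primary, then failed over.
processed = primary[:2]
last_id = processed[-1]

print(resume_after_last_id(mirror, last_id))         # [] : messages 3 and 4 are lost
print(resume_with_seen_ids(mirror, set(processed)))  # [3, 4]
```

The set-based version only works if every message carries a unique ID and the set of seen IDs survives the failover, which is the cost of approach 1).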
Guozhang

On Fri, Mar 20, 2015 at 1:07 PM, Kane Kim <kane.ist...@gmail.com> wrote:

> Also, as I understand, we either have to mark all messages with unique IDs
> and then deduplicate them, or, if we want to just store the last message
> processed per partition, we will need exactly the same number of partitions
> in both clusters?
>
> On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang <wangg...@gmail.com>
> wrote:
>
> > Not sure if transactional messaging will help in this case, as at least
> > for now it is still targeted within a single DC, i.e. a "transaction" is
> > only defined within a Kafka cluster, not across clusters.
> >
> > Guozhang
> >
> > On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst <
> > jbringhu...@linkedin.com.invalid> wrote:
> >
> > > Hey Kane,
> > >
> > > When mirrormakers lose offsets on catastrophic failure, you generally
> > > have two options. You can keep auto.offset.reset set to "latest" and
> > > handle the loss of messages, or you can have it set to "earliest" and
> > > handle the duplication of messages.
> > >
> > > Although we try to avoid duplicate messages overall, when failure
> > > happens, we (mostly) take the "earliest" path and deal with the
> > > duplication of messages.
> > >
> > > If your application doesn't treat messages as idempotent, you might be
> > > able to get away with something like couchbase or memcached with a TTL
> > > slightly higher than your Kafka retention time and use that to filter
> > > duplicates. Another pattern may be to deduplicate messages in Hadoop
> > > before taking action on them.
> > >
> > > -Jon
> > >
> > > P.S. An option in the future might be
> > > https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> > >
> > > On Mar 19, 2015, at 5:32 PM, Kane Kim <kane.ist...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > What's the best strategy for failover when using mirror-maker to
> > > > replicate across datacenters? As I understand, offsets in both
> > > > datacenters will be different. How should consumers be reconfigured
> > > > to continue reading from the same point where they stopped, without
> > > > data loss and/or duplication?
> > > >
> > > > Thanks.
> > > >
> > >
> > >
> > --
> > -- Guozhang
> >
>
--
-- Guozhang