I thought ordering was guaranteed within a partition. Or does mirror maker not preserve partitions?

On Fri, Mar 20, 2015 at 4:44 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> I think 1) will work, but I am not sure about 2), since messages replicated
> at two clusters may be out of order as well, hence you may get messages
> 1,2,3,4 in one cluster and 1,3,4,2 in another. If you remember that your
> latest message processed in the first cluster is 2, when you fail over to
> the other cluster you may skip and miss messages 3 and 4.
>
> Guozhang
>
> On Fri, Mar 20, 2015 at 1:07 PM, Kane Kim <kane.ist...@gmail.com> wrote:
>
> > Also, as I understand it, we either have to mark all messages with unique
> > IDs and then deduplicate them, or, if we just want to store the last
> > message processed per partition, we will need exactly the same number of
> > partitions in both clusters?
> >
> > On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> >
> > > Not sure if transactional messaging will help in this case, as at least
> > > for now it is still targeted within a single DC, i.e. a "transaction" is
> > > only defined within a Kafka cluster, not across clusters.
> > >
> > > Guozhang
> > >
> > > On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst <
> > > jbringhu...@linkedin.com.invalid> wrote:
> > >
> > > > Hey Kane,
> > > >
> > > > When mirrormakers lose offsets on catastrophic failure, you generally
> > > > have two options. You can keep auto.offset.reset set to "latest" and
> > > > handle the loss of messages, or you can have it set to "earliest" and
> > > > handle the duplication of messages.
> > > >
> > > > Although we try to avoid duplicate messages overall, when failure
> > > > happens, we (mostly) take the "earliest" path and deal with the
> > > > duplication of messages.
> > > >
> > > > If your application doesn't treat messages as idempotent, you might be
> > > > able to get away with something like couchbase or memcached with a TTL
> > > > slightly higher than your Kafka retention time and use that to filter
> > > > duplicates. Another pattern may be to deduplicate messages in Hadoop
> > > > before taking action on them.
> > > >
> > > > -Jon
> > > >
> > > > P.S. An option in the future might be
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> > > >
> > > > On Mar 19, 2015, at 5:32 PM, Kane Kim <kane.ist...@gmail.com> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > What's the best strategy for failover when using mirror-maker to
> > > > > replicate across datacenters? As I understand it, offsets in both
> > > > > datacenters will be different, so how should consumers be
> > > > > reconfigured to continue reading from the same point where they
> > > > > stopped without data loss and/or duplication?
> > > > >
> > > > > Thanks.
> > >
> > > --
> > > -- Guozhang
>
> --
> -- Guozhang
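
To make the two failover approaches discussed in this thread concrete, here is a minimal sketch in Python. It takes the orderings from Guozhang's example (1,2,3,4 in one cluster, 1,3,4,2 in the other) as given and compares "remember the last processed message" with "deduplicate by unique ID in a TTL store". Everything named here is an assumption for illustration only: TtlDeduplicator, consume and resume_after are hypothetical helpers, the in-memory dict merely stands in for something like memcached or couchbase, the 8-day TTL assumes a 7-day retention, and no Kafka client API is used.

```python
import time


class TtlDeduplicator:
    """Hypothetical in-memory stand-in for a TTL store (e.g. memcached)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._expiry_by_id = {}  # message ID -> time at which the ID is forgotten

    def _evict_expired(self, now):
        expired = [mid for mid, expiry in self._expiry_by_id.items() if expiry <= now]
        for mid in expired:
            del self._expiry_by_id[mid]

    def first_time_seen(self, message_id):
        """Return True if message_id is new, and remember it for ttl_seconds."""
        now = self.clock()
        self._evict_expired(now)
        if message_id in self._expiry_by_id:
            return False
        self._expiry_by_id[message_id] = now + self.ttl_seconds
        return True


def consume(messages, dedup):
    """Return only the messages whose IDs have not been processed before."""
    return [m for m in messages if dedup.first_time_seen(m)]


def resume_after(messages, last_id):
    """Last-processed-ID approach: skip everything up to and including last_id."""
    position = messages.index(last_id)
    return messages[position + 1:]


# The same four messages, ordered differently in each cluster (example from the thread).
primary = [1, 2, 3, 4]
secondary = [1, 3, 4, 2]

# Assumed TTL: slightly above an assumed 7-day retention period.
dedup = TtlDeduplicator(ttl_seconds=8 * 24 * 3600)

processed = consume(primary[:2], dedup)   # messages 1 and 2 handled before the failure
processed += consume(secondary, dedup)    # fail over and re-read from "earliest"

# Resuming after message 2 in the second cluster leaves nothing to read,
# so messages 3 and 4 would be missed.
missed_approach = resume_after(secondary, last_id=2)
```

With these inputs the deduplicating consumer ends up having processed all four messages exactly once, while resuming after the last processed message returns nothing and so skips 3 and 4. The trade-off is that the ID store has to remember every ID for at least as long as a message can be re-delivered, which is why the TTL is tied to the retention time.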