With MirrorMaker, the source and destination clusters can have different numbers of partitions for a mirrored topic, so messages may be re-grouped across partitions in the destination cluster. In addition, if you have two MirrorMakers piping data to the same destination from two sources, the order in which messages from the two source clusters arrive at the destination is non-deterministic as well.
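For illustration, a quick sketch of the re-grouping in Python. The md5 hash below is only a stand-in for whatever partitioner the producing client actually uses, and the partition counts and keys are made up:

    import hashlib

    def partition_for(key, num_partitions):
        # Stand-in for the client partitioner: hash the key, mod the
        # partition count (the Java producer uses murmur2, not md5).
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % num_partitions

    # Hypothetical clusters: 8 partitions at the source, 6 at the destination.
    for key in ["user-1", "user-2", "user-3", "user-4", "user-5"]:
        src = partition_for(key, 8)   # partition in the source cluster
        dst = partition_for(key, 6)   # partition after mirroring
        print(key, "source partition", src, "-> destination partition", dst)

Two keys that share a partition in the source cluster can land in different partitions in the destination, so per-partition ordering across the mirror is not preserved in general.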
Guozhang

On Sun, Mar 22, 2015 at 9:40 PM, Kane Kim <kane.ist...@gmail.com> wrote:

> I thought that ordering is guaranteed within the partition, or does mirror
> maker not preserve partitions?
>
> On Fri, Mar 20, 2015 at 4:44 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > I think 1) will work, but I am not sure about 2), since messages
> > replicated at the two clusters may be out of order as well; hence you
> > may get messages 1,2,3,4 in one cluster and 1,3,4,2 in another. If you
> > remember that the latest message processed in the first cluster is 2,
> > when you fail over to the other cluster you may skip and miss messages
> > 3 and 4.
> >
> > Guozhang
> >
> > On Fri, Mar 20, 2015 at 1:07 PM, Kane Kim <kane.ist...@gmail.com> wrote:
> >
> > > Also, as I understand it, we either have to mark all messages with
> > > unique IDs and then deduplicate them, or, if we want to just store
> > > the last message processed per partition, we will need exactly the
> > > same number of partitions in both clusters?
> > >
> > > On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang <wangg...@gmail.com>
> > > wrote:
> > >
> > > > Not sure if transactional messaging will help in this case, as at
> > > > least for now it is still targeted within a single DC, i.e. a
> > > > "transaction" is only defined within a Kafka cluster, not across
> > > > clusters.
> > > >
> > > > Guozhang
> > > >
> > > > On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst <
> > > > jbringhu...@linkedin.com.invalid> wrote:
> > > >
> > > > > Hey Kane,
> > > > >
> > > > > When mirror makers lose offsets on catastrophic failure, you
> > > > > generally have two options. You can keep auto.offset.reset set
> > > > > to "latest" and handle the loss of messages, or you can have it
> > > > > set to "earliest" and handle the duplication of messages.
> > > > >
> > > > > Although we try to avoid duplicate messages overall, when failure
> > > > > happens, we (mostly) take the "earliest" path and deal with the
> > > > > duplication of messages.
> > > > >
> > > > > If your application doesn't treat messages as idempotent, you
> > > > > might be able to get away with something like couchbase or
> > > > > memcached with a TTL slightly higher than your Kafka retention
> > > > > time and use that to filter duplicates. Another pattern may be to
> > > > > deduplicate messages in Hadoop before taking action on them.
> > > > >
> > > > > -Jon
> > > > >
> > > > > P.S. An option in the future might be
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> > > > >
> > > > > On Mar 19, 2015, at 5:32 PM, Kane Kim <kane.ist...@gmail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > What's the best strategy for failover when using mirror-maker
> > > > > > to replicate across datacenters? As I understand, offsets in
> > > > > > the two datacenters will be different, so how should consumers
> > > > > > be reconfigured to continue reading from the same point where
> > > > > > they stopped, without data loss and/or duplication?
> > > > > >
> > > > > > Thanks.
> > > >
> > > > --
> > > > -- Guozhang
> >
> > --
> > -- Guozhang

--
-- Guozhang
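For reference, a minimal sketch in Python of the TTL-based de-duplication Jon describes above. A plain in-process dict stands in for couchbase/memcached, and the message IDs, retention window, and processing step are all made up:

    import time

    # Slightly above a hypothetical 7-day Kafka retention, per Jon's advice.
    RETENTION_SECS = 7 * 24 * 3600 + 3600

    class TtlDedupe:
        def __init__(self, ttl=RETENTION_SECS):
            self.ttl = ttl
            self.seen = {}  # message id -> time first seen

        def is_duplicate(self, msg_id):
            now = time.time()
            # Evict expired entries; a real external store would do this
            # for you via its native TTL support.
            self.seen = {k: t for k, t in self.seen.items()
                         if now - t < self.ttl}
            if msg_id in self.seen:
                return True
            self.seen[msg_id] = now
            return False

    dedupe = TtlDedupe()
    for msg_id in ["a1", "b2", "a1"]:  # "a1" arrives twice after failover
        if not dedupe.is_duplicate(msg_id):
            print("processing", msg_id)  # processes a1 and b2 once each

This only works if producers attach a unique ID to every message (as Kane suggests above); with an external store shared by consumers in both datacenters, the same filter applies regardless of which cluster a consumer fails over to.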