Re: kafka mirrormaker cross datacenter replication

Kane Kim Sun, 22 Mar 2015 21:41:55 -0700

I thought that ordering is guaranteed within a partition. Or does mirror
maker not preserve partitions?

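To make the scenario in the quoted reply below concrete, here is a minimal
Python sketch. It assumes the two clusters hold the same messages in the
orders given in that reply (1,2,3,4 and 1,3,4,2). The function names are
hypothetical and nothing here is Kafka API. It only shows why resuming from
a "last processed message" marker can skip data, while a set of seen IDs
does not.

```python
def resume_after_last_id(log, last_id):
    """Messages read when resuming right after last_id's position in log."""
    position = log.index(last_id)
    return log[position + 1:]


def resume_with_seen_ids(log, seen_ids):
    """Re-read the whole log and drop every message already processed."""
    return [message for message in log if message not in seen_ids]


cluster_a = [1, 2, 3, 4]  # order in the first cluster
cluster_b = [1, 3, 4, 2]  # same messages, different order, in the second

processed = cluster_a[:2]  # the consumer handled 1 and 2 before failover

# Resuming after message 2 in the second cluster finds nothing left to
# read, so messages 3 and 4 are never processed.
skipped_resume = resume_after_last_id(cluster_b, processed[-1])

# Tracking the IDs already seen recovers 3 and 4, at the cost of re-reading.
dedup_resume = resume_with_seen_ids(cluster_b, set(processed))

print(skipped_resume)  # []
print(dedup_resume)    # [3, 4]
```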



On Fri, Mar 20, 2015 at 4:44 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> I think 1) will work, but I am not sure about 2), since messages replicated
> in two clusters may be out of order as well; hence you may get messages
> 1,2,3,4 in one cluster and 1,3,4,2 in another. If you remember that the
> latest message processed in the first cluster is 2, then when you fail over
> to the other cluster you may skip and miss messages 3 and 4.
>
> Guozhang
>
> On Fri, Mar 20, 2015 at 1:07 PM, Kane Kim <kane.ist...@gmail.com> wrote:
>
> > Also, as I understand it, we either have to mark all messages with unique
> > IDs and then deduplicate them, or, if we just want to store the last
> > message processed per partition, we will need exactly the same number of
> > partitions in both clusters?
> >
> > On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> >
> > > Not sure if transactional messaging will help in this case, as at least
> > > for now it is still targeted within a single DC, i.e. a "transaction" is
> > > only defined within a Kafka cluster, not across clusters.
> > >
> > > Guozhang
> > >
> > > On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst <
> > > jbringhu...@linkedin.com.invalid> wrote:
> > >
> > > > Hey Kane,
> > > >
> > > > When mirrormakers lose offsets on catastrophic failure, you generally
> > > > have two options. You can keep auto.offset.reset set to "latest" and
> > > > handle the loss of messages, or you can have it set to "earliest" and
> > > > handle the duplication of messages.
> > > >
> > > > Although we try to avoid duplicate messages overall, when failure
> > > > happens, we (mostly) take the "earliest" path and deal with the
> > > > duplication of messages.
> > > >
> > > > If your application doesn't treat messages as idempotent, you might be
> > > > able to get away with something like couchbase or memcached with a TTL
> > > > slightly higher than your Kafka retention time and use that to filter
> > > > duplicates. Another pattern may be to deduplicate messages in Hadoop
> > > > before taking action on them.
> > > >
> > > > -Jon
> > > >
> > > > P.S. An option in the future might be
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> > > >
> > > > On Mar 19, 2015, at 5:32 PM, Kane Kim <kane.ist...@gmail.com> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > What's the best strategy for failover when using mirror-maker to
> > > > > replicate across datacenters? As I understand it, offsets in both
> > > > > datacenters will be different, so how should consumers be reconfigured
> > > > > to continue reading from the same point where they stopped without
> > > > > data loss and/or duplication?
> > > > >
> > > > > Thanks.
> > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>

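As an illustration of the TTL-based duplicate filter Jon describes in the
quoted message above, here is a minimal Python sketch. It is not a couchbase
or memcached client. An in-memory dict stands in for the external store, and
the class name, method name, and the assumption that every message carries a
unique ID are all hypothetical. The TTL would be chosen slightly higher than
the Kafka retention time, as suggested in the thread.

```python
import time


class TtlDeduplicator:
    """In-memory stand-in for a TTL key store such as memcached (assumption)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so the demo needs no sleeping
        self._expiry = {}     # message id -> time at which the entry expires

    def first_time_seen(self, message_id):
        """Return True if message_id has not been seen within the TTL."""
        now = self._clock()
        expires_at = self._expiry.get(message_id)
        if expires_at is not None and expires_at > now:
            return False      # duplicate inside the TTL window
        # New or expired entry: record it and let the caller process it.
        # A real store would evict expired keys itself; this dict only grows.
        self._expiry[message_id] = now + self._ttl
        return True


# Demo with a fake clock; the TTL value of 10 is arbitrary.
fake_now = [0.0]
dedup = TtlDeduplicator(ttl_seconds=10, clock=lambda: fake_now[0])

replayed = ["m1", "m2", "m1", "m3", "m2"]  # duplicates after an offset reset
accepted = [m for m in replayed if dedup.first_time_seen(m)]
print(accepted)  # ['m1', 'm2', 'm3']
```

With mirror maker offsets reset to "earliest", the consumer would call
first_time_seen on each message ID and skip anything that returns False.
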