With MirrorMaker, the source and destination clusters can have different numbers of partitions for a mirrored topic, so messages may be re-grouped across partitions in the destination cluster. In addition, if you have two MirrorMakers piping data to the same destination from two sources, the order in which messages from the two source clusters arrive at the destination is non-deterministic as well.
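For illustration, a quick sketch of the re-grouping in Python. The md5 hash below is only a stand-in for whatever partitioner the producing client actually uses, and the partition counts and keys are made up:

    import hashlib

    def partition_for(key, num_partitions):
        # Stand-in for the client partitioner: hash the key, mod the
        # partition count (the Java producer uses murmur2, not md5).
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % num_partitions

    # Hypothetical clusters: 8 partitions at the source, 6 at the destination.
    for key in ["user-1", "user-2", "user-3", "user-4", "user-5"]:
        src = partition_for(key, 8)   # partition in the source cluster
        dst = partition_for(key, 6)   # partition after mirroring
        print(key, "source partition", src, "-> destination partition", dst)

Two keys that share a partition in the source cluster can land in different partitions in the destination, so per-partition ordering across the mirror is not preserved in general.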
Guozhang

On Sun, Mar 22, 2015 at 9:40 PM, Kane Kim <kane.ist...@gmail.com> wrote:

> I thought that ordering is guaranteed within the partition, or does mirror
> maker not preserve partitions?
>
> On Fri, Mar 20, 2015 at 4:44 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > I think 1) will work, but I am not sure about 2), since messages
> > replicated at the two clusters may be out of order as well; hence you
> > may get messages 1,2,3,4 in one cluster and 1,3,4,2 in another. If you
> > remember that the latest message processed in the first cluster is 2,
> > when you fail over to the other cluster you may skip and miss messages
> > 3 and 4.
> >
> > Guozhang
> >
> > On Fri, Mar 20, 2015 at 1:07 PM, Kane Kim <kane.ist...@gmail.com> wrote:
> >
> > > Also, as I understand it, we either have to mark all messages with
> > > unique IDs and then deduplicate them, or, if we want to just store
> > > the last message processed per partition, we will need exactly the
> > > same number of partitions in both clusters?
> > >
> > > On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang <wangg...@gmail.com>
> > > wrote:
> > >
> > > > Not sure if transactional messaging will help in this case, as at
> > > > least for now it is still targeted within a single DC, i.e. a
> > > > "transaction" is only defined within a Kafka cluster, not across
> > > > clusters.
> > > >
> > > > Guozhang
> > > >
> > > > On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst <
> > > > jbringhu...@linkedin.com.invalid> wrote:
> > > >
> > > > > Hey Kane,
> > > > >
> > > > > When mirror makers lose offsets on catastrophic failure, you
> > > > > generally have two options. You can keep auto.offset.reset set
> > > > > to "latest" and handle the loss of messages, or you can have it
> > > > > set to "earliest" and handle the duplication of messages.
> > > > >
> > > > > Although we try to avoid duplicate messages overall, when failure
> > > > > happens, we (mostly) take the "earliest" path and deal with the
> > > > > duplication of messages.
> > > > >
> > > > > If your application doesn't treat messages as idempotent, you
> > > > > might be able to get away with something like couchbase or
> > > > > memcached with a TTL slightly higher than your Kafka retention
> > > > > time and use that to filter duplicates. Another pattern may be to
> > > > > deduplicate messages in Hadoop before taking action on them.
> > > > >
> > > > > -Jon
> > > > >
> > > > > P.S. An option in the future might be
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> > > > >
> > > > > On Mar 19, 2015, at 5:32 PM, Kane Kim <kane.ist...@gmail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > What's the best strategy for failover when using mirror-maker
> > > > > > to replicate across datacenters? As I understand, offsets in
> > > > > > the two datacenters will be different, so how should consumers
> > > > > > be reconfigured to continue reading from the same point where
> > > > > > they stopped, without data loss and/or duplication?
> > > > > >
> > > > > > Thanks.
> > > >
> > > > --
> > > > -- Guozhang
> >
> > --
> > -- Guozhang

--
-- Guozhang
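For reference, a minimal sketch in Python of the TTL-based de-duplication Jon describes above. A plain in-process dict stands in for couchbase/memcached, and the message IDs, retention window, and processing step are all made up:

    import time

    # Slightly above a hypothetical 7-day Kafka retention, per Jon's advice.
    RETENTION_SECS = 7 * 24 * 3600 + 3600

    class TtlDedupe:
        def __init__(self, ttl=RETENTION_SECS):
            self.ttl = ttl
            self.seen = {}  # message id -> time first seen

        def is_duplicate(self, msg_id):
            now = time.time()
            # Evict expired entries; a real external store would do this
            # for you via its native TTL support.
            self.seen = {k: t for k, t in self.seen.items()
                         if now - t < self.ttl}
            if msg_id in self.seen:
                return True
            self.seen[msg_id] = now
            return False

    dedupe = TtlDedupe()
    for msg_id in ["a1", "b2", "a1"]:  # "a1" arrives twice after failover
        if not dedupe.is_duplicate(msg_id):
            print("processing", msg_id)  # processes a1 and b2 once each

This only works if producers attach a unique ID to every message (as Kane suggests above); with an external store shared by consumers in both datacenters, the same filter applies regardless of which cluster a consumer fails over to.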