Re: kafka mirrormaker cross datacenter replication

2015-03-23 Thread Guozhang Wang
With MirrorMaker (MM), the source and destination clusters can use different numbers of partitions for the mirrored topic, and hence messages may be re-grouped in the destination cluster. In addition, let's say you have two MMs piping data to the same destination from two sources, the ordering of which messages
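The re-grouping described above can be illustrated with a small sketch. It assumes a hash-of-key partitioner; `zlib.crc32` stands in for Kafka's real partitioner here, and `partition_for` and `group_by_partition` are hypothetical helper names, not Kafka APIs.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Pick a partition from the key (crc32 is a stand-in hash, not Kafka's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def group_by_partition(messages, num_partitions):
    """Group (key, value) messages by the partition their key maps to."""
    groups = defaultdict(list)
    for key, value in messages:
        groups[partition_for(key, num_partitions)].append((key, value))
    return dict(groups)


messages = [("user-a", 1), ("user-b", 2), ("user-c", 3), ("user-a", 4)]

# Hypothetical partition counts: 2 in the source cluster, 3 in the destination.
source_groups = group_by_partition(messages, 2)
destination_groups = group_by_partition(messages, 3)

print("source:", source_groups)
print("destination:", destination_groups)
```

Messages with the same key stay together within each cluster, but which keys share a partition can differ between the two clusters, so a per-partition position in one cluster does not translate directly to the other.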

Re: kafka mirrormaker cross datacenter replication

2015-03-22 Thread Kane Kim
I thought that ordering is guaranteed within a partition. Or does MirrorMaker not preserve partitions?
On Fri, Mar 20, 2015 at 4:44 PM, Guozhang Wang wrote:
> I think 1) will work, but not sure about 2), since messages replicated
> at two clusters may be out of order as well, hence you may

Re: kafka mirrormaker cross datacenter replication

2015-03-20 Thread Guozhang Wang
I think 1) will work, but I am not sure about 2), since messages replicated at two clusters may be out of order as well; hence you may get messages 1,2,3,4 in one cluster and 1,3,4,2 in another. If you remember that your latest message processed in the first cluster is 2, when you fail over to the oth
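The ordering example above (1,2,3,4 in one cluster and 1,3,4,2 in the other) can be worked through in a short sketch. It assumes the consumer remembers only the last processed message and, after failover, resumes just past that message in the other cluster; `resume_after` is a hypothetical helper, not a Kafka API.

```python
def resume_after(log, last_processed):
    """Return the messages that follow last_processed in the given log."""
    position = log.index(last_processed)
    return log[position + 1:]


cluster_a = [1, 2, 3, 4]  # order seen in the first cluster
cluster_b = [1, 3, 4, 2]  # order seen in the second cluster

# The consumer processed up to and including message 2 in the first cluster.
processed_in_a = cluster_a[:cluster_a.index(2) + 1]

# After failover it resumes after message 2 in the second cluster.
resumed_in_b = resume_after(cluster_b, 2)

seen = set(processed_in_a) | set(resumed_in_b)
missed = [m for m in cluster_a if m not in seen]
print("missed after failover:", missed)
```

Under these assumptions nothing follows message 2 in the second cluster, so messages 3 and 4 are never processed.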

Re: kafka mirrormaker cross datacenter replication

2015-03-20 Thread Kane Kim
Also, as I understand it, we either have to mark all messages with unique IDs and then deduplicate them, or, if we just want to store the last message processed per partition, we will need exactly the same number of partitions in both clusters?
On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang wrote:
> Not sure
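A minimal sketch of the first option (unique IDs plus deduplication) might look like the following. It assumes every message carries an ID assigned by the producer and that the set of seen IDs fits in memory; `Deduplicator` and `process_all` are hypothetical names, and a real deployment would likely need a bounded or persistent store instead.

```python
class Deduplicator:
    """Remembers message IDs that were already handled (in memory only)."""

    def __init__(self):
        self._seen = set()

    def accept(self, message_id):
        """Return True the first time an ID is seen, False on repeats."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


def process_all(messages, dedup):
    """Return the payloads of messages whose IDs were not seen before."""
    handled = []
    for message_id, payload in messages:
        if dedup.accept(message_id):
            handled.append(payload)
    return handled


dedup = Deduplicator()

# Processed in the first cluster before the failover.
before = process_all([("id-1", "a"), ("id-2", "b")], dedup)

# The second cluster is read from the start, in a different order.
after = process_all(
    [("id-1", "a"), ("id-3", "c"), ("id-4", "d"), ("id-2", "b")], dedup
)
print(before, after)
```

Because the check is by ID rather than by offset, this approach does not depend on both clusters having the same number of partitions or the same ordering.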

Re: kafka mirrormaker cross datacenter replication

2015-03-20 Thread Guozhang Wang
Not sure if transactional messaging will help in this case, as at least for now it is still targeted within a single DC, i.e. a "transaction" is only defined within a Kafka cluster, not across clusters.
Guozhang
On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst < jbringhu...@linkedin.com.invalid>

Re: kafka mirrormaker cross datacenter replication

2015-03-20 Thread Jon Bringhurst
Hey Kane, When mirrormakers lose offsets on catastrophic failure, you generally have two options. You can keep auto.offset.reset set to "latest" and handle the loss of messages, or you can have it set to "earliest" and handle the duplication of messages. Although we try to avoid duplicate mes
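The trade-off between the two settings can be sketched with a toy model. This is not MirrorMaker code: `start_position` is a hypothetical function that mimics where a consumer would start when no committed offset exists, and the log contents are made up for illustration.

```python
def start_position(log_length, committed_offset, reset_policy):
    """Choose the starting offset; the policy only applies if no offset is stored."""
    if committed_offset is not None:
        return committed_offset
    if reset_policy == "earliest":
        return 0
    if reset_policy == "latest":
        return log_length
    raise ValueError("unknown reset policy: " + reset_policy)


source_log = ["m0", "m1", "m2", "m3", "m4"]
already_mirrored = 3  # m0..m2 were copied before the offsets were lost

# "earliest": restart from the beginning, so mirrored messages are copied again.
earliest_start = start_position(len(source_log), None, "earliest")
duplicates = source_log[earliest_start:already_mirrored]

# "latest": restart from the end, so unmirrored messages are skipped.
latest_start = start_position(len(source_log), None, "latest")
skipped = source_log[already_mirrored:latest_start]

print("duplicated with earliest:", duplicates)
print("lost with latest:", skipped)
```

In this model "earliest" re-sends the three messages that were already mirrored, while "latest" drops the two that were not.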

kafka mirrormaker cross datacenter replication

2015-03-19 Thread Kane Kim
Hello, What's the best strategy for failover when using mirror-maker to replicate across datacenters? As I understand it, offsets in both datacenters will be different, so how should consumers be reconfigured to continue reading from the same point where they stopped without data loss and/or duplication?