Re: MirrorMaker improvements

2015-03-25 Thread Jiangjie Qin
You can set num.consumer.fetchers to a larger number (e.g. 3), and more fetchers will be created to fetch from different partitions on the same broker. Each fetcher will have its own TCP connection. WRT the congestion window, yes, if the link has a high packet drop rate then congest
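A rough way to see why more fetchers (and so more TCP connections) help on a lossy link is the Mathis et al. approximation for a single loss-limited, Reno-style TCP flow: rate ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p), where p is the packet loss rate. The sketch below assumes each fetcher connection sees independent, random loss and simply sums the per-connection rates up to the link capacity. The MSS, RTT, loss and link values are illustrative assumptions, not numbers from this thread, and modern congestion control algorithms (e.g. CUBIC, BBR) deviate from this model.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state rate of one Reno-style TCP flow (Mathis et al.):
    (MSS / RTT) * sqrt(3/2) / sqrt(p), returned in bits per second."""
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


def aggregate_bps(num_connections: int, mss_bytes: int, rtt_s: float,
                  loss_rate: float, link_bps: float) -> float:
    # Assumption: every connection is loss-limited independently, so the
    # aggregate grows linearly with the number of connections until the
    # link itself is full.
    per_conn = mathis_throughput_bps(mss_bytes, rtt_s, loss_rate)
    return min(num_connections * per_conn, link_bps)
```

With an assumed 1460-byte MSS, 100 ms RTT and 0.01% loss, one connection tops out around 14.3 Mbit/s in this model, and three fetchers give roughly 43 Mbit/s, far below a 1 Gbit/s link in either case.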

Re: MirrorMaker improvements

2015-03-25 Thread vlad...@gmail.com
Hi Jiangjie, I only noticed a single TCP connection between an MM process and a single broker. Is there something I could have done to open up more connections? TCP can actually cap throughput before saturating the network, which is why it is hard to utilize a high bandwidth-latency link with
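The cap Vlad describes is the window limit. A single TCP connection can have at most one receive window of unacknowledged data in flight per round trip, so its throughput is bounded by window / RTT no matter how fast the link is. Filling the link requires a window at least as large as the bandwidth-delay product. This is also why a larger socket.receive.buffer.bytes comes up later in the thread. The 64 KiB window, 100 ms RTT and 1 Gbit/s link below are assumed example values.

```python
def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    # At most one full window can be in flight per round trip.
    return window_bytes * 8 / rtt_s


def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    # Bandwidth-delay product: bytes that must be in flight to keep the link full.
    return link_bps * rtt_s / 8
```

With a 64 KiB window and a 100 ms RTT, window_limited_bps(65536, 0.1) is about 5.2 Mbit/s regardless of link capacity, while keeping a 1 Gbit/s link full at that RTT needs about 12.5 MB in flight (bdp_bytes(1e9, 0.1)).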

Re: MirrorMaker improvements

2015-03-25 Thread Jiangjie Qin
Hi Vlad, I am not sure I understand the congestion window part. TCP congestion control only kicks in when you are saturating the network, and if that is the case, bandwidth has already become the bottleneck. But we are talking about network underutilization, no? Another thing is that each fetch

Re: MirrorMaker improvements

2015-03-25 Thread vlad...@gmail.com
@Guozhang We actually have separate topics depending on the source of the message and the multicast distribution group (the set of destinations). Our topics are named source_multicast-group. We do not aggregate data, but we do static routing based on the destination and the destination set (that is
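Because the topic name itself encodes both the source and the multicast group, routing can be decided from the name alone. The sketch below is a hypothetical illustration of that static routing: the ROUTES table, the group and data-center names, and the assumption that the source part contains no underscore are all made up for the example and are not taken from Vlad's setup.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical static routing table: multicast group -> destination data centers.
ROUTES: Dict[str, FrozenSet[str]] = {
    "grp-eu": frozenset({"dc-ams", "dc-fra"}),
    "grp-all": frozenset({"dc-ams", "dc-fra", "dc-sjc"}),
}


def parse_topic(topic: str) -> Tuple[str, str]:
    """Split a '<source>_<multicast-group>' topic name.
    Assumes the source part never contains an underscore."""
    source, sep, group = topic.partition("_")
    if not sep or not source or not group:
        raise ValueError(f"topic {topic!r} does not follow source_multicast-group")
    return source, group


def destinations(topic: str, local_dc: str) -> FrozenSet[str]:
    # The topic name alone decides where it is mirrored; never mirror back to
    # the local data center. Unknown groups are routed nowhere.
    _, group = parse_topic(topic)
    return ROUTES.get(group, frozenset()) - {local_dc}
```

For example, destinations("sensors_grp-eu", "dc-ams") yields frozenset({"dc-fra"}), so a mirror running in dc-ams would only forward that topic to dc-fra.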

Re: MirrorMaker improvements

2015-03-25 Thread Jiangjie Qin
Hi Jun, I think bumping up socket.receive.buffer.bytes will help but might not be sufficient. There are actually two related factors here: 1. Pipelining TCP packets when sending a single request/response. 2. Pipelining multiple requests/responses. Bumping up socket.receive.buffer.bytes helps with 1)
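Factor 2 can be isolated with a simple model. Assume factor 1 is already solved (the socket buffer is large enough that a whole response streams at link speed), and that a fetcher with one outstanding request waits a full round trip plus the transfer time before it can send the next request. Allowing k requests in flight multiplies the rate by k until the link is full. The fetch size, RTT, link speed and the `outstanding` parameter are illustrative assumptions, not Kafka settings.

```python
def fetch_throughput_bps(fetch_bytes: int, rtt_s: float, link_bps: float,
                         outstanding: int = 1) -> float:
    """Rough fetcher throughput model.

    outstanding=1 models a fetcher that waits for each response before sending
    the next request; larger values model pipelined requests. Assumes the
    socket buffer is big enough that one response transfers at link speed.
    """
    transfer_s = fetch_bytes * 8 / link_bps  # time to stream one response
    cycle_s = rtt_s + transfer_s             # request out, response fully back
    return min(link_bps, outstanding * fetch_bytes * 8 / cycle_s)
```

With a 1 MiB fetch, 100 ms RTT and a 1 Gbit/s link, one outstanding request yields roughly 77 Mbit/s in this model, and 16 outstanding requests saturate the link. A bigger socket buffer alone does not change the single-request figure once the response already fits in the window.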

Re: MirrorMaker improvements

2015-03-25 Thread Jun Rao
To amortize the long RTT across data centers, you can tune the TCP window size by configuring a larger socket.receive.buffer.bytes in the consumer. For the last one, it seems that you want identical mirroring. The tricky thing is to figure out how to avoid duplicates when there is a failure. We ha
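Jun's message is cut off before his proposal, so the following is not his design. It sketches one common way to avoid duplicates on replay in identical mirroring: the target remembers, per source partition, the highest source offset it has already applied, and skips anything at or below it after a restart. The class and its in-memory state are hypothetical; a real mirror would have to persist the checkpoint atomically with the written data, which is not shown.

```python
from typing import Dict, Iterable, List, Tuple

# (source partition, source offset, payload)
Record = Tuple[int, int, bytes]


class DedupingMirror:
    """Hypothetical sketch: apply each source record at most once by tracking
    the highest source offset already written, per source partition.
    Assumes the checkpoint would be persisted atomically with the data."""

    def __init__(self) -> None:
        self.checkpoint: Dict[int, int] = {}
        self.target: Dict[int, List[bytes]] = {}

    def apply(self, records: Iterable[Record]) -> int:
        written = 0
        for partition, offset, payload in records:
            if offset <= self.checkpoint.get(partition, -1):
                continue  # already mirrored before the failure; skip the duplicate
            self.target.setdefault(partition, []).append(payload)
            self.checkpoint[partition] = offset
            written += 1
        return written
```

If offsets 0 to 2 are applied and the mirror then crashes and replays offsets 1 to 3, only offset 3 is written the second time, so the target partition ends up with each record exactly once and in source order.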

Re: MirrorMaker improvements

2015-03-24 Thread Guozhang Wang
Thanks for sharing this Vlad, this is a great read! I am particularly interested in the last bullet point of one-to-one mapping in MM since you also mentioned that you use Kafka MM as the async replication layer for your geo-replicated k-v store. One approach that we are pursuing here to support
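Guozhang's approach is cut off here, so the following only illustrates what the one-to-one mapping means, not how he proposes to implement it. Instead of letting the producer re-partition mirrored records, a record read from source partition p is written to target partition p, so each target partition receives exactly the sequence its source partition had. The function names are hypothetical, and the sketch assumes the source and target topics have the same partition count.

```python
from typing import List, Tuple


def identity_partition(source_partition: int, num_target_partitions: int) -> int:
    # One-to-one mapping: source partition p -> target partition p.
    # Assumes the target topic was created with the same partition count.
    if not 0 <= source_partition < num_target_partitions:
        raise ValueError("source partition has no matching target partition")
    return source_partition


def mirror(records: List[Tuple[int, bytes]],
           num_target_partitions: int) -> List[List[bytes]]:
    """Copy (source_partition, payload) records in source order while keeping
    partition identity, so per-partition ordering is preserved end to end."""
    out: List[List[bytes]] = [[] for _ in range(num_target_partitions)]
    for src_p, payload in records:
        out[identity_partition(src_p, num_target_partitions)].append(payload)
    return out
```

For example, mirror([(0, b"a"), (1, b"b"), (0, b"c")], 2) produces [[b"a", b"c"], [b"b"]], matching the source layout partition by partition, which is what a geo-replicated k-v store needs to replay updates for a key in order.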