To make better use of the long-RTT link across data centers, you can
increase the TCP window size by configuring a larger
socket.receive.buffer.bytes in the consumer.
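For example, with the current high-level consumer (the numbers here are
only illustrative; the buffer needs to cover the link's bandwidth-delay
product, e.g. 100 ms RTT x 1 Gbps is roughly 12.5 MB in flight, and an OS
cap such as net.core.rmem_max has to be raised to match):

    import java.util.Properties;
    import kafka.consumer.ConsumerConfig;

    public class MirrorConsumerConfig {
        public static ConsumerConfig largeWindowConfig() {
            Properties props = new Properties();
            props.put("zookeeper.connect", "remote-zk:2181"); // placeholder address
            props.put("group.id", "mirror-maker-group");      // placeholder group
            // The default is 64KB; a larger socket buffer lets the TCP window
            // grow enough to keep a single long-haul connection full.
            props.put("socket.receive.buffer.bytes",
                      Integer.toString(16 * 1024 * 1024));
            return new ConsumerConfig(props);
        }
    }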
For the last one, it seems that you want identical mirroring. The tricky
part is figuring out how to avoid duplicates when there is a failure. We
had some related discussion in the context of transactional messaging (
https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
).

Thanks,

Jun

On Tue, Mar 24, 2015 at 11:44 AM, vlad...@gmail.com <vlad...@gmail.com>
wrote:

> Dear all,
>
> I had a short discussion with Jay yesterday at the ACM meetup and he
> suggested writing an email regarding a few possible MirrorMaker
> improvements.
>
> At Turn, we have been using MirrorMaker for a few months now to
> asynchronously replicate our key/value store data between our
> datacenters. In a way, our system is similar to LinkedIn's Databus, but
> it uses Kafka clusters and MirrorMaker as its building blocks. Our
> overall message rate peaks at about 650K messages/sec and, when pushing
> data over links with a high bandwidth-delay product, we have found some
> minor bottlenecks.
>
> The MirrorMaker process uses a standard consumer to pull data from a
> remote datacenter. This implies that it opens a single TCP connection to
> each of the remote brokers and muxes requests for different topics and
> partitions over this connection. While this helps keep the congestion
> window open, over long-RTT lines with a rather high loss rate the
> congestion window will cap out, in our case at just a few Mbps. Since the
> overall line bandwidth is much higher, we have to start multiple
> MirrorMaker processes (somewhere in the hundreds) to use the line
> capacity fully. Being able to pool multiple TCP connections from a single
> consumer to a broker would remove this complication.
>
> The standard consumer also uses the remote ZooKeeper to manage the
> consumer group. While consumer group management is moving closer to the
> brokers, it might make sense to move the group management to the local
> datacenter, since that would keep this traffic off the long-distance
> connection.
>
> Another possible improvement assumes a further constraint, namely that a
> topic has the same number of partitions in both datacenters. In my
> opinion, this is a sane constraint, since it preserves Kafka's
> per-partition ordering guarantee instead of a weaker per-key guarantee.
> Such a guarantee can be useful, for example, in a system that compares
> partition contents to reach eventual consistency using Merkle trees. If
> the number of partitions is equal, then an offset has the same meaning
> for the same partition in both clusters, since the data in both
> partitions is identical up to that offset. This allows a simple consumer
> to query the local broker and the remote broker for their current offsets
> and, if the remote broker is ahead, copy the extra data to the local
> cluster. Since the consumer offsets are no longer bound to the specific
> partitioning of a single remote cluster, the consumer could pull from any
> number of remote clusters, BitTorrent-style, whenever their offsets are
> ahead of the local offset. The group management problem would then reduce
> to assigning local partitions to different MirrorMaker processes, so
> group management could also be done locally.
>
> Regards,
> Vlad
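For what it's worth, here is a rough sketch of the offset-comparison idea
from the last point, using the 0.8 SimpleConsumer API. The broker hosts,
topic, and partition are placeholders, leader lookup via metadata requests
is skipped, and the actual fetch-and-produce copy step is left out:

    import java.util.HashMap;
    import java.util.Map;

    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.OffsetRequest;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class OffsetDelta {
        private static final String CLIENT = "offset-check";

        // Ask one broker for the log-end offset of one partition.
        static long logEndOffset(SimpleConsumer consumer, String topic, int partition) {
            TopicAndPartition tp = new TopicAndPartition(topic, partition);
            Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
                new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
            info.put(tp, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));
            OffsetResponse resp = consumer.getOffsetsBefore(
                new OffsetRequest(info, kafka.api.OffsetRequest.CurrentVersion(), CLIENT));
            return resp.offsets(topic, partition)[0];
        }

        public static void main(String[] args) {
            SimpleConsumer local = new SimpleConsumer("local-broker", 9092, 100000, 64 * 1024, CLIENT);
            SimpleConsumer remote = new SimpleConsumer("remote-broker", 9092, 100000, 64 * 1024, CLIENT);
            try {
                long localEnd = logEndOffset(local, "events", 0);
                long remoteEnd = logEndOffset(remote, "events", 0);
                if (remoteEnd > localEnd) {
                    // With equal partition counts the logs are identical up to
                    // localEnd, so only [localEnd, remoteEnd) needs to be fetched
                    // from the remote cluster and produced locally.
                    System.out.printf("partition 0 is behind by %d messages%n",
                                      remoteEnd - localEnd);
                }
            } finally {
                local.close();
                remote.close();
            }
        }
    }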