Dear all, I had a short discussion with Jay yesterday at the ACM meetup and he suggested writing an email regarding a few possible MirrorMaker improvements.
At Turn, we have been using MirrorMaker for a few months now to asynchronously replicate our key/value store data between our datacenters. In a way, our system is similar to LinkedIn's Databus, but it uses Kafka clusters and MirrorMaker as its building blocks. Our overall message rate peaks at about 650K/sec and, when pushing data over high bandwidth-delay product links, we have found some minor bottlenecks.

The MirrorMaker process uses a standard consumer to pull data from a remote datacenter. This means it opens a single TCP connection to each of the remote brokers and multiplexes requests for different topics and partitions over that connection. While this is good for keeping the congestion window open, over long-RTT links with a relatively high loss rate the congestion window tops out, in our case at just a few Mbps. Since the overall line bandwidth is much higher, we have to start many MirrorMaker processes (somewhere in the hundreds) to fully use the line capacity. Being able to pool multiple TCP connections from a single consumer to a broker would remove this complication.

The standard consumer also uses the remote ZooKeeper to manage the consumer group. While consumer group management is moving closer to the brokers, it might make sense to move group management to the local datacenter, since that would avoid using the long-distance connection for this purpose.

Another possible improvement assumes a further constraint, namely that a topic has the same number of partitions in both datacenters. In my opinion, this is a sane constraint, since it preserves the Kafka per-partition ordering guarantee instead of a weaker per-key guarantee. This kind of guarantee is useful, for example, in a system that compares partition contents to reach eventual consistency using Merkle trees. If the partition counts are equal, then an offset has the same meaning for the same partition in both clusters, since the data in both partitions is identical up to that offset. This allows a simple consumer to query the local broker and the remote broker for their current offsets and, if the remote broker is ahead, copy the extra data to the local cluster. Since the consumer offsets are no longer bound to the specific partitioning of a single remote cluster, the consumer could pull from any number of remote clusters, BitTorrent-style, whenever their offsets are ahead of the local offset. The group management problem would then reduce to assigning local partitions to different MirrorMaker processes, so group management could also be done locally in this case.

Regards,
Vlad
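P.S. To make the last idea a bit more concrete, here is a rough sketch of the offset-delta copy for a single partition. It is written against the Java consumer/producer client purely for illustration; the class and method names are made up, client wiring (bootstrap servers, serializers) is omitted, and it assumes nothing else writes to the local partition, so it is a sketch of the idea rather than a proposed implementation.

import java.time.Duration;
import java.util.Collections;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class OffsetDeltaMirror {

    // Copy the "missing tail" of one partition from the remote cluster into the
    // local cluster. Assumes both clusters have the same partition count for the
    // topic, so a given (partition, offset) pair identifies the same data in both.
    static void mirrorPartition(String topic, int partition,
                                KafkaConsumer<byte[], byte[]> localConsumer,
                                KafkaConsumer<byte[], byte[]> remoteConsumer,
                                KafkaProducer<byte[], byte[]> localProducer) {
        TopicPartition tp = new TopicPartition(topic, partition);

        // Ask each cluster how far this partition currently goes.
        long localEnd = localConsumer.endOffsets(Collections.singletonList(tp)).get(tp);
        long remoteEnd = remoteConsumer.endOffsets(Collections.singletonList(tp)).get(tp);
        if (remoteEnd <= localEnd) {
            return; // local copy is already caught up for this partition
        }

        // Start reading the remote partition exactly where the local one ends.
        remoteConsumer.assign(Collections.singletonList(tp));
        remoteConsumer.seek(tp, localEnd);

        long copied = 0;
        while (localEnd + copied < remoteEnd) {
            ConsumerRecords<byte[], byte[]> records = remoteConsumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<byte[], byte[]> r : records) {
                // Write to the same partition so ordering (and offsets) line up locally.
                localProducer.send(new ProducerRecord<>(topic, partition, r.key(), r.value()));
                copied++;
            }
        }
        localProducer.flush();
    }
}

Since the decision of what to copy depends only on the two end offsets, the same routine could be pointed at whichever remote cluster happens to be furthest ahead for that partition, which is where the BitTorrent-style pulling would come in.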