Dear all, I had a short discussion with Jay yesterday at the ACM meetup and he suggested writing an email regarding a few possible MirrorMaker improvements.
At Turn, we have been using MirrorMaker for a few months now to asynchronously replicate our key/value store data between our datacenters. In a way, our system is similar to LinkedIn's Databus, but it uses Kafka clusters and MirrorMaker as its building blocks. Our overall message rate peaks at about 650K/sec and, when pushing data over high bandwidth-delay product links, we have found some minor bottlenecks.

The MirrorMaker process uses a standard consumer to pull data from a remote datacenter. This means it opens a single TCP connection to each of the remote brokers and multiplexes requests for different topics and partitions over that connection. While this is good for keeping the congestion window open, over long-RTT links with a relatively high loss rate the congestion window tops out, in our case at just a few Mbps. Since the overall line bandwidth is much higher, we have to start many MirrorMaker processes (somewhere in the hundreds) to fully use the line capacity. Being able to pool multiple TCP connections from a single consumer to a broker would remove this complication.

The standard consumer also uses the remote ZooKeeper to manage the consumer group. While consumer group management is moving closer to the brokers, it might make sense to move group management to the local datacenter, since that would avoid using the long-distance connection for this purpose.

Another possible improvement assumes a further constraint, namely that a topic has the same number of partitions in both datacenters. In my opinion, this is a sane constraint, since it preserves the Kafka per-partition ordering guarantee instead of a weaker per-key guarantee. This kind of guarantee is useful, for example, in a system that compares partition contents to reach eventual consistency using Merkle trees. If the partition counts are equal, then an offset has the same meaning for the same partition in both clusters, since the data in both partitions is identical up to that offset. This allows a simple consumer to query the local broker and the remote broker for their current offsets and, if the remote broker is ahead, copy the extra data to the local cluster. Since the consumer offsets are no longer bound to the specific partitioning of a single remote cluster, the consumer could pull from any number of remote clusters, BitTorrent-style, whenever their offsets are ahead of the local offset. The group management problem would then reduce to assigning local partitions to different MirrorMaker processes, so group management could also be done locally in this case.

Regards,
Vlad
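P.S. To make the last idea a bit more concrete, here is a rough sketch of the offset-delta copy for a single partition. It is written against the Java consumer/producer client purely for illustration; the class and method names are made up, client wiring (bootstrap servers, serializers) is omitted, and it assumes nothing else writes to the local partition, so it is a sketch of the idea rather than a proposed implementation.

import java.time.Duration;
import java.util.Collections;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class OffsetDeltaMirror {

    // Copy the "missing tail" of one partition from the remote cluster into the
    // local cluster. Assumes both clusters have the same partition count for the
    // topic, so a given (partition, offset) pair identifies the same data in both.
    static void mirrorPartition(String topic, int partition,
                                KafkaConsumer<byte[], byte[]> localConsumer,
                                KafkaConsumer<byte[], byte[]> remoteConsumer,
                                KafkaProducer<byte[], byte[]> localProducer) {
        TopicPartition tp = new TopicPartition(topic, partition);

        // Ask each cluster how far this partition currently goes.
        long localEnd = localConsumer.endOffsets(Collections.singletonList(tp)).get(tp);
        long remoteEnd = remoteConsumer.endOffsets(Collections.singletonList(tp)).get(tp);
        if (remoteEnd <= localEnd) {
            return; // local copy is already caught up for this partition
        }

        // Start reading the remote partition exactly where the local one ends.
        remoteConsumer.assign(Collections.singletonList(tp));
        remoteConsumer.seek(tp, localEnd);

        long copied = 0;
        while (localEnd + copied < remoteEnd) {
            ConsumerRecords<byte[], byte[]> records = remoteConsumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<byte[], byte[]> r : records) {
                // Write to the same partition so ordering (and offsets) line up locally.
                localProducer.send(new ProducerRecord<>(topic, partition, r.key(), r.value()));
                copied++;
            }
        }
        localProducer.flush();
    }
}

Since the decision of what to copy depends only on the two end offsets, the same routine could be pointed at whichever remote cluster happens to be furthest ahead for that partition, which is where the BitTorrent-style pulling would come in.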