Dear Community members,

We are proposing a new feature to improve the performance of Kafka
MirrorMaker:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-712%3A+Shallow+Mirroring

The current Kafka MirrorMaker process (with the underlying Consumer and
Producer libraries) uses significant CPU cycles and memory to
decompress/recompress and deserialize/re-serialize messages, and to copy
the message bytes multiple times along the mirroring/replication stages.

The KIP proposes a *shallow mirror* feature which brings the shallow
iterator concept back to the mirror process and skips the unnecessary
message decompression and recompression steps.  We argue that in many
cases users just want a simple replication pipeline that replicates
messages as they are from the source cluster to the destination cluster.
In many cases the messages in the source cluster are already compressed
and properly batched; users just need an identical copy of the message
bytes through the mirroring, without any transformation or repartitioning.

We have a prototype implementation in house with MirrorMaker v1 and
observed *CPU usage dropped from 50% to 15%* for some mirror pipelines.

We call this feature *shallow mirroring* since it resembles the feature of
the same name in Kafka 0.7, although the implementations are not quite the
same.  ‘*Shallow*’ means: 1. we *shallowly* iterate over the RecordBatches
inside the MemoryRecords structure instead of deep iterating over the
records inside each RecordBatch; 2. we *shallowly* copy (share) pointers
inside the ByteBuffer instead of deep copying and deserializing bytes into
objects.
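
To make the shallow/deep distinction concrete, here is a minimal sketch
using only java.nio.  It is not the KIP implementation and does not use
Kafka's real RecordBatch wire format: the 4-byte length-prefixed framing,
the class name and the method names are all assumptions made up for
illustration.  The shallow path hands out ByteBuffer views that share the
underlying memory; the deep path copies every payload into a new array.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ShallowIterationSketch {

    // Hypothetical framing (NOT Kafka's format): each batch is a 4-byte
    // big-endian length followed by that many payload bytes.
    static ByteBuffer frame(byte[]... payloads) {
        int total = 0;
        for (byte[] p : payloads) {
            total += 4 + p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] p : payloads) {
            buf.putInt(p.length);
            buf.put(p);
        }
        buf.flip();
        return buf;
    }

    // Shallow: walk the batch boundaries and return views that share the
    // backing memory of the input buffer. No payload bytes are copied.
    static List<ByteBuffer> shallowBatches(ByteBuffer buffer) {
        List<ByteBuffer> batches = new ArrayList<>();
        ByteBuffer cursor = buffer.duplicate(); // leave caller's position alone
        while (cursor.remaining() >= 4) {
            int size = cursor.getInt();
            if (size < 0 || size > cursor.remaining()) {
                throw new IllegalArgumentException("corrupt batch size: " + size);
            }
            ByteBuffer view = cursor.slice();   // shares content with cursor
            view.limit(size);
            batches.add(view);
            cursor.position(cursor.position() + size);
        }
        return batches;
    }

    // Deep: same walk, but every payload is copied into a fresh array.
    static List<byte[]> deepCopies(ByteBuffer buffer) {
        List<byte[]> copies = new ArrayList<>();
        ByteBuffer cursor = buffer.duplicate();
        while (cursor.remaining() >= 4) {
            int size = cursor.getInt();
            if (size < 0 || size > cursor.remaining()) {
                throw new IllegalArgumentException("corrupt batch size: " + size);
            }
            byte[] copy = new byte[size];
            cursor.get(copy);                   // copies bytes out of the buffer
            copies.add(copy);
        }
        return copies;
    }
}
```

Because the shallow views share memory with the source buffer, a later
write to the source is visible through the view, while a deep copy is
unaffected.  That sharing is what saves the copy, and it is also why the
source buffer must not be reused while the views are still in flight.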

Please share your thoughts and feedback on this email thread.
