Hello Henry,

This is a very interesting proposal.
https://issues.apache.org/jira/browse/KAFKA-10728 reflects a similar concern
about re-compressing data in MirrorMaker.

One thing that probably needs clarification: how would "shallow" mirroring be
restricted to the MirrorMaker use case if the changes have to be made in the
generic consumer and producer (e.g. by adding `fetch.raw.bytes` and
`send.raw.bytes` to the consumer and producer configs, respectively)?

On 2021/02/05 00:59:57, Henry Cai <h...@pinterest.com.INVALID> wrote: 
> Dear Community members,
> 
> We are proposing a new feature to improve the performance of Kafka mirror
> maker:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-712%3A+Shallow+Mirroring
> 
> The current Kafka MirrorMaker process (with the underlying Consumer and
> Producer library) uses significant CPU cycles and memory to
> decompress/recompress and deserialize/re-serialize messages, and to copy
> message bytes multiple times along the mirroring/replicating stages.
> 
> The KIP proposes a *shallow mirror* feature which brings back the shallow
> iterator concept to the mirror process and also proposes to skip the
> unnecessary message decompression and recompression steps.  We argue in
> many cases users just want a simple replication pipeline to replicate the
> message as it is from the source cluster to the destination cluster.  In
> many cases the messages in the source cluster are already compressed and
> properly batched; users just need an identical copy of the message bytes
> through the mirroring without any transformation or repartitioning.
> 
> We have a prototype implementation in house with MirrorMaker v1 and
> observed *CPU usage dropped from 50% to 15%* for some mirror pipelines.
> 
> We name this feature: *shallow mirroring* since it has some resemblance to
> the old Kafka 0.7 namesake feature but the implementations are not quite
> the same.  ‘*Shallow*’ means 1. we *shallowly* iterate RecordBatches inside
> MemoryRecords structure instead of deep iterating records inside
> RecordBatch; 2. We *shallowly* copy (share) pointers inside ByteBuffer
> instead of deep copying and deserializing bytes into objects.
> 
> Please share discussions/feedback along this email thread.
> 
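
For readers unfamiliar with point 2 in the quoted description (sharing
pointers inside a ByteBuffer instead of deep copying), here is a minimal
sketch of the difference using only the JDK's java.nio.ByteBuffer. It does not
use Kafka's MemoryRecords or RecordBatch classes and is not the KIP's
implementation; the class and method names (ShallowVsDeepCopy, shallowCopy,
deepCopy) are hypothetical and chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ShallowVsDeepCopy {

    // Shallow: a new ByteBuffer view over the same backing bytes; no bytes are copied.
    static ByteBuffer shallowCopy(ByteBuffer batch) {
        return batch.duplicate();
    }

    // Deep: allocate new storage and copy every remaining byte into it.
    static ByteBuffer deepCopy(ByteBuffer batch) {
        ByteBuffer copy = ByteBuffer.allocate(batch.remaining());
        copy.put(batch.duplicate()); // read via a duplicate so the source position is not advanced
        copy.flip();                 // make the copied bytes readable from position 0
        return copy;
    }

    public static void main(String[] args) {
        // Stand-in for an already compressed batch received from the source cluster.
        ByteBuffer batch = ByteBuffer.wrap("compressed-batch".getBytes(StandardCharsets.UTF_8));

        ByteBuffer shallow = shallowCopy(batch);
        ByteBuffer deep = deepCopy(batch);

        // Mutate the original bytes to show which copy shares storage with it.
        batch.put(0, (byte) 'X');

        System.out.println("shallow sees: " + (char) shallow.get(0)); // shares the backing array
        System.out.println("deep sees:    " + (char) deep.get(0));    // has its own bytes
    }
}
```

The shallow view costs a small object allocation regardless of batch size,
while the deep copy costs time and memory proportional to the number of bytes,
which is the kind of overhead the proposal aims to avoid.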
