Hello Henry,

This is a very interesting proposal. https://issues.apache.org/jira/browse/KAFKA-10728 reflects a similar concern about re-compressing data in MirrorMaker.
One thing that probably needs clarification: how would "shallow" mirroring be limited to the MirrorMaker use case if the changes have to be made in the generic consumer and producer (e.g. by adding `fetch.raw.bytes` and `send.raw.bytes` to the producer and consumer configs)?

On 2021/02/05 00:59:57, Henry Cai <h...@pinterest.com.INVALID> wrote:
> Dear Community members,
>
> We are proposing a new feature to improve the performance of Kafka mirror
> maker:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-712%3A+Shallow+Mirroring
>
> The current Kafka MirrorMaker process (with the underlying Consumer and
> Producer library) uses significant CPU cycles and memory to
> decompress/recompress, deserialize/re-serialize messages and copy multiple
> times of messages bytes along the mirroring/replicating stages.
>
> The KIP proposes a *shallow mirror* feature which brings back the shallow
> iterator concept to the mirror process and also proposes to skip the
> unnecessary message decompression and recompression steps. We argue in
> many cases users just want a simple replication pipeline to replicate the
> message as it is from the source cluster to the destination cluster. In
> many cases the messages in the source cluster are already compressed and
> properly batched, users just need an identical copy of the message bytes
> through the mirroring without any transformation or repartitioning.
>
> We have a prototype implementation in house with MirrorMaker v1 and
> observed *CPU usage dropped from 50% to 15%* for some mirror pipelines.
>
> We name this feature: *shallow mirroring* since it has some resemblance to
> the old Kafka 0.7 namesake feature but the implementations are not quite
> the same. ‘*Shallow*’ means 1. we *shallowly* iterate RecordBatches inside
> MemoryRecords structure instead of deep iterating records inside
> RecordBatch; 2. We *shallowly* copy (share) pointers inside ByteBuffer
> instead of deep copying and deserializing bytes into objects.
>
> Please share discussions/feedback along this email thread.
>
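
For readers less familiar with the second sense of "shallow" in the quoted summary (sharing pointers inside a ByteBuffer rather than deep copying), here is a minimal sketch using only java.nio.ByteBuffer from the JDK. It is not code from the KIP or from Kafka; the class name ShallowCopyDemo and the methods shallowCopy and deepCopy are made up for illustration, and the byte string stands in for an already compressed batch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Kafka or the KIP.
public class ShallowCopyDemo {

    // Shallow copy: a new ByteBuffer object with its own position and limit,
    // backed by the same bytes as the source. No bytes are copied.
    public static ByteBuffer shallowCopy(ByteBuffer source) {
        return source.duplicate();
    }

    // Deep copy: allocates new storage and copies every remaining byte.
    public static ByteBuffer deepCopy(ByteBuffer source) {
        ByteBuffer copy = ByteBuffer.allocate(source.remaining());
        // Read through a duplicate so the source position is left untouched.
        copy.put(source.duplicate());
        copy.flip();
        return copy;
    }

    public static void main(String[] args) {
        // Arbitrary bytes standing in for an already compressed batch.
        ByteBuffer batch = ByteBuffer.wrap(
                "compressed-batch".getBytes(StandardCharsets.US_ASCII));

        ByteBuffer shallow = shallowCopy(batch);
        ByteBuffer deep = deepCopy(batch);

        // Overwrite the first byte of the original buffer.
        batch.put(0, (byte) 'X');

        // The shallow copy shares storage, so it observes the change.
        System.out.println("shallow sees change: " + (shallow.get(0) == 'X'));
        // The deep copy has its own storage, so it does not.
        System.out.println("deep sees change: " + (deep.get(0) == 'X'));
    }
}
```

The shallow variant costs a small object allocation regardless of batch size, while the deep variant costs an allocation and a copy proportional to the number of bytes. How the actual KIP implementation hands buffers from consumer to producer may differ from this sketch.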