Thanks for the KIP. I have a few high-level comments:

1. Like Tom, I'm not convinced about the proposal to make this change to MirrorMaker 1 if we intend to deprecate and remove it. I would rather we focus our efforts on the implementation we intend to support going forward.

2. The producer/consumer configs seem pretty dangerous for general usage, but the KIP doesn't address the potential downsides.

3. How does the ProducerRequest change impact exactly-once (if at all)? The change we are reverting was done as part of KIP-98. Have we considered the original reasons for the change?

Thanks,
Ismael

On Wed, Feb 10, 2021 at 12:58 PM Vahid Hashemian <vahid.hashem...@gmail.com> wrote:

> Retitled the thread to conform to the common format.
>
> On Fri, Feb 5, 2021 at 4:00 PM Ning Zhang <ning2008w...@gmail.com> wrote:
>
> > Hello Henry,
> >
> > This is a very interesting proposal.
> > https://issues.apache.org/jira/browse/KAFKA-10728 reflects the similar
> > concern of re-compressing data in mirror maker.
> >
> > Probably one thing may need to clarify is: how "shallow" mirroring is only
> > applied to mirrormaker use case, if the changes need to be made on generic
> > consumer and producer (e.g. by adding `fetch.raw.bytes` and
> > `send.raw.bytes` to producer and consumer config)
> >
> > On 2021/02/05 00:59:57, Henry Cai <h...@pinterest.com.INVALID> wrote:
> > > Dear Community members,
> > >
> > > We are proposing a new feature to improve the performance of Kafka mirror
> > > maker:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-712%3A+Shallow+Mirroring
> > >
> > > The current Kafka MirrorMaker process (with the underlying Consumer and
> > > Producer library) uses significant CPU cycles and memory to
> > > decompress/recompress, deserialize/re-serialize messages and copy multiple
> > > times of messages bytes along the mirroring/replicating stages.
> > >
> > > The KIP proposes a *shallow mirror* feature which brings back the shallow
> > > iterator concept to the mirror process and also proposes to skip the
> > > unnecessary message decompression and recompression steps. We argue in
> > > many cases users just want a simple replication pipeline to replicate the
> > > message as it is from the source cluster to the destination cluster. In
> > > many cases the messages in the source cluster are already compressed and
> > > properly batched, users just need an identical copy of the message bytes
> > > through the mirroring without any transformation or repartitioning.
> > >
> > > We have a prototype implementation in house with MirrorMaker v1 and
> > > observed *CPU usage dropped from 50% to 15%* for some mirror pipelines.
> > >
> > > We name this feature: *shallow mirroring* since it has some resemblance to
> > > the old Kafka 0.7 namesake feature but the implementations are not quite
> > > the same. ‘*Shallow*’ means 1. we *shallowly* iterate RecordBatches inside
> > > MemoryRecords structure instead of deep iterating records inside
> > > RecordBatch; 2. We *shallowly* copy (share) pointers inside ByteBuffer
> > > instead of deep copying and deserializing bytes into objects.
> > >
> > > Please share discussions/feedback along this email thread.
> > >
> >
>
> --
> Thanks!
> --Vahid
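
For anyone following along, here is a minimal sketch of the second sense of "shallow" in the quoted proposal: sharing the bytes behind a ByteBuffer rather than deep copying them. It uses only the standard `java.nio.ByteBuffer` API and is not taken from the KIP or the prototype. The class and method names (`ShallowCopyDemo`, `shallowCopy`, `deepCopy`) are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;

// Illustrative only: not code from KIP-712 or from MirrorMaker.
class ShallowCopyDemo {

    // Shallow copy: a new ByteBuffer object that shares the same backing bytes.
    // No payload bytes are copied; only position/limit bookkeeping is separate.
    static ByteBuffer shallowCopy(ByteBuffer source) {
        return source.duplicate();
    }

    // Deep copy: allocate fresh storage and copy every remaining byte into it.
    static ByteBuffer deepCopy(ByteBuffer source) {
        ByteBuffer copy = ByteBuffer.allocate(source.remaining());
        // Read through a duplicate so the caller's position is left untouched.
        copy.put(source.duplicate());
        copy.flip();
        return copy;
    }

    public static void main(String[] args) {
        // Stand-in for an already-compressed batch of message bytes.
        ByteBuffer batch = ByteBuffer.wrap(new byte[] {10, 20, 30, 40});

        ByteBuffer shallow = shallowCopy(batch);
        ByteBuffer deep = deepCopy(batch);

        // Mutate the original to show which copy shares its memory.
        batch.put(0, (byte) 99);

        System.out.println("shallow sees change: " + (shallow.get(0) == 99)); // true
        System.out.println("deep sees change: " + (deep.get(0) == 99));       // false
    }
}
```

The shallow path does no per-byte work, which is consistent with the CPU savings the proposal is after. The trade-off is that both buffers alias the same memory, so the shared bytes must not be modified while another holder is still using them.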