Hey Henry, great KIP. The performance improvements are impressive! However, CPU, RAM, and GC are often not the most important metrics for a replication pipeline -- the network is usually close to saturated anyway. Do you know how this change affects latency or throughput? I suspect less GC pressure means slightly lower p99 latency, but it would be great to see that confirmed. I don't think it's necessary that this KIP improve these metrics, but I think it's important to show that they at least aren't made worse.
I suspect any improvement in MM1 would be magnified in MM2, given there is a lot more machinery between consumer and producer in MM2. I'd like to do some performance analysis based on these changes.

Looking forward to a PR!

Ryanne

On Wed, Feb 10, 2021, 3:50 PM Henry Cai <h...@pinterest.com> wrote:

> On the question "whether shallow mirror is only applied on mirror maker
> v1", the code change is mostly on consumer and producer code path, the
> change to mirrormaker v1 is very trivial. We chose to modify the
> consumer/producer path (instead of creating a new mirror product) so other
> use cases can use that feature as well. The change to mirror maker v2
> should be straightforward as well but we don't have that environment in
> house. I think the community can easily port this change to mirror maker
> v2.
>
> On Wed, Feb 10, 2021 at 12:58 PM Vahid Hashemian <
> vahid.hashem...@gmail.com> wrote:
>
>> Retitled the thread to conform to the common format.
>>
>> On Fri, Feb 5, 2021 at 4:00 PM Ning Zhang <ning2008w...@gmail.com> wrote:
>>
>> > Hello Henry,
>> >
>> > This is a very interesting proposal.
>> > https://issues.apache.org/jira/browse/KAFKA-10728 reflects the similar
>> > concern of re-compressing data in mirror maker.
>> >
>> > Probably one thing may need to clarify is: how "shallow" mirroring is only
>> > applied to mirrormaker use case, if the changes need to be made on generic
>> > consumer and producer (e.g. by adding `fetch.raw.bytes` and
>> > `send.raw.bytes` to producer and consumer config)
>> >
>> > On 2021/02/05 00:59:57, Henry Cai <h...@pinterest.com.INVALID> wrote:
>> > > Dear Community members,
>> > >
>> > > We are proposing a new feature to improve the performance of Kafka mirror
>> > > maker:
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-712%3A+Shallow+Mirroring
>> > >
>> > > The current Kafka MirrorMaker process (with the underlying Consumer and
>> > > Producer library) uses significant CPU cycles and memory to
>> > > decompress/recompress, deserialize/re-serialize messages and copy multiple
>> > > times of messages bytes along the mirroring/replicating stages.
>> > >
>> > > The KIP proposes a *shallow mirror* feature which brings back the shallow
>> > > iterator concept to the mirror process and also proposes to skip the
>> > > unnecessary message decompression and recompression steps. We argue in
>> > > many cases users just want a simple replication pipeline to replicate the
>> > > message as it is from the source cluster to the destination cluster. In
>> > > many cases the messages in the source cluster are already compressed and
>> > > properly batched, users just need an identical copy of the message bytes
>> > > through the mirroring without any transformation or repartitioning.
>> > >
>> > > We have a prototype implementation in house with MirrorMaker v1 and
>> > > observed *CPU usage dropped from 50% to 15%* for some mirror pipelines.
>> > >
>> > > We name this feature: *shallow mirroring* since it has some resemblance to
>> > > the old Kafka 0.7 namesake feature but the implementations are not quite
>> > > the same. ‘*Shallow*’ means 1. we *shallowly* iterate RecordBatches inside
>> > > MemoryRecords structure instead of deep iterating records inside
>> > > RecordBatch; 2. We *shallowly* copy (share) pointers inside ByteBuffer
>> > > instead of deep copying and deserializing bytes into objects.
>> > >
>> > > Please share discussions/feedback along this email thread.
>> > >
>> >
>>
>> --
>>
>> Thanks!
>> --Vahid
>>
>