Hey Henry, great KIP. The performance improvements are impressive! However,
CPU, RAM, and GC are often not the most important metrics for a replication
pipeline -- the network is usually mostly saturated anyway. Do you know how
this change affects latency or throughput? I suspect less GC pressure means
slightly lower p99 latency, but it would be great to see that confirmed. I
don't think it's necessary that this KIP improves these metrics, but I
think it's important to show that they at least aren't made worse.
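
To make the comparison concrete, here is a minimal sketch of how p99 latency
and throughput could be summarized from a test run, assuming per-record
end-to-end latencies have already been collected somehow. The class and
method names are hypothetical and are not part of the KIP or of Kafka; the
percentile uses the simple nearest-rank definition.

```java
import java.util.Arrays;

// Hypothetical helper for summarizing a mirroring benchmark run.
final class LatencyStats {

    /** Nearest-rank percentile of the samples; p must be in (0, 100]. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Records mirrored per second over the measured interval. */
    static double throughputPerSec(long records, long elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be positive");
        }
        return records * 1000.0 / elapsedMs;
    }

    public static void main(String[] args) {
        // Synthetic latencies of 1..100 ms, standing in for real measurements.
        long[] latencies = new long[100];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = i + 1;
        }
        System.out.println("p99 latency (ms): " + percentile(latencies, 99.0)); // 99
        System.out.println("records/sec: " + throughputPerSec(50_000, 2_000));
    }
}
```

Running the same summary with and without the change on an otherwise
identical pipeline would be enough to show that neither number regresses.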

I suspect any improvement in MM1 would be magnified in MM2, given there is
a lot more machinery between consumer and producer in MM2.


I'd like to do some performance analysis based on these changes. Looking
forward to a PR!

Ryanne

On Wed, Feb 10, 2021, 3:50 PM Henry Cai <h...@pinterest.com> wrote:

> On the question "whether shallow mirror is only applied on mirror maker
> v1": the code change is mostly on the consumer and producer code path, and
> the change to mirrormaker v1 is trivial.  We chose to modify the
> consumer/producer path (instead of creating a new mirror product) so other
> use cases can use that feature as well.  The change to mirror maker v2
> should be straightforward as well, but we don't have that environment in
> house.  I think the community can easily port this change to mirror maker
> v2.
>
>
>
> On Wed, Feb 10, 2021 at 12:58 PM Vahid Hashemian <
> vahid.hashem...@gmail.com> wrote:
>
>> Retitled the thread to conform to the common format.
>>
>> On Fri, Feb 5, 2021 at 4:00 PM Ning Zhang <ning2008w...@gmail.com> wrote:
>>
>> > Hello Henry,
>> >
>> > This is a very interesting proposal.
>> > https://issues.apache.org/jira/browse/KAFKA-10728 reflects the similar
>> > concern of re-compressing data in mirror maker.
>> >
>> > Probably one thing may need to clarify is: how "shallow" mirroring is
>> > only applied to mirrormaker use case, if the changes need to be made on
>> > generic consumer and producer (e.g. by adding `fetch.raw.bytes` and
>> > `send.raw.bytes` to producer and consumer config)
>> >
>> > On 2021/02/05 00:59:57, Henry Cai <h...@pinterest.com.INVALID> wrote:
>> > > Dear Community members,
>> > >
>> > > We are proposing a new feature to improve the performance of Kafka
>> > > mirror maker:
>> > >
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-712%3A+Shallow+Mirroring
>> > >
>> > > The current Kafka MirrorMaker process (with the underlying Consumer
>> > > and Producer library) uses significant CPU cycles and memory to
>> > > decompress/recompress, deserialize/re-serialize messages and copy
>> > > multiple times of messages bytes along the mirroring/replicating
>> > > stages.
>> > >
>> > > The KIP proposes a *shallow mirror* feature which brings back the
>> > > shallow iterator concept to the mirror process and also proposes to
>> > > skip the unnecessary message decompression and recompression steps.
>> > > We argue in many cases users just want a simple replication pipeline
>> > > to replicate the message as it is from the source cluster to the
>> > > destination cluster.  In many cases the messages in the source cluster
>> > > are already compressed and properly batched, users just need an
>> > > identical copy of the message bytes through the mirroring without any
>> > > transformation or repartitioning.
>> > >
>> > > We have a prototype implementation in house with MirrorMaker v1 and
>> > > observed *CPU usage dropped from 50% to 15%* for some mirror pipelines.
>> > >
>> > > We name this feature: *shallow mirroring* since it has some
>> > > resemblance to the old Kafka 0.7 namesake feature but the
>> > > implementations are not quite the same.  ‘*Shallow*’ means 1. we
>> > > *shallowly* iterate RecordBatches inside MemoryRecords structure
>> > > instead of deep iterating records inside RecordBatch; 2. We
>> > > *shallowly* copy (share) pointers inside ByteBuffer instead of deep
>> > > copying and deserializing bytes into objects.
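
The shallow-versus-deep distinction in point 2 can be illustrated with plain
java.nio.ByteBuffer. This is only a sketch of the general idea, not the KIP's
implementation: the class and method names are hypothetical, and the byte
string merely stands in for an already-compressed record batch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of sharing bytes versus copying them.
final class ShallowVsDeepCopy {

    /** Shallow: a new buffer object that shares the same underlying bytes. */
    static ByteBuffer shallowCopy(ByteBuffer src) {
        return src.duplicate();
    }

    /** Deep: allocate fresh storage and copy the remaining bytes into it. */
    static ByteBuffer deepCopy(ByteBuffer src) {
        ByteBuffer copy = ByteBuffer.allocate(src.remaining());
        copy.put(src.duplicate());   // read via a duplicate so src's position is unchanged
        copy.flip();
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer batch =
            ByteBuffer.wrap("compressed-batch".getBytes(StandardCharsets.UTF_8));

        ByteBuffer shallow = shallowCopy(batch);
        ByteBuffer deep = deepCopy(batch);

        // Mutating the original is visible through the shallow copy only.
        batch.put(0, (byte) 'X');
        System.out.println((char) shallow.get(0)); // prints X
        System.out.println((char) deep.get(0));    // prints c
    }
}
```

The shallow copy costs one small object regardless of batch size, while the
deep copy allocates and moves every byte, which is the kind of work the
proposal aims to avoid on the mirroring path.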
>> > >
>> > > Please share discussions/feedback along this email thread.
>> > >
>> >
>>
>>
>> --
>>
>> Thanks!
>> --Vahid
>>
>
