Hi Divij Vaidya,

Thanks for your comments.

1. I checked the code of KafkaProducer#doSend() and RecordAccumulator#append(): by the time KafkaProducer#doSend() returns, serializedKey and serializedValue have already been appended to ProducerBatch#recordsBuilder, and we keep no references to serializedKey or serializedValue.

2. Given 1, a user application can reuse the same ByteBuffer across consecutive KafkaProducer#send() calls without breaking. If we are concerned about compatibility, we can provide another Serializer, such as a ZeroCopyByteBufferSerializer, and keep the original ByteBufferSerializer unchanged.

In my opinion, kafka-clients should provide some way for users who want to improve application performance; users who opt into it should expect to work with lower-level code and understand its underlying implementation.

Best,
ShunKang

On Wed, Sep 28, 2022 at 19:58, Divij Vaidya <divijvaidy...@gmail.com> wrote:

> Hello
>
> I believe that the current behaviour of creating a copy of the user
> provided input is the correct approach because of the following reasons:
>
> 1. In the existing implementation (considering cases when T is ByteBuffer
> in Serializer#serialize(String, Headers, T)) we copy the data (T) into a
> new byte[]. In the new approach, we would continue to re-use the
> ByteBuffer even after doSend(), which means the ProducerRecord object
> cannot go out of scope from a GC perspective at the end of doSend().
> Hence, the new approach may lead to increased heap memory usage for a
> greater period of time.
>
> 2. The new approach may break certain user applications, e.g. consider a
> user application which re-uses the ByteBuffer (maybe it's a memory-mapped
> byte buffer) to send consecutive Producer.send() requests. Prior to this
> change they could do that, because we copy the data from the user
> provided input before storing it in the accumulator; but after this
> change they will have to allocate a new ByteBuffer for every
> ProducerRecord.
> In general, I am of the opinion that any user provided data should be
> copied to internal data structures at the interface of an opaque library
> (like the client) so that the user doesn't have to guess about the memory
> lifetime of the objects they provided to the opaque library.
>
> What do you think?
>
> --
> Divij Vaidya
>
> On Sun, Sep 25, 2022 at 5:59 PM ShunKang Lin <linshunkang....@gmail.com>
> wrote:
>
> > Hi all, I'd like to start a new discussion thread on KIP-872 (Kafka
> > Client), which proposes adding Serializer#serializeToByteBuffer() to
> > reduce memory copying.
> >
> > KIP:
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=228495828
> >
> > Thanks,
> > ShunKang
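To make the trade-off concrete, here is a minimal, Kafka-free sketch contrasting the two behaviours under discussion. serializeWithCopy mirrors the copying semantics of the current ByteBufferSerializer (the payload is copied into a fresh byte[]), while serializeZeroCopy is a hypothetical stand-in for the proposed zero-copy path; the names and simplified signatures are illustrative only and are not the actual KIP-872 API:

```java
import java.nio.ByteBuffer;

public class ByteBufferCopyDemo {
    // Copying semantics (like today's ByteBufferSerializer): the payload
    // is copied into a fresh byte[], so the caller is free to reuse the
    // buffer immediately after the call.
    static byte[] serializeWithCopy(ByteBuffer data) {
        ByteBuffer view = data.duplicate(); // don't disturb caller's position
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    // Hypothetical zero-copy semantics: the caller's buffer is handed
    // straight through, so any later mutation of that buffer is visible
    // to whoever holds the returned reference.
    static ByteBuffer serializeZeroCopy(ByteBuffer data) {
        return data;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("record-1".getBytes());

        byte[] copied = serializeWithCopy(buf);
        ByteBuffer shared = serializeZeroCopy(buf);

        // The application reuses the buffer for the next record.
        buf.clear();
        buf.put("record-2".getBytes());
        buf.flip();

        // The copy still holds the first record...
        System.out.println(new String(copied));
        // ...but the zero-copy reference now sees the overwritten data.
        byte[] seen = new byte[shared.remaining()];
        shared.duplicate().get(seen);
        System.out.println(new String(seen));
    }
}
```

The second println is exactly the hazard described in point 2 above: with zero-copy, reusing the buffer before the library is done with it corrupts the in-flight record. ShunKang's argument is that once doSend() has returned, the accumulator no longer references the buffer, so the reuse window is well defined; Divij's argument is that this puts the burden of understanding that lifetime on the user.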