Hello

I believe that the current behaviour of creating a copy of the
user-provided input is the correct approach, for the following reasons:

1. In the existing implementation (considering the case where T is ByteBuffer
in Serializer#serialize(String, Headers, T)), we copy the data (T) into a new
byte[]. In the new approach, we would continue to hold a reference to the
ByteBuffer even after doSend(), which means the `ProducerRecord` object
cannot become eligible for GC at the end of doSend(). Hence, the new
approach may keep more heap memory live for a longer period of time.
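For context, the copy-on-serialize behaviour described above can be sketched
roughly as follows (a simplified illustration modelled on what a ByteBuffer
serializer does today; class and method names here are illustrative, not the
exact Kafka code):

```java
import java.nio.ByteBuffer;

// Simplified sketch: copy the caller's ByteBuffer into a fresh byte[]
// at the serializer boundary, so the caller's buffer is free for reuse
// and nothing in the producer retains a reference to it.
public class CopyingByteBufferSerializer {
    public byte[] serialize(String topic, ByteBuffer data) {
        if (data == null) {
            return null;
        }
        // duplicate() shares the bytes but has its own position/limit,
        // so reading it does not disturb the caller's cursor.
        ByteBuffer view = data.duplicate();
        byte[] copy = new byte[view.remaining()];
        view.get(copy); // the copied bytes are now owned by the client
        return copy;
    }
}
```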

2. The new approach may break certain user applications. For example,
consider a user application that re-uses the same ByteBuffer (perhaps a
memory-mapped byte buffer) across consecutive Producer.send() calls. Prior
to this change, that is safe because we copy the data from the user-provided
input before storing it in the accumulator; after this change, the
application would have to allocate a new ByteBuffer for every ProducerRecord.
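The reuse pattern I have in mind looks like this (a self-contained sketch;
`FakeAccumulator` is a hypothetical stand-in for the producer's record
accumulator, used only to show why copying at the boundary makes buffer
reuse safe):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch: one ByteBuffer reused across consecutive sends. Because the
// (fake) accumulator copies the bytes out before the caller mutates the
// buffer again, each stored record keeps its own value.
public class BufferReuseDemo {
    static class FakeAccumulator {
        final List<byte[]> records = new ArrayList<>();

        // Current behaviour: defensive copy at the boundary.
        void append(ByteBuffer data) {
            byte[] copy = new byte[data.remaining()];
            data.duplicate().get(copy);
            records.add(copy);
        }
    }

    public static void main(String[] args) {
        FakeAccumulator acc = new FakeAccumulator();
        ByteBuffer reusable = ByteBuffer.allocate(4);
        for (int i = 0; i < 3; i++) {
            reusable.clear();
            reusable.putInt(i);
            reusable.flip();
            acc.append(reusable); // safe: bytes copied before the next reuse
        }
        System.out.println(acc.records.size()); // 3 independent records
    }
}
```

If the accumulator instead stored the ByteBuffer reference, every record
would observe the buffer's latest contents, silently corrupting earlier
sends.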

In general, I am of the opinion that user-provided data should be copied
into internal data structures at the interface of an opaque library (like
the client), so that users don't have to reason about the memory lifetime
of the objects they hand to that library.

What do you think?

--
Divij Vaidya



On Sun, Sep 25, 2022 at 5:59 PM ShunKang Lin <linshunkang....@gmail.com>
wrote:

> Hi all, I'd like to start a new discussion thread on KIP-872 (Kafka Client),
> which proposes adding Serializer#serializeToByteBuffer() to reduce memory
> copying.
>
> KIP:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=228495828
> Thanks, ShunKang
>
