It sounds like you are primarily interested in optimizing the producer?

There is no way to produce data without any allocation at all, and I
think getting there would be pretty hard and lead to bad APIs, but
avoiding memory allocation entirely shouldn't be necessary. Small transient
objects in Java are pretty cheap to allocate and deallocate. The new Kafka
producer API that is on trunk and will be in 0.8.2 is much more disciplined
in its usage of memory, though there is still some allocation. The goal is
to avoid copying the *data* multiple times, even if we do end up creating
some small helper objects along the way (the idea is that the data may be
rather large).
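
For reference, here is a minimal sketch of what sending with the new
producer looks like using byte-array keys and values. This assumes the
0.8.2 API with the built-in ByteArraySerializer; the broker address, topic
name, and payload bytes are just placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NewProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        KafkaProducer<byte[], byte[]> producer =
            new KafkaProducer<byte[], byte[]>(props);

        // Key and value must be complete byte arrays today.
        byte[] key = new byte[] {1};
        byte[] value = new byte[] {2, 3, 4};
        producer.send(new ProducerRecord<byte[], byte[]>("my-topic", key, value));

        producer.close();
    }
}

Note that even here the key and value have to be complete byte arrays,
which is where the two optimizations below come in.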

If you wanted to optimize the new producer further, there are two things
that could be done that would help:
1. Avoid the copy when creating the ProducerRecord instance. This could be
done by accepting an offset/length along with the key and value and making
use of them when writing to the records instance. As it is, your key and
value need to be complete byte arrays (see the sketch after this list for
the copy this implies today).
2. Avoid the copy during request serialization. This is a little trickier.
During request serialization we need to take the records for each partition
and create a request that contains all of them. It is possible to do this
with no further recopying of data, but it is somewhat tricky.
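
To make (1) concrete, here is roughly what producing a flyweight slice
looks like today: the slice has to be copied out of the ByteBuffer into a
standalone byte[] before a ProducerRecord can be created. The helper below
is just an illustration (the class and method names are made up), not an
existing API:

import java.nio.ByteBuffer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FlyweightCopy {
    // Copy the flyweight slice [offset, offset + length) out of the buffer
    // into a standalone byte[] -- this is the copy that (1) would eliminate.
    static ProducerRecord<byte[], byte[]> toRecord(String topic,
                                                   ByteBuffer buffer,
                                                   int offset, int length) {
        byte[] value = new byte[length];
        // Use a duplicate so the caller's position/limit stay untouched.
        ByteBuffer slice = buffer.duplicate();
        slice.position(offset);
        slice.get(value, 0, length);
        return new ProducerRecord<byte[], byte[]>(topic, value);
    }
}

An offset/length-aware ProducerRecord constructor would let that
intermediate array be skipped entirely.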

My recommendation would be to try the new producer API and see how that
goes. If you need to optimize further, we would definitely take patches for
(1) and (2).

-Jay

On Thu, Oct 23, 2014 at 4:03 PM, Rajiv Kurian <ra...@signalfuse.com> wrote:

> I have a flyweight style protocol that I use for my messages. Thus they
> require no serialization/deserialization to be processed. The messages are
> just offset, length pairs within a ByteBuffer.
>
> Is there a producer and consumer API that forgoes allocation? I just want
> to give the Kafka producer offsets from a ByteBuffer. Similarly it would be
> ideal if I could get a ByteBuffer and offsets into it from the consumer.
> Even if I could get byte arrays (implying a copy but no decoding phase) on
> the consumer that would be great. Right now it seems to me that the only
> way to get messages from Kafka is through a message object, which implies
> Kafka allocates these messages all the time. I am willing to use the
> upcoming 0.9 API too.
>
> Thanks.
>
