[jira] [Commented] (KAFKA-2045) Memory Management on the consumer

Jay Kreps (JIRA) Fri, 27 Mar 2015 09:31:33 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384100#comment-14384100
 ]


Jay Kreps commented on KAFKA-2045:
----------------------------------

[~rzidane] I do think a prototype would help show whether there is a real perf 
gain here or not. Given the other CPU expenses we have, like CRC checks, it's 
not obvious that memory will be an issue, but you never know until someone 
tries.

I think a simpler approach would be to pool the ByteBuffers rather than trying 
to guarantee that there is exactly one. The pool will be trivial since the 
requests will all be approximately the same size (so just an ArrayList). We 
don't need to make the pool block or support threads or anything like that.
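
A minimal sketch of the kind of pool described above, assuming a 
single-threaded caller. The class and method names (SimpleBufferPool, allocate, 
release, available) are hypothetical and not part of any existing Kafka API:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, non-blocking, non-thread-safe buffer pool (illustration only). */
final class SimpleBufferPool {
    // Free buffers waiting to be reused; a plain ArrayList is enough here.
    private final List<ByteBuffer> free = new ArrayList<>();

    /** Returns a cleared buffer with at least minSize bytes of capacity. */
    ByteBuffer allocate(int minSize) {
        // Scan from the end: buffers are roughly the same size, so the last
        // one usually fits, and removing the last element does no shifting.
        for (int i = free.size() - 1; i >= 0; i--) {
            if (free.get(i).capacity() >= minSize) {
                ByteBuffer buf = free.remove(i);
                buf.clear(); // reset position and limit; contents are overwritten later
                return buf;
            }
        }
        // Nothing suitable in the pool: allocate instead of blocking.
        return ByteBuffer.allocate(minSize);
    }

    /** Hands a buffer back to the pool; the caller must not use it afterwards. */
    void release(ByteBuffer buf) {
        if (buf != null) {
            free.add(buf);
        }
    }

    /** Number of buffers currently available for reuse. */
    int available() {
        return free.size();
    }
}
```

Because the pool never blocks, it does not enforce a hard memory bound; it only 
avoids reallocating when a returned buffer is large enough.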

We currently have this API:
{code}
  ConsumerRecords recs = consumer.poll(100);
{code}

We would add another version of that API to facilitate reuse:
{code}
  recs = consumer.poll(100, recs);
{code}
The second parameter is a ConsumerRecords instance that the client is 
"recycling". If recs=null this API is the same as the current poll API. If 
non-null we would grab the underlying ByteBuffers from the ConsumerRecords 
instance and add them to our pool for reuse.
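
The recycling flow can be sketched roughly as follows. This is an illustration 
under stated assumptions, not the consumer implementation: RecordsSketch and 
ConsumerSketch are hypothetical stand-ins for ConsumerRecords and the consumer, 
the network read is stubbed out, and the fixed FETCH_SIZE is made up.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for ConsumerRecords: it just owns its backing buffers. */
final class RecordsSketch {
    final List<ByteBuffer> buffers;

    RecordsSketch(List<ByteBuffer> buffers) {
        this.buffers = buffers;
    }
}

/** Hypothetical stand-in for the consumer; no real networking happens here. */
final class ConsumerSketch {
    private static final int FETCH_SIZE = 1024; // assumed response size
    private final List<ByteBuffer> pool = new ArrayList<>();
    int allocations = 0; // counts fresh allocations, for illustration only

    /** Same behavior as the two-argument version with nothing to recycle. */
    RecordsSketch poll(long timeoutMs) {
        return poll(timeoutMs, null);
    }

    /** timeoutMs is unused in this stub; a real client would wait on the network. */
    RecordsSketch poll(long timeoutMs, RecordsSketch recycled) {
        if (recycled != null) {
            // Take back the buffers; the caller must not read the old records again.
            pool.addAll(recycled.buffers);
            recycled.buffers.clear();
        }
        ByteBuffer buf;
        if (pool.isEmpty()) {
            buf = ByteBuffer.allocate(FETCH_SIZE);
            allocations++;
        } else {
            buf = pool.remove(pool.size() - 1);
            buf.clear();
        }
        // A real client would read the next response into buf at this point.
        List<ByteBuffer> owned = new ArrayList<>();
        owned.add(buf);
        return new RecordsSketch(owned);
    }
}
```

A caller that always passes the previous result back in, as in 
`recs = consumer.poll(100, recs)`, triggers only one allocation in this sketch; 
passing null falls back to allocating a fresh buffer.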

I think this would allow both lazy deserialization (which I suspect on its own 
is enough to avoid issues with the ConsumerRecords) and reuse at the network 
level as we read requests.
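
To illustrate what lazy deserialization over a reusable buffer could look like, 
here is a small sketch. The layout (a 4-byte length followed by a UTF-8 value) 
is invented for the example and is not Kafka's message format; LazyValues and 
its encode helper are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical lazy view over a buffer of [4-byte length][UTF-8 value] entries. */
final class LazyValues implements Iterable<String> {
    private final ByteBuffer data;

    LazyValues(ByteBuffer data) {
        this.data = data;
    }

    @Override
    public Iterator<String> iterator() {
        // duplicate() shares the bytes but keeps its own position, so iterating
        // never disturbs the underlying buffer.
        final ByteBuffer view = data.duplicate();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return view.remaining() >= 4;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Decoding happens only now, when the caller asks for the value.
                int length = view.getInt();
                byte[] bytes = new byte[length];
                view.get(bytes);
                return new String(bytes, StandardCharsets.UTF_8);
            }
        };
    }

    /** Helper that encodes values in the assumed layout, for the example only. */
    static ByteBuffer encode(String... values) {
        int size = 0;
        for (String v : values) {
            size += 4 + v.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            buf.putInt(b.length);
            buf.put(b);
        }
        buf.flip(); // make the written bytes readable
        return buf;
    }
}
```

Since values are materialized only on access, the records object itself holds 
little beyond the buffer, which is what makes handing the buffer back for reuse 
cheap. The flip side is that values must be copied out before the buffer is 
recycled.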

It is true that the memory bound is a little loose since you can have two 
requests at any time (one being read and one being given out to the consumer), 
but that is fine.

Thoughts?

> Memory Management on the consumer
> ---------------------------------
>
>                 Key: KAFKA-2045
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2045
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Guozhang Wang
>
> We need to add the memory management on the new consumer like we did in the 
> new producer. This would probably include:
> 1. byte buffer re-usage for fetch response partition data.
> 2. byte buffer re-usage for on-the-fly de-compression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
