Radai,

Thanks for the proposal. A couple of comments on this.

1. Since we store request objects in the request queue, how do we get an accurate size estimate for those requests?

2. Currently, it's bad if the processor blocks on adding a request to the request queue. Once blocked, the processor can't send responses for its other socket keys either. This will cause all clients on this processor with an outstanding request to eventually time out. Typically, this will trigger client-side retries, which will add more load on the broker and potentially cause more congestion in the request queue.

With queued.max.requests, to prevent blocking on the request queue, our recommendation is to configure queued.max.requests to be the same as the number of socket connections on the broker. Since the broker never processes more than 1 request per connection at a time, the request queue will never be blocked. With queued.max.bytes, it's going to be harder to configure the value properly to prevent blocking.

So, while adding queued.max.bytes is potentially useful for memory management, we probably need to address the processor blocking issue for it to be really useful in practice. One possibility is to apply back-pressure to the client when the request queue is blocked. For example, if the processor notices that the request queue is full, it can turn off the interest bit for read for all socket keys. This will allow the processor to continue handling responses. When the request queue has space again, it can indicate the new state to the processor and wake up the selector. Not sure how this will work with multiple processors though, since the request queue is shared across all processors.

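To make the back-pressure idea a bit more concrete, here is a rough sketch of what I have in mind, written against plain java.nio rather than our actual network code. The class and method names (ReadBackpressureSketch, tryEnqueue, signalSpaceAvailable, applyPendingResume) are made up for illustration, the queue is bounded by count rather than bytes, and it only covers a single processor, so it does not answer the shared-queue question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical single-processor sketch; not the broker's real classes.
final class ReadBackpressureSketch {
    private final Selector selector;
    private final BlockingQueue<byte[]> requestQueue;
    // Set by a request handler thread, consumed by the processor thread.
    private final AtomicBoolean resumeRequested = new AtomicBoolean(false);

    ReadBackpressureSketch(Selector selector, BlockingQueue<byte[]> requestQueue) {
        this.selector = selector;
        this.requestQueue = requestQueue;
    }

    // Processor thread: never block on the queue. If it is full, stop
    // reading from every socket key. The caller must keep the rejected
    // request and retry it later.
    boolean tryEnqueue(byte[] request) {
        if (requestQueue.offer(request)) {
            return true;
        }
        setReadInterest(false);
        return false;
    }

    // Any thread: record that the queue has space and wake the selector
    // so the processor notices promptly.
    void signalSpaceAvailable() {
        resumeRequested.set(true);
        selector.wakeup();
    }

    // Processor thread, at the top of its loop: re-enable reads if asked.
    void applyPendingResume() {
        if (resumeRequested.getAndSet(false)) {
            setReadInterest(true);
        }
    }

    // Flip only the OP_READ bit so that write interest (responses) is kept.
    private void setReadInterest(boolean enabled) {
        for (SelectionKey key : selector.keys()) {
            if (!key.isValid()) {
                continue;
            }
            int ops = key.interestOps();
            key.interestOps(enabled ? (ops | SelectionKey.OP_READ)
                                    : (ops & ~SelectionKey.OP_READ));
        }
    }

    // Small self-check using a Pipe in place of a client socket.
    // Returns {firstEnqueued, secondEnqueued, mutedWhenFull, resumed}.
    static boolean[] demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
                ReadBackpressureSketch p =
                    new ReadBackpressureSketch(selector, new ArrayBlockingQueue<>(1));
                boolean first = p.tryEnqueue(new byte[8]);
                boolean second = p.tryEnqueue(new byte[8]); // queue is full here
                boolean muted = (key.interestOps() & SelectionKey.OP_READ) == 0;
                p.requestQueue.poll();       // a handler takes a request
                p.signalSpaceAvailable();    // handler side
                p.applyPendingResume();      // processor side
                boolean resumed = (key.interestOps() & SelectionKey.OP_READ) != 0;
                return new boolean[] {first, second, muted, resumed};
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The interest change is applied on the processor thread on purpose: the handler only sets a flag and calls wakeup(), so nothing but the processor touches the selection keys.
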
Thanks,

Jun

On Thu, Aug 4, 2016 at 11:28 AM, radai <radai.rosenbl...@gmail.com> wrote:

> Hello,
>
> I'd like to initiate a discussion about
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+Sizing+Incoming+Request+Queue+in+Bytes
>
> The goal of the KIP is to allow configuring a bound on the capacity (as in
> bytes of memory used) of the incoming request queue, in addition to the
> current bound on the number of messages.
>
> This comes after several incidents at Linkedin where a sudden "spike" of
> large message batches caused an out of memory exception.
>
> Thank you,
>
> Radai
>