Thought about this again. If I understand correctly Jun's concern is about the cascading effect. Currently the processor will try to put all the requests received in one poll() call into the RequestChannel. This could potentially be long if the queue is moving really really slowly. If we don't mute the sockets and clients time out then reconnect, the processor may read more requests from the new sockets and take even longer to put them into the RequestChannel. So it would be good see if we can avoid this cascading effects.
On the other end, allowing the processor to keep reading requests from the sockets when the RequestChannel is full also seems hurting the memory usage control effort. Thanks, Jiangjie (Becket) Qin On Mon, Aug 8, 2016 at 4:46 PM, radai <radai.rosenbl...@gmail.com> wrote: > I agree that filling up the request queue can cause clients to time out > (and presumably retry?). However, for the workloads where we expect this > configuration to be useful the alternative is currently an OOM crash. > In my opinion an initial implementation of this feature could be > constrained to a simple drop-in replacement of ArrayBlockingQueue > (conditional, opt-in) and further study of behavior patterns under load can > drive future changes to the API later when those behaviors are better > understood (like back-pressure, nop filler responses to avoid client > timeouts or whatever). > > On Mon, Aug 8, 2016 at 2:23 PM, Mayuresh Gharat < > gharatmayures...@gmail.com> > wrote: > > > Nice write up Radai. > > I think what Jun said is a valid concern. > > If I am not wrong as per the proposal, we are depending on the entire > > pipeline to flow smoothly from accepting requests to handling it, calling > > KafkaApis and handing back the responses. > > > > Thanks, > > > > Mayuresh > > > > > > On Mon, Aug 8, 2016 at 12:22 PM, Joel Koshy <jjkosh...@gmail.com> wrote: > > > > > > > > > > . > > > >> > > > >> > > > > Hi Becket, > > > > > > > > I don't think progress can be made in the processor's run loop if the > > > > queue fills up. i.e., I think Jun's point is that if the queue is > full > > > > (either due to the proposed max.bytes or today due to max.requests > > > hitting > > > > the limit) then processCompletedReceives will block and no further > > > progress > > > > can be made. > > > > > > > > > > I'm sorry - this isn't right. There will be progress as long as the API > > > handlers are able to pick requests off the request queue and add the > > > responses to the response queues (which are effectively unbounded). > > > However, the point is valid that blocking in the request channel's put > > has > > > the effect of exacerbating the pressure on the socket server. > > > > > > > > > > > > > >> > > > >> On Mon, Aug 8, 2016 at 10:04 AM, Jun Rao <j...@confluent.io> wrote: > > > >> > > > >> > Radai, > > > >> > > > > >> > Thanks for the proposal. A couple of comments on this. > > > >> > > > > >> > 1. Since we store request objects in the request queue, how do we > > get > > > an > > > >> > accurate size estimate for those requests? > > > >> > > > > >> > 2. Currently, it's bad if the processor blocks on adding a request > > to > > > >> the > > > >> > request queue. Once blocked, the processor can't process the > sending > > > of > > > >> > responses of other socket keys either. This will cause all clients > > in > > > >> this > > > >> > processor with an outstanding request to eventually timeout. > > > Typically, > > > >> > this will trigger client-side retries, which will add more load on > > the > > > >> > broker and cause potentially more congestion in the request queue. > > > With > > > >> > queued.max.requests, to prevent blocking on the request queue, our > > > >> > recommendation is to configure queued.max.requests to be the same > as > > > the > > > >> > number of socket connections on the broker. Since the broker never > > > >> > processes more than 1 request per connection at a time, the > request > > > >> queue > > > >> > will never be blocked. With queued.max.bytes, it's going to be > > harder > > > to > > > >> > configure the value properly to prevent blocking. > > > >> > > > > >> > So, while adding queued.max.bytes is potentially useful for memory > > > >> > management, for it to be truly useful, we probably need to address > > the > > > >> > processor blocking issue for it to be really useful in practice. > One > > > >> > possibility is to put back-pressure to the client when the request > > > >> queue is > > > >> > blocked. For example, if the processor notices that the request > > queue > > > is > > > >> > full, it can turn off the interest bit for read for all socket > keys. > > > >> This > > > >> > will allow the processor to continue handling responses. When the > > > >> request > > > >> > queue has space again, it can indicate the new state to the > process > > > and > > > >> > wake up the selector. Not sure how this will work with multiple > > > >> processors > > > >> > though since the request queue is shared across all processors. > > > >> > > > > >> > Thanks, > > > >> > > > > >> > Jun > > > >> > > > > >> > > > > >> > > > > >> > On Thu, Aug 4, 2016 at 11:28 AM, radai < > radai.rosenbl...@gmail.com> > > > >> wrote: > > > >> > > > > >> > > Hello, > > > >> > > > > > >> > > I'd like to initiate a discussion about > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > >> > > 72%3A+Allow+Sizing+Incoming+Request+Queue+in+Bytes > > > >> > > > > > >> > > The goal of the KIP is to allow configuring a bound on the > > capacity > > > >> (as > > > >> > in > > > >> > > bytes of memory used) of the incoming request queue, in addition > > to > > > >> the > > > >> > > current bound on the number of messages. > > > >> > > > > > >> > > This comes after several incidents at Linkedin where a sudden > > > "spike" > > > >> of > > > >> > > large message batches caused an out of memory exception. > > > >> > > > > > >> > > Thank you, > > > >> > > > > > >> > > Radai > > > >> > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > -- > > -Regards, > > Mayuresh R. Gharat > > (862) 250-7125 > > >