[ https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771981#comment-13771981 ]
Guozhang Wang commented on KAFKA-1043:
--------------------------------------

IMHO the local time spent processing the fetch response is linear in the number of partitions in the request, while the network time spent writing to the socket buffer is not; it depends on whether the data is still in the file cache. Hence, with the 1) reset-socket-buffer-size or 2) subset-topic-partitions-at-a-time methods, we would need to either 1) set the buffer size too small, which is unfair to other requests that do not hit I/O and may result in unnecessary round trips, or 2) fetch too small a subset of topic-partitions, which has the same problem as 1). Capping based on time is better since it provides "fairness", but that seems a little hacky.

My reasoning for decoupling the socket and the network processor is the following. As we scale up, the principle should be "various clients are isolated from each other". For a fetch request that would mean "if you request old data from many topic partitions, only your own request should take a long time; other requests should not be impacted".

Today a request's lifetime on the server is

socket -> network processor -> request handler -> (possible) disk I/O due to flush for produce request -> socket processor -> network I/O

and one way to enable isolation is to ensure that no pair of adjacent stages on this path is single-threaded. Today socket -> network processor goes via the acceptor, network processor -> request handler goes via the request queue, and request handler -> (possible) disk I/O due to flush for produce request is fixed in KAFKA-615; but socket processor -> network I/O is still coupled, and fixes for the issues resulting from this coupling would be taking care of the "worst case", which does not obey the "isolation" principle.

I agree this is rather complex and would be a long-term thing.
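To make the "capping based on time" option concrete, here is a minimal Java sketch of the idea. It is not Kafka's SocketServer code: the class and method names are hypothetical, and write time is simulated as a plain millisecond cost per response rather than measured on a real channel. Each response gets at most a fixed slice of write time per turn, and an unfinished response goes to the back of the processor's queue so that cheap responses behind it are not held up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Kafka.
class TimeCappedResponseQueue {

    /** Stand-in for a pending response: an id plus the simulated time still needed to write it. */
    static final class Response {
        final String id;
        long remainingMs;

        Response(String id, long costMs) {
            this.id = id;
            this.remainingMs = costMs;
        }
    }

    /**
     * Drains one processor's response queue, giving each response at most
     * capMs of simulated write time per turn. Returns the ids in the order
     * in which the responses were fully written.
     */
    static List<String> drain(Deque<Response> queue, long capMs) {
        if (capMs <= 0) {
            throw new IllegalArgumentException("capMs must be positive");
        }
        List<String> completed = new ArrayList<>();
        while (!queue.isEmpty()) {
            Response r = queue.pollFirst();
            long slice = Math.min(capMs, r.remainingMs);
            r.remainingMs -= slice;
            if (r.remainingMs == 0) {
                completed.add(r.id);
            } else {
                // Cap reached before the write finished: yield to the responses behind it.
                queue.addLast(r);
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        Deque<Response> queue = new ArrayDeque<>();
        queue.add(new Response("slow-fetch", 500)); // e.g. old data not in the file cache
        queue.add(new Response("produce-ack", 5));
        queue.add(new Response("metadata", 5));
        System.out.println(drain(queue, 10));
    }
}
```

With a 10 ms cap the two cheap responses complete before the slow fetch. Passing Long.MAX_VALUE as the cap reproduces the uncapped behavior, where the slow fetch at the head of the queue finishes first and everything behind it waits. The sketch only reorders work on a single thread, so it does not provide the decoupling discussed above.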

> Time-consuming FetchRequest could block other request in the response queue
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-1043
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1043
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.8, 0.8.1
>
>
> Since in SocketServer the processor that takes a request is also responsible
> for writing the response for that request, we make each processor own its
> own response queue. If a FetchRequest takes an irregularly long time to write
> to the channel buffer, it would block all other responses in the queue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira