[ 
https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771981#comment-13771981
 ] 

Guozhang Wang commented on KAFKA-1043:
--------------------------------------

IMHO the local time spent processing a fetch response is linear in the number 
of partitions in the request, while the network time spent writing to the 
socket buffer is not, since it depends on whether the data is still in the file 
cache. Hence, with either the 1) reset-socket-buffer-size or 2) 
subset-topic-partitions-at-a-time approach, we would need to either 1) set the 
buffer size too small, which is unfair to other requests that do not hit I/O 
and may result in unnecessary round trips, or 2) fetch too small a subset of 
topic-partitions, which has the same problem as 1).

Capping based on time is better since it provides "fairness" but that seems a 
little hacky.
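
To make the time-capping idea concrete, here is a rough Python sketch (a toy 
simulation, not Kafka code). It assumes each response has a known write cost in 
abstract time units, which a real broker would not know in advance; the names 
drain_with_time_cap and budget are made up for illustration. A response that 
uses up its budget is re-queued behind the others.

```python
from collections import deque


def drain_with_time_cap(responses, budget):
    """Simulate one processor writing its response queue under a per-turn cap.

    responses: list of (name, cost) pairs; cost is the number of time units
               needed to write that response completely (an assumption of
               this sketch, a real server would only see elapsed time).
    budget:    maximum time units one response may use per turn before it
               is moved to the back of the queue.
    Returns a dict mapping each response name to its completion time.
    """
    queue = deque(responses)
    clock = 0
    finished = {}
    while queue:
        name, remaining = queue.popleft()
        spent = min(remaining, budget)
        clock += spent
        remaining -= spent
        if remaining > 0:
            # Partial write: give the other responses a turn first.
            queue.append((name, remaining))
        else:
            finished[name] = clock
    return finished


responses = [("slow_fetch", 100), ("produce_ack", 1), ("metadata", 1)]
uncapped = drain_with_time_cap(responses, budget=float("inf"))
capped = drain_with_time_cap(responses, budget=10)
print(uncapped)
print(capped)
```

In this toy run the two cheap responses finish at times 11 and 12 with the cap, 
instead of 101 and 102 without it, while the slow fetch finishes at 102 instead 
of 100. That is the "fairness" gain; picking the budget is the hacky part.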

My reasoning for decoupling the socket and the network processor is the 
following. As we scale up, the principle should be "various clients are 
isolated from each other". For fetch requests this means "if you request old 
data from many topic partitions, only your own request should take a long time; 
other requests should not be impacted". Today a request's lifetime on the 
server is

socket -> network processor -> request handler -> (possible) disk I/O due to 
flush for produce request -> socket processor -> network I/O

and one way to enable isolation is to make sure that no hop between adjacent 
stages on this path is single-threaded. Today socket -> network processor goes 
through the acceptor, network processor -> request handler goes through the 
request queue, and request handler -> (possible) disk I/O due to flush for 
produce request is fixed in KAFKA-615; but socket processor -> network I/O is 
still coupled, and fixes to issues resulting from this coupling would just be 
taking care of the "worst case", which does not follow the "isolation" 
principle.
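
As a rough illustration of why a single-threaded hop breaks isolation, the 
Python sketch below (again a toy model, not Kafka code) compares completion 
times when queued responses are written by one writer versus a small pool. The 
function completion_times and the cost numbers are hypothetical, and the model 
ignores that in a real server the writes for one socket must stay ordered on 
the thread that owns it.

```python
import heapq


def completion_times(responses, num_writers):
    """Dispatch responses in FIFO order to whichever writer frees up first.

    responses:   list of (name, cost) pairs; cost is the write time in
                 abstract time units (an assumption of this sketch).
    num_writers: number of writers draining the same queue.
    Returns a dict mapping each response name to its completion time.
    """
    # Min-heap holding the time at which each writer becomes free.
    free_at = [0] * num_writers
    heapq.heapify(free_at)
    done = {}
    for name, cost in responses:
        start = heapq.heappop(free_at)  # earliest available writer
        end = start + cost
        done[name] = end
        heapq.heappush(free_at, end)
    return done


queued = [("slow_fetch", 100), ("produce_ack", 1), ("metadata", 1)]
single = completion_times(queued, num_writers=1)
pooled = completion_times(queued, num_writers=2)
print(single)
print(pooled)
```

With one writer the cheap responses wait behind the slow fetch and finish at 
101 and 102; with two writers they finish at 1 and 2 while the slow fetch still 
finishes at 100, so only the expensive request pays for itself.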

I agree this is rather complex and would be a long term thing.
                
> Time-consuming FetchRequest could block other request in the response queue
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-1043
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1043
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.8, 0.8.1
>
>
> Since in SocketServer the processor that takes a request is also responsible 
> for writing the response to that request, each processor owns its own 
> response queue. If a FetchRequest takes an irregularly long time to write to 
> the channel buffer, it blocks all other responses in the queue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
