[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355301#comment-14355301 ]
Jun Rao commented on KAFKA-1461:
--------------------------------

[~guozhang], my concern is about the implementation of DelayedItem. If you create a bunch of DelayedItems with the same timeout, they may time out at slightly different moments, since the calculation depends on the current time, which can change.

In the second case, when the leaders are moved one at a time, the controller will tell the broker to move to the right leader right away. This typically happens within a few milliseconds. We could optimize this case, but I am not sure it's worth the extra complexity in the code.

In the first case, the remaining shutdown process could take seconds after the socket server is shut down, so backing off will definitely help. Perhaps we can just do a simple experiment with controlled shutdown and see how serious the issue is without backing off.

> Replica fetcher thread does not implement any back-off behavior
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1461
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1461
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.8.1.1
>            Reporter: Sam Meder
>            Assignee: Sriharsha Chintalapani
>              Labels: newbie++
>             Fix For: 0.8.3
>
>         Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error
> occurs during the fetch call. For example, we've seen cases where the fetch
> continuously throws a connection refused exception, leading to several replica
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread,
> although the fact that erroring partitions are removed so a leader can be
> re-discovered helps some.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
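
For illustration only, here is a rough Java sketch of the two points discussed in this comment. It is not the attached patch and not Kafka's actual DelayedItem or fetcher code. The class FetchBackoffSketch, its methods, and the parameters backoffMs and maxAttempts are all hypothetical names. deadlineMs shows one way a group of delayed items could share a single deadline by passing in one timestamp instead of reading the clock per item. fetchWithBackoff shows a fetch loop that sleeps for a fixed interval after a failure instead of retrying immediately.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; none of these names come from the Kafka code base.
final class FetchBackoffSketch {

    // The deadline is derived from an explicit creation timestamp, so items that
    // are given the same timestamp and the same delay expire at the same instant.
    static long deadlineMs(long createdAtMs, long delayMs) {
        return createdAtMs + delayMs;
    }

    // Calls fetch until it returns true or maxAttempts is reached, sleeping
    // backoffMs after every failed attempt except the last one.
    // Returns the 1-based attempt that succeeded, or -1 if none did.
    static int fetchWithBackoff(Callable<Boolean> fetch, long backoffMs, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(fetch.call())) {
                    return attempt;
                }
            } catch (InterruptedException ie) {
                // Preserve the interrupt so the owning thread can shut down.
                Thread.currentThread().interrupt();
                return -1;
            } catch (Exception e) {
                // Any other failure (for example connection refused) falls
                // through to the back-off below instead of an immediate retry.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated fetch that fails twice with an I/O error, then succeeds.
        int[] calls = {0};
        int attempt = fetchWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("connection refused");
            }
            return true;
        }, 100L, 5);
        System.out.println("fetch succeeded on attempt " + attempt);
    }
}
```

A fixed sleep is the simplest form of back-off. Whether the interval should be configurable or should grow between attempts is left open here, as it is in the comment above.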