[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355161#comment-14355161 ]
Guozhang Wang commented on KAFKA-1461:
--------------------------------------

[~junrao] Could you elaborate a bit on "different partitions become active at slightly different times and the fetcher doesn't actually back off"? I'm not sure I understand why the fetcher does not actually back off.

I agree that when SimpleConsumer.fetch throws an IOException, we should back off the thread as a whole for common case #1 you mentioned above. At the same time, though, we should still consider backing off for partition-specific error codes; otherwise the broker logs will be polluted with error messages from continuous retries, as we have seen before. Do you agree?

> Replica fetcher thread does not implement any back-off behavior
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1461
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1461
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.8.1.1
>            Reporter: Sam Meder
>            Assignee: Sriharsha Chintalapani
>              Labels: newbie++
>             Fix For: 0.8.3
>
>         Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error
> occurs during the fetch call. For example, we've seen cases where the fetch
> continuously throws a connection refused exception leading to several replica
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread,
> although the fact that erroring partitions are removed so a leader can be
> re-discovered helps some.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
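The comment distinguishes two kinds of back-off: backing off the whole fetcher thread when the fetch call itself fails with an IOException, and backing off individual partitions that return an error code. The Java sketch below illustrates both under stated assumptions. The class `FetchBackoffSketch`, its `Fetcher` and `Sleeper` hooks, the fixed `backoffMs` interval, and the use of `0` as the "no error" code are all hypothetical; this is not Kafka's actual replica fetcher code or API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of thread-level vs. partition-level back-off; not Kafka's fetcher code.
class FetchBackoffSketch {
    static final short NO_ERROR = 0; // assumption: error code 0 means "no error"

    /** Hypothetical fetch call: returns per-partition error codes, or throws on I/O failure. */
    interface Fetcher {
        Map<String, Short> fetch(List<String> partitions) throws IOException;
    }

    /** Hypothetical sleep hook, injectable so the sketch can be tested without real sleeping. */
    interface Sleeper {
        void sleep(long ms);
    }

    private final long backoffMs;      // hypothetical fixed back-off interval
    private final LongSupplier clock;  // current time in milliseconds
    // partition -> earliest time (ms) at which it may be fetched again
    private final Map<String, Long> delayedUntil = new HashMap<>();

    FetchBackoffSketch(long backoffMs, LongSupplier clock) {
        this.backoffMs = backoffMs;
        this.clock = clock;
    }

    /** Assigned partitions that are not currently backed off. */
    List<String> fetchable(List<String> assigned) {
        long now = clock.getAsLong();
        List<String> out = new ArrayList<>();
        for (String p : assigned) {
            Long until = delayedUntil.get(p);
            if (until == null || until <= now) {
                out.add(p);
            }
        }
        return out;
    }

    /** One iteration of the fetcher loop. */
    void doWork(List<String> assigned, Fetcher fetcher, Sleeper sleeper) {
        List<String> partitions = fetchable(assigned);
        if (partitions.isEmpty()) {
            // Every partition is backed off: sleep rather than spin on an empty request.
            sleeper.sleep(backoffMs);
            return;
        }
        try {
            Map<String, Short> errors = fetcher.fetch(partitions);
            long now = clock.getAsLong();
            for (String p : partitions) {
                short code = errors.getOrDefault(p, NO_ERROR);
                if (code == NO_ERROR) {
                    delayedUntil.remove(p);               // healthy partition: clear any delay
                } else {
                    delayedUntil.put(p, now + backoffMs); // delay only this partition
                }
            }
        } catch (IOException e) {
            // The whole request failed (e.g. connection refused): back off the thread as a whole.
            sleeper.sleep(backoffMs);
        }
    }
}
```

Keeping the two levels separate means a partition stuck on an error code stops producing a retry (and a log line) on every loop iteration, while healthy partitions handled by the same thread keep fetching; only a failure of the whole request puts the thread to sleep.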