[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290398#comment-14290398 ]

Guozhang Wang commented on KAFKA-1461:
--------------------------------------

[~sriharsha] Sorry for the late reply.

This fix looks good to me overall, except that we cannot rely on adding 
partitions back only in the handlePartitionsWithErrors() call, since that call 
is triggered only when the next error happens. We can probably move this piece 
of code to processPartitionData().
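To make the concern concrete, here is a toy model. It is not Kafka code: `ToyFetcher`, `handle_partitions_with_errors`, `process_partition_data`, `simulate` and the integer clock are hypothetical stand-ins for the Scala methods named above. If parked partitions are re-added only on the error path, a partition that errored once stays parked after errors stop; checking on every processing pass brings it back once its delay has passed.

```python
from typing import Dict, Iterable, List


class ToyFetcher:
    """Hypothetical model of where delayed partitions get re-activated."""

    def __init__(self, partitions: Iterable[str], reactivate_on_every_pass: bool):
        self.active = set(partitions)
        self.delayed: Dict[str, int] = {}  # partition -> earliest time it may be fetched again
        self.reactivate_on_every_pass = reactivate_on_every_pass

    def _reactivate(self, now: int) -> None:
        # Move every partition whose delay has expired back to the active set.
        ready = [p for p, deadline in self.delayed.items() if deadline <= now]
        for p in ready:
            del self.delayed[p]
            self.active.add(p)

    def handle_partitions_with_errors(self, partitions: Iterable[str], now: int, delay: int) -> None:
        # Park the erroring partitions for `delay` time units.
        for p in partitions:
            self.active.discard(p)
            self.delayed[p] = now + delay
        if not self.reactivate_on_every_pass:
            self._reactivate(now)  # only ever runs when a new error arrives

    def process_partition_data(self, now: int) -> None:
        # Runs on every successful pass, so expired delays are always noticed.
        if self.reactivate_on_every_pass:
            self._reactivate(now)


def simulate(reactivate_on_every_pass: bool) -> List[str]:
    """One error at t=0, then ten error-free passes; return the active partitions."""
    f = ToyFetcher(["t-0", "t-1"], reactivate_on_every_pass)
    f.handle_partitions_with_errors(["t-0"], now=0, delay=5)
    for now in range(1, 11):
        f.process_partition_data(now)
    return sorted(f.active)
```

With `reactivate_on_every_pass=False`, `t-0` is still parked after ten error-free passes; with `True`, it is active again from pass 5 onward.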

Another way to do this could be: 

1. Change the partitionMap in AbstractFetcherThread into a map from 
TopicAndPartition to OffsetAndState, where OffsetAndState contains the offset 
(Long) and the state (active, or inactive with a delay). For simplicity we can 
just use an Int for the state: "active" would be 0, and "inactive" would be 
the delay time.

2. Add another function called "delayPartitions" to AbstractFetcherThread, 
which sets the state to inactive with the delay time.

3. In AbstractFetcherThread.doWork(), include only partitions with state 0 in 
the fetch request, and also update the state values of partitions whose state 
is non-zero.
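A minimal Python sketch of steps 1 to 3, assuming the non-zero state counts down the remaining delay in milliseconds by the time elapsed between doWork() calls (the comment does not specify how the state is updated). `FetcherSketch`, `add_partition`, `delay_partitions`, `do_work` and the `elapsed_ms` parameter are hypothetical; the real AbstractFetcherThread is written in Scala.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple


class TopicAndPartition(NamedTuple):
    topic: str
    partition: int


@dataclass
class OffsetAndState:
    offset: int
    state: int = 0  # step 1: 0 = active; > 0 = remaining delay in ms (assumed unit)


class FetcherSketch:
    def __init__(self) -> None:
        # Step 1: partitionMap now maps TopicAndPartition -> OffsetAndState.
        self.partition_map: Dict[TopicAndPartition, OffsetAndState] = {}

    def add_partition(self, tp: TopicAndPartition, offset: int) -> None:
        self.partition_map[tp] = OffsetAndState(offset)

    def delay_partitions(self, partitions: Iterable[TopicAndPartition], delay_ms: int) -> None:
        # Step 2: mark partitions inactive by storing the delay as the state.
        for tp in partitions:
            entry = self.partition_map.get(tp)
            if entry is not None:
                entry.state = delay_ms

    def do_work(self, elapsed_ms: int) -> Dict[TopicAndPartition, int]:
        # Step 3a: the fetch request only covers partitions whose state is 0.
        request = {tp: e.offset for tp, e in self.partition_map.items() if e.state == 0}
        # Step 3b: count down the delay of inactive partitions, never below 0.
        for e in self.partition_map.values():
            if e.state > 0:
                e.state = max(0, e.state - elapsed_ms)
        return request
```

For example, after `delay_partitions([tp], 300)` and three `do_work(100)` calls that skip `tp`, the fourth call includes it again. Building the request before counting down is one choice; counting down first would re-include a partition one pass earlier.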

> Replica fetcher thread does not implement any back-off behavior
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1461
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1461
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.8.1.1
>            Reporter: Sam Meder
>            Assignee: Sriharsha Chintalapani
>              Labels: newbie++
>             Fix For: 0.8.3
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although there it helps somewhat that erroring partitions are removed so that 
> a leader can be re-discovered.
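The back-off the issue asks for can be sketched as a fetch loop that pauses after a failed fetch instead of retrying immediately. This is a hypothetical illustration: `run_fetch_loop`, `backoff_ms` and the injectable `sleep` are assumptions, not Kafka APIs, and a fixed back-off is only one option (an exponential back-off with a cap is a common alternative).

```python
import time
from typing import Callable


def run_fetch_loop(fetch: Callable[[], None], iterations: int, backoff_ms: int,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `iterations` fetch attempts and return how many back-offs were taken."""
    backoffs = 0
    for _ in range(iterations):
        try:
            fetch()
        except Exception:
            # Any fetch error (e.g. ConnectionRefusedError) triggers a pause
            # instead of an immediate retry, so the thread cannot spin.
            # A real fetcher would likely treat different errors differently.
            backoffs += 1
            sleep(backoff_ms / 1000.0)
    return backoffs
```

Passing a recording function as `sleep` makes the loop testable without real waiting: a fetch that always raises `ConnectionRefusedError` produces one pause per attempt, while a fetch that succeeds produces none.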



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
