Guozhang Wang created KAFKA-1286:
------------------------------------

             Summary: Retry Can Block 
                 Key: KAFKA-1286
                 URL: https://issues.apache.org/jira/browse/KAFKA-1286
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Guozhang Wang


Under the following scenario, the retry logic can block:

1. The last broker's socket is closed; sender.handleDisconnect() is triggered 
and marks the node as disconnected.

2. In the next sender.run(), since the node is disconnected, the partition is 
removed from the ready set and sender.initConnection() is called, which does 
not throw an exception.

3. So in this round of sends, the only request the sender tries to send is the 
metadata request, to the last broker, and the sender will first try to 
connect to that broker.

4. In selector.poll(), the finishConnect() call throws an exception, and in 
handleDisconnects() the in-flight request's batches will be null since it is a 
metadata request.

5. We are now back at step 1 and loop forever. Note that this infinite loop can 
be triggered even without calling producer.close().
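
The cycle in steps 1-5 can be illustrated with a toy model. This is only a 
sketch of one reading of the scenario, not the actual sender code: the class 
InFlightRequest, the method roundsWithoutProgress, and the idea that a round 
with null batches consumes no retry are all assumptions made for illustration.

```java
import java.util.List;

public class RetryLoopSketch {
    /** Hypothetical stand-in for an in-flight request; not the real Kafka class. */
    static final class InFlightRequest {
        final List<String> batches; // null models a metadata request (step 4)
        InFlightRequest(List<String> batches) { this.batches = batches; }
    }

    /**
     * Runs the toy loop for at most maxRounds rounds and returns how many
     * rounds ended in a failed connect that consumed no retry.
     */
    static int roundsWithoutProgress(int maxRounds, int retries) {
        int remainingRetries = retries;
        int stuckRounds = 0;
        for (int round = 0; round < maxRounds && remainingRetries > 0; round++) {
            // Steps 2-3: the node is disconnected, so only a metadata request is queued.
            InFlightRequest request = new InFlightRequest(null);
            // Step 4: the connect attempt fails and disconnect handling inspects the batches.
            if (request.batches == null) {
                stuckRounds++;      // assumption: nothing to fail or retry, so no retry is used
            } else {
                remainingRetries--; // a request carrying batches would consume a retry here
            }
        }
        return stuckRounds;
    }

    public static void main(String[] args) {
        // Every round is stuck, so only the artificial maxRounds cap ends the loop.
        System.out.println(roundsWithoutProgress(1000, 10)); // prints 1000
    }
}
```

In this model the retry budget never shrinks, so the loop is bounded only by 
the maxRounds cap that the sketch adds for safety.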

Also, we need to introduce a retry backoff config; otherwise the retries will 
be exhausted too soon (in my tests, 10 retries can be exhausted in about 600 ms).
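
To make the timing concrete, here is a rough arithmetic sketch. It assumes 
about 60 ms per attempt (600 ms divided by 10 retries) and a hypothetical 
100 ms backoff; the method name minExhaustionMs and the backoff value are 
illustrative assumptions, not an existing config or API.

```java
public class RetryBackoffSketch {
    /**
     * Lower bound on the time before all retries are used up, assuming each
     * attempt takes perAttemptMs and is followed by a fixed backoffMs pause.
     */
    static long minExhaustionMs(int retries, long perAttemptMs, long backoffMs) {
        return retries * (perAttemptMs + backoffMs);
    }

    public static void main(String[] args) {
        // No backoff: 10 retries at roughly 60 ms each, matching the ~600 ms observed.
        System.out.println(minExhaustionMs(10, 60, 0));   // prints 600
        // Hypothetical 100 ms backoff between attempts.
        System.out.println(minExhaustionMs(10, 60, 100)); // prints 1600
    }
}
```

With a backoff in place, the same 10 retries would span a window long enough 
for a briefly unavailable broker to come back.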



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
