[ https://issues.apache.org/jira/browse/KAFKA-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-1286:
---------------------------------

    Attachment: KAFKA-1286.patch

> Retry Can Block
> ---------------
>
>                 Key: KAFKA-1286
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1286
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: producer
>            Reporter: Guozhang Wang
>         Attachments: KAFKA-1286.patch
>
>
> Under the following scenario, the retry logic can block:
> 1. The last broker's socket is closed. sender.handleDisconnect() is triggered
> and marks the node as disconnected.
> 2. In the next sender.run(), since the node is disconnected, the partition is
> removed from the ready set and sender.initConnection() is called, which does
> not throw an exception.
> 3. So in this round of sending, the only request the sender tries to send is
> the metadata request to the last broker, and it first tries to connect to
> that broker.
> 4. In selector.poll(), the finishConnect() call throws an exception, and in
> handleDisconnects() the inFlight request's batches are null since it is a
> metadata request.
> 5. We then go back to step 1 and loop forever. Note that this infinite loop
> can be triggered even without calling producer.close().
>
> Also, we need to introduce a retry backoff config; otherwise the retries
> are exhausted too soon (in my tests, 10 retries can be exhausted in about
> 600 ms).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
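The description notes that, without a retry backoff, 10 retries can be exhausted in about 600 ms. Below is a minimal Java sketch of why a backoff helps. It uses a hypothetical `RetryGate` with illustrative `maxRetries` and `retryBackoffMs` parameters. This is not the code in KAFKA-1286.patch and not the Kafka client API. It enforces a fixed delay between retries, so the same retry budget is spread over a longer window instead of being consumed by a tight sender loop.

```java
/**
 * Hypothetical sketch (not the KAFKA-1286 patch or the Kafka client API):
 * a retry gate that enforces a fixed backoff between retries, so a tight
 * sender loop cannot burn through its whole retry budget almost instantly.
 */
public class RetryBackoffSketch {

    /** Illustrative retry gate; maxRetries and retryBackoffMs are assumed names. */
    static final class RetryGate {
        private final int maxRetries;
        private final long retryBackoffMs;
        private int attempts = 0;       // retries issued so far
        private long lastAttemptMs = 0; // time of the most recent retry

        RetryGate(int maxRetries, long retryBackoffMs) {
            this.maxRetries = maxRetries;
            this.retryBackoffMs = retryBackoffMs;
        }

        /** Returns true (and records the retry) if a retry may be issued at nowMs. */
        boolean tryRetry(long nowMs) {
            if (exhausted()) {
                return false; // retry budget used up
            }
            if (attempts > 0 && nowMs - lastAttemptMs < retryBackoffMs) {
                return false; // still inside the backoff window
            }
            attempts++;
            lastAttemptMs = nowMs;
            return true;
        }

        boolean exhausted() {
            return attempts >= maxRetries;
        }

        long lastAttemptMs() {
            return lastAttemptMs;
        }
    }

    /**
     * Simulates a sender loop that wakes every tickMs and retries whenever the
     * gate allows it; returns the simulated time of the final retry.
     */
    static long lastRetryTime(int maxRetries, long retryBackoffMs, long tickMs) {
        if (maxRetries <= 0 || tickMs <= 0) {
            throw new IllegalArgumentException("maxRetries and tickMs must be positive");
        }
        RetryGate gate = new RetryGate(maxRetries, retryBackoffMs);
        for (long now = 0; !gate.exhausted(); now += tickMs) {
            gate.tryRetry(now);
        }
        return gate.lastAttemptMs();
    }

    public static void main(String[] args) {
        // Same budget of 10 retries; only the backoff differs.
        System.out.println("no backoff:     last retry at t=" + lastRetryTime(10, 0, 1) + " ms");
        System.out.println("100 ms backoff: last retry at t=" + lastRetryTime(10, 100, 1) + " ms");
    }
}
```

The simulated loop wakes every 1 ms. With no backoff, all 10 retries are used by t=9 ms. With a 100 ms backoff, the last retry happens at t=900 ms, which gives a failed broker more time to come back. A fixed backoff is the simplest choice. Exponential backoff is a common alternative, but the issue does not propose it.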