[ https://issues.apache.org/jira/browse/KAFKA-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-1286:
---------------------------------
Attachment: KAFKA-1286_2014-03-04_17:56:47.patch
> Retry Can Block
> ----------------
>
> Key: KAFKA-1286
> URL: https://issues.apache.org/jira/browse/KAFKA-1286
> Project: Kafka
> Issue Type: Sub-task
> Components: producer
> Reporter: Guozhang Wang
> Attachments: KAFKA-1286.patch, KAFKA-1286_2014-03-04_11:04:32.patch,
> KAFKA-1286_2014-03-04_15:14:49.patch, KAFKA-1286_2014-03-04_17:56:47.patch
>
>
> Under the following scenario, the retry logic can block:
> 1. The last broker's socket is closed; sender.handleDisconnect() is triggered
> and marks the node as disconnected.
> 2. In the next sender.run(), since the node is disconnected, the partition is
> removed from the ready set and sender.initConnection() is called, which does
> not throw an exception.
> 3. So in this round of sending, the only request the sender tries to send is
> the metadata request to the last broker, and it first tries to connect to
> that broker.
> 4. In selector.poll(), the finishConnect() call throws an exception, and in
> handleDisconnects(), the in-flight request's batches are null since it is a
> metadata request.
> 5. We then go back to step 1 and loop forever. Note that this infinite loop
> can be triggered even without calling producer.close.
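A minimal Java model of the five steps above may make the loop concrete. It is a simplified simulation, not the producer's real code: the class RetryLoopSketch, its fields (nodeConnected, connectAttempts, retriesConsumed) and the simulate(int) helper are hypothetical, and the method names only mirror those in the ticket. Because every in-flight request is a metadata request whose batches are null, each disconnect charges no retries, so nothing in the model ever ends the loop; the sketch caps the number of rounds only so that it terminates.

```java
import java.util.List;

// Hypothetical, simplified model of the KAFKA-1286 loop. Names mirror the
// ticket (run, initConnection, poll, handleDisconnects) but are NOT the real
// producer internals.
public class RetryLoopSketch {

    // An in-flight request; batches is null for a metadata request (step 4).
    static final class InFlightRequest {
        final List<String> batches;
        InFlightRequest(List<String> batches) { this.batches = batches; }
    }

    boolean nodeConnected = false; // step 1: the last broker's socket is closed
    int connectAttempts = 0;       // how many times a connection was initiated
    int retriesConsumed = 0;       // retries charged against produce batches
    InFlightRequest inFlight;

    // Steps 2-3: the node is disconnected, so a connection is initiated and
    // the only request queued for it is a metadata request.
    void run() {
        if (!nodeConnected) {
            initConnection();                     // does not throw
            inFlight = new InFlightRequest(null); // metadata request: no batches
        }
        poll();
    }

    void initConnection() {
        connectAttempts++; // the failure only surfaces later, in poll()
    }

    // Step 4: the broker is down, so finishConnect() fails and the
    // disconnect is handled.
    void poll() {
        boolean finishConnectFailed = true;
        if (finishConnectFailed) {
            handleDisconnects();
        }
    }

    void handleDisconnects() {
        nodeConnected = false; // step 5: back to step 1
        if (inFlight != null && inFlight.batches != null) {
            // Only produce batches would be retried or failed here.
            retriesConsumed += inFlight.batches.size();
        }
        inFlight = null;
    }

    // Runs a fixed number of rounds so the demo terminates; the loop in the
    // ticket has no such bound.
    static RetryLoopSketch simulate(int rounds) {
        RetryLoopSketch s = new RetryLoopSketch();
        for (int i = 0; i < rounds; i++) {
            s.run();
        }
        return s;
    }

    public static void main(String[] args) {
        RetryLoopSketch s = simulate(1_000);
        // prints "connected=false connectAttempts=1000 retriesConsumed=0"
        System.out.println("connected=" + s.nodeConnected
                + " connectAttempts=" + s.connectAttempts
                + " retriesConsumed=" + s.retriesConsumed);
    }
}
```

In this model, connection attempts grow with every round while the retry budget is never touched, which is why a retry limit on batches alone cannot stop the loop described in the ticket.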
> Also, we need to introduce a retry backoff config; otherwise the retries
> are exhausted too quickly (in my tests, 10 retries were exhausted in about
> 600 ms).
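As a hedged illustration of what a retry backoff adds, the sketch below gates each retry of a batch on a minimum delay since its previous attempt. The parameter name retryBackoffMs, the Batch class and both helper methods are assumptions for illustration, not the config or code introduced by the attached patch; time is simulated with a 1 ms counter, and every attempt is assumed to fail immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-batch retry backoff; retryBackoffMs is an
// illustrative name, not a confirmed producer config.
public class RetryBackoffSketch {

    static final class Batch {
        int attempts = 0;
        long lastAttemptMs = -1; // -1 means "never attempted"
    }

    // A batch may be sent on its first attempt, or retried once
    // retryBackoffMs has elapsed since its previous attempt.
    static boolean readyToRetry(Batch batch, long nowMs, long retryBackoffMs) {
        return batch.lastAttemptMs < 0 || nowMs - batch.lastAttemptMs >= retryBackoffMs;
    }

    // Simulates one initial attempt plus `retries` retries of a batch whose
    // sends always fail immediately, and returns the time of each attempt.
    static List<Long> attemptTimes(int retries, long retryBackoffMs) {
        List<Long> times = new ArrayList<>();
        Batch batch = new Batch();
        long nowMs = 0;
        while (batch.attempts <= retries) {
            if (readyToRetry(batch, nowMs, retryBackoffMs)) {
                batch.attempts++;
                batch.lastAttemptMs = nowMs;
                times.add(nowMs);
            }
            nowMs++; // advance the simulated clock by 1 ms
        }
        return times;
    }

    public static void main(String[] args) {
        // With no backoff, all 10 retries happen within 10 simulated ms.
        System.out.println(attemptTimes(10, 0));
        // With a 100 ms backoff, the 10th retry cannot happen before 1000 ms.
        System.out.println(attemptTimes(10, 100));
    }
}
```

The design point is that the time needed to exhaust N retries becomes at least N * retryBackoffMs, instead of depending only on how fast each attempt fails (about 600 ms for 10 retries in the reporter's tests).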
--
This message was sent by Atlassian JIRA
(v6.2#6252)