[ https://issues.apache.org/jira/browse/KAFKA-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Lin updated KAFKA-4841:
----------------------------
    Description:
KAFKA-4820 allows a new request to be enqueued to unsent by a user thread while some other thread does poll(...). This causes a problem in the following scenario:

- Thread A calls poll(...) and is blocked on select(...)
- Thread B enqueues a request into unsent of ConsumerNetworkClient for node N
- Thread A calls checkDisconnects(now) -> client.connectionFailed(N)

Because no attempt has been made to connect to node N yet, there is no state for node N and connectionFailed(N) would throw an exception. Note that this problem only occurs when one thread is able to enqueue requests while another thread is in the process of `poll(...)`.

The solution is to consider a connection to have failed only if attempts have been made to connect to this node AND the connection state is DISCONNECTED.

  was:
KAFKA-4820 allows a new request to be enqueued to unsent by a user thread while some other thread does poll(...). This causes a problem in the following scenario:

Thread A calls poll(...) and is blocked on select(...)
Thread B enqueues a request into unsent of ConsumerNetworkClient for node N
Thread A calls checkDisconnects(now) -> client.connectionFailed(N)

Because no attempt has been made to connect to node N yet, there is no state for node N and connectionFailed(N) would throw an exception.

The solution is to consider a connection to have failed only if attempts have been made to connect to this node AND the connection state is DISCONNECTED.

> NetworkClient should only consider a connection to be fail after attempt to connect
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-4841
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4841
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Dong Lin
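
The proposed check can be illustrated with a small sketch. This is not the actual Kafka code: ConnectionStateSketch, its State enum, and the connecting/disconnected/connectionFailedUnchecked methods are hypothetical names invented for illustration, and the sketch models only the "no state for node N" condition, not the threading between poll(...) and the enqueueing thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for per-node connection bookkeeping.
// It is NOT Kafka's implementation; names and structure are assumptions.
class ConnectionStateSketch {
    enum State { CONNECTING, CONNECTED, DISCONNECTED }

    // A node has an entry only once a connection attempt has been made.
    private final Map<String, State> nodeState = new HashMap<>();

    void connecting(String nodeId)   { nodeState.put(nodeId, State.CONNECTING); }
    void connected(String nodeId)    { nodeState.put(nodeId, State.CONNECTED); }
    void disconnected(String nodeId) { nodeState.put(nodeId, State.DISCONNECTED); }

    // Models the problematic behaviour described in the issue:
    // the lookup assumes state exists and throws when it does not.
    boolean connectionFailedUnchecked(String nodeId) {
        State state = nodeState.get(nodeId);
        if (state == null)
            throw new IllegalStateException("No state found for node " + nodeId);
        return state == State.DISCONNECTED;
    }

    // Models the proposed rule: failed only if a connection attempt
    // was made (state exists) AND the state is DISCONNECTED.
    boolean connectionFailed(String nodeId) {
        State state = nodeState.get(nodeId);
        return state != null && state == State.DISCONNECTED;
    }

    public static void main(String[] args) {
        ConnectionStateSketch states = new ConnectionStateSketch();
        // Request enqueued for node "N" before any connection attempt.
        System.out.println(states.connectionFailed("N"));
        states.connecting("N");
        states.disconnected("N");
        System.out.println(states.connectionFailed("N"));
    }
}
```

With this rule, a request that was enqueued for node N before any connection attempt is left in place rather than being treated as failed, and the check no longer throws when there is no state for N.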