David Hoffman created KAFKA-13388:
-------------------------------------

             Summary: Kafka Producer has no timeout for nodes stuck in 
CHECKING_API_VERSIONS
                 Key: KAFKA-13388
                 URL: https://issues.apache.org/jira/browse/KAFKA-13388
             Project: Kafka
          Issue Type: Bug
          Components: core
            Reporter: David Hoffman


I have been seeing expired batch errors in my app.
{code:java}
org.apache.kafka.common.errors.TimeoutException: Expiring 51 record(s) for xxx-17:120002 ms has passed since batch creation
{code}
I would have expected a request timeout or connection timeout to be logged as 
well, but I could not find any other associated errors.

I added some instrumentation to my app and traced this down to broker 
connections hanging in the CHECKING_API_VERSIONS state. It appears there is no 
effective timeout for Kafka Producer broker connections in the 
CHECKING_API_VERSIONS state.

In the code, after the NetworkClient connects to a broker node, it sends a 
request to check API versions; when it receives the response, it marks the 
node as ready. I am seeing that sometimes no reply is received for the API 
versions request, and the connection just hangs in the CHECKING_API_VERSIONS 
state until it is disposed, I assume after the idle connection timeout.
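
For reference, the producer settings that bound the stages mentioned so far 
could be set as in the sketch below. The keys are standard producer config 
names; the values and the class name ProducerTimeoutConfigSketch are 
illustrative assumptions and should be checked against the client version in 
use. A delivery timeout of 120000 ms would be consistent with the 120002 ms 
in the error above.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Kafka.
class ProducerTimeoutConfigSketch {

    // Collects the timeout-related producer settings discussed in this report.
    static Properties timeoutProps() {
        Properties props = new Properties();
        // Upper bound on the time to report success or failure of a send;
        // batches older than this are expired.
        props.put("delivery.timeout.ms", "120000");
        // How long the client waits for a response to an in-flight request.
        props.put("request.timeout.ms", "30000");
        // How long the client waits for the socket connection to be established.
        props.put("socket.connection.setup.timeout.ms", "10000");
        // Idle connections are closed after this long.
        props.put("connections.max.idle.ms", "540000");
        return props;
    }

    public static void main(String[] args) {
        // Print each setting as key=value.
        timeoutProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With values like these, a connection stuck without any timeout firing would 
only be cleaned up by the idle limit, well after the batch has expired.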

I would have guessed the connection setup timeout still applies here, but it 
does not. There is a connectingNodes set that is consulted when checking 
timeouts, and the node is removed from it when 
ClusterConnectionStates.checkingApiVersions(String id) is called to 
transition the node into CHECKING_API_VERSIONS.
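
The effect described above can be reproduced with a small model. This is a 
simplified sketch, not the Kafka source: the class ConnectionStatesSketch, 
its fields, and the method nodesWithSetupTimeout are hypothetical and only 
mirror the behaviour as I understand it, assuming the timeout check iterates 
over the connectingNodes set alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the reported behaviour; NOT Kafka code.
class ConnectionStatesSketch {
    enum State { CONNECTING, CHECKING_API_VERSIONS, READY }

    private final long setupTimeoutMs;
    private final Map<String, State> states = new HashMap<>();
    private final Map<String, Long> attemptStartMs = new HashMap<>();
    // Only nodes in this set are considered by the timeout check.
    private final Set<String> connectingNodes = new HashSet<>();

    ConnectionStatesSketch(long setupTimeoutMs) {
        this.setupTimeoutMs = setupTimeoutMs;
    }

    void connecting(String id, long nowMs) {
        states.put(id, State.CONNECTING);
        attemptStartMs.put(id, nowMs);
        connectingNodes.add(id);
    }

    void checkingApiVersions(String id) {
        states.put(id, State.CHECKING_API_VERSIONS);
        // The node leaves the set here, so no timeout covers it afterwards.
        connectingNodes.remove(id);
    }

    State state(String id) {
        return states.get(id);
    }

    // Returns the nodes whose connection attempt exceeded the setup timeout.
    List<String> nodesWithSetupTimeout(long nowMs) {
        List<String> timedOut = new ArrayList<>();
        for (String id : connectingNodes) {
            if (nowMs - attemptStartMs.get(id) > setupTimeoutMs) {
                timedOut.add(id);
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        ConnectionStatesSketch s = new ConnectionStatesSketch(10_000L);
        s.connecting("broker-1", 0L);
        s.connecting("broker-2", 0L);
        s.checkingApiVersions("broker-2");
        // broker-2 has waited just as long, but only broker-1 is reported.
        System.out.println(s.nodesWithSetupTimeout(60_000L)); // prints [broker-1]
    }
}
```

In this model, one possible fix would be to keep tracking the node until it 
is marked ready, or to apply a separate timeout to the 
CHECKING_API_VERSIONS state.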



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
