Cheng Tan created KAFKA-9893:
--------------------------------
Summary: Configurable TCP connection timeout for AdminClient
Key: KAFKA-9893
URL: https://issues.apache.org/jira/browse/KAFKA-9893
Project: Kafka
Issue Type: New Feature
Reporter: Cheng Tan
AdminClient does not currently allow a connection timeout to be configured,
so it relies on the default OS settings to decide that a broker is
unreachable before selecting an alternate broker from the bootstrap list.
When the connection attempt stalls during the initial TCP handshake and
tcp_syn_retries is at its default (6), the OS does not give up on an
unresponsive broker until ~127s, whereas the client itself times out sooner
(~120s). The client can therefore fail before it ever tries another broker.
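The ~127s figure can be reproduced with a little arithmetic. The sketch below
assumes Linux behaviour that the issue does not state: an initial SYN
retransmission timeout of 1s that doubles after each unanswered SYN. The
class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical helper: estimates how long a TCP connect blocks before the
// kernel gives up, assuming an exponentially doubling SYN retransmission
// timeout (assumed initial value: 1s, as on typical Linux systems).
class SynBackoff {

    static long totalWaitSeconds(int synRetries, long initialRtoSeconds) {
        long total = 0;
        long rto = initialRtoSeconds;
        // One wait after the initial SYN, plus one after each retransmission.
        for (int attempt = 0; attempt <= synRetries; attempt++) {
            total += rto;
            rto *= 2; // the wait doubles after every unanswered SYN
        }
        return total;
    }

    public static void main(String[] args) {
        // 1 + 2 + 4 + 8 + 16 + 32 + 64
        System.out.println(totalWaitSeconds(6, 1)); // prints 127
    }
}
```

Under the same assumptions, tcp_syn_retries=3 would give 1 + 2 + 4 + 8 = 15s
per unresponsive broker.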
Reducing tcp_syn_retries should mitigate the issue, depending on the number
of unresponsive brokers in the bootstrap list, but that setting applies
system-wide. It would be better if the connection timeout could instead be
configured for AdminClient.
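For comparison, a per-connection timeout at the plain socket level might look
like the sketch below. This is not AdminClient code and not a proposed API:
it only illustrates bounding a single connect attempt from the application
rather than through a system-wide sysctl. The class name, method name and
timeout value are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical illustration: bound one TCP connect attempt with an
// application-level timeout instead of relying on tcp_syn_retries.
class ConnectWithTimeout {

    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Throws SocketTimeoutException if the handshake does not
            // complete within timeoutMs milliseconds.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections and unresolved hosts.
            return false;
        }
    }
}
```

A caller could walk the bootstrap list and move on to the next broker as soon
as canConnect returns false, instead of waiting for the OS to give up.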
The use case where this came up was a customer performing DC failover tests
with a stretch cluster.