Graham Campbell created KAFKA-16951:
---------------------------------------
             Summary: TransactionManager should rediscover coordinator on disconnection
                 Key: KAFKA-16951
                 URL: https://issues.apache.org/jira/browse/KAFKA-16951
             Project: Kafka
          Issue Type: Improvement
          Components: clients, producer
    Affects Versions: 3.7.0
            Reporter: Graham Campbell


When a transaction coordinator for a transactional client shuts down for restart or due to failure, the NetworkClient notices the broker disconnection and [will automatically refresh cluster metadata|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1182-L1183] to get the latest partition assignments.

The TransactionManager does not notice any changes until the next transactional request. If the broker is still offline, this is a [blocking wait while the client attempts to reconnect to the old coordinator|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L489-L490], which can be up to request.timeout.ms long (default 30 seconds). Coordinator lookup is only performed after a transactional request times out and fails. The lookup is triggered in either the [Sender|#L525-L528] or [TransactionManager's|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L1225-L1229] error handling.

To support faster recovery and faster reaction to transaction coordinator reassignments, the TransactionManager should proactively look up the transaction coordinator whenever the client is disconnected from the current transaction coordinator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
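To illustrate the proposed behavior, here is a minimal sketch of disconnect-triggered coordinator rediscovery. It is a toy model, not Kafka's actual TransactionManager: the class `CoordinatorTracker` and its methods `setCoordinator`, `onDisconnect`, and `needsLookup` are hypothetical names invented for this example, and brokers are identified by plain strings. The idea is that a disconnect from the node currently recorded as coordinator immediately clears it and flags a lookup, rather than waiting for a transactional request to time out.

```java
import java.util.Objects;

// Toy model only: NOT Kafka's TransactionManager. All names are hypothetical.
final class CoordinatorTracker {
    private String coordinatorId;   // null while the coordinator is unknown
    private boolean lookupPending;  // true when a coordinator lookup should be sent

    // Record the coordinator returned by a (hypothetical) lookup response.
    synchronized void setCoordinator(String nodeId) {
        this.coordinatorId = Objects.requireNonNull(nodeId);
        this.lookupPending = false;
    }

    // Called when the network layer reports a broker disconnection.
    synchronized void onDisconnect(String nodeId) {
        // Only react if the lost connection was to the current coordinator.
        if (coordinatorId != null && coordinatorId.equals(nodeId)) {
            coordinatorId = null;   // forget the stale coordinator
            lookupPending = true;   // request rediscovery right away
        }
    }

    synchronized boolean needsLookup() {
        return lookupPending;
    }

    synchronized String coordinator() {
        return coordinatorId;
    }

    public static void main(String[] args) {
        CoordinatorTracker tracker = new CoordinatorTracker();
        tracker.setCoordinator("broker-1");

        tracker.onDisconnect("broker-2");          // unrelated broker
        System.out.println(tracker.needsLookup()); // prints "false"

        tracker.onDisconnect("broker-1");          // current coordinator
        System.out.println(tracker.needsLookup()); // prints "true"
    }
}
```

In this sketch a disconnect from an unrelated broker is ignored, so ordinary connection churn would not cause extra lookups. How the real client would wire such a callback between NetworkClient and TransactionManager is left open here.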