[ https://issues.apache.org/jira/browse/KAFKA-18874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
José Armando García Sancio updated KAFKA-18874:
-----------------------------------------------
    Component/s: controller

> KRaft controller does not retry registration if the first attempt times out
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-18874
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18874
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>            Reporter: Daniel Fonai
>            Priority: Minor
>
> There is a [retry mechanism|https://github.com/apache/kafka/blob/3.9.0/core/src/main/scala/kafka/server/ControllerRegistrationManager.scala#L274]
> with exponential backoff built into KRaft controller registration. The
> timeout of the first attempt is 5 s for KRaft controllers
> ([code|https://github.com/apache/kafka/blob/3.9.0/core/src/main/scala/kafka/server/ControllerServer.scala#L448])
> and is not configurable.
> If for some reason the controller's first registration request times out, the
> attempt should be retried, but in practice this does not happen and the
> controller is not able to join the quorum. We see the following in the faulty
> controller's log:
> {noformat}
> 2025-02-21 13:31:46,606 INFO [ControllerRegistrationManager id=3 incarnation=mEzjHheAQ_eDWejAFquGiw] sendControllerRegistration: attempting to send ControllerRegistrationRequestData(controllerId=3, incarnationId=mEzjHheAQ_eDWejAFquGiw, zkMigrationReady=true, listeners=[Listener(name='CONTROLPLANE-9090', host='kraft-rollback-kafka-controller-pool-3.kraft-rollback-kafka-kafka-brokers.csm-op-test-kraft-rollback-631e64ac.svc', port=9090, securityProtocol=1)], features=[Feature(name='kraft.version', minSupportedVersion=0, maxSupportedVersion=1), Feature(name='metadata.version', minSupportedVersion=1, maxSupportedVersion=21)]) (kafka.server.ControllerRegistrationManager) [controller-3-registration-manager-event-handler]
> ...
> 2025-02-21 13:31:51,627 ERROR [ControllerRegistrationManager id=3 incarnation=mEzjHheAQ_eDWejAFquGiw] RegistrationResponseHandler: channel manager timed out before sending the request. (kafka.server.ControllerRegistrationManager) [controller-3-to-controller-registration-channel-manager]
> 2025-02-21 13:31:51,726 INFO [ControllerRegistrationManager id=3 incarnation=mEzjHheAQ_eDWejAFquGiw] maybeSendControllerRegistration: waiting for the previous RPC to complete. (kafka.server.ControllerRegistrationManager) [controller-3-registration-manager-event-handler]
> {noformat}
> After this, we do not see any registration retry from the controller in the log.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
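
For illustration only, the behaviour the report expects (a timed-out attempt is followed by a retry after an exponentially growing delay) could be sketched as below. This is a hypothetical Java model, not the logic in ControllerRegistrationManager.scala: the class, the Outcome enum, the Attempt interface, and the backoff values are all assumptions made for the example, and it does not claim to identify the cause of the bug. The one point it shows is that the "previous RPC pending" flag is cleared on every outcome, including a timeout, so the next attempt is not blocked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of registration with retry; names are not from Kafka.
class RegistrationRetrySketch {
    /** Assumed outcomes of a single registration attempt. */
    enum Outcome { SUCCESS, TIMEOUT }

    /** Stand-in for sending the registration RPC; attemptNumber is 1-based. */
    interface Attempt { Outcome send(int attemptNumber); }

    private final long initialBackoffMs;   // assumed value, not Kafka's
    private final long maxBackoffMs;       // assumed value, not Kafka's
    private boolean pendingRpc = false;
    final List<Long> backoffsMs = new ArrayList<>();

    RegistrationRetrySketch(long initialBackoffMs, long maxBackoffMs) {
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    /** Delay after the given number of failures: doubles each time, capped at maxBackoffMs. */
    long backoffMs(int failures) {
        int shift = Math.max(0, Math.min(failures - 1, 20));
        return Math.min(initialBackoffMs << shift, maxBackoffMs);
    }

    /** Tries until success or maxAttempts; returns the attempts used, or -1 if all failed. */
    int register(Attempt attempt, int maxAttempts) {
        for (int n = 1; n <= maxAttempts; n++) {
            pendingRpc = true;
            Outcome outcome = attempt.send(n);
            pendingRpc = false; // cleared on every outcome, including a timeout
            if (outcome == Outcome.SUCCESS) {
                return n;
            }
            // A real implementation would wait this long before the next attempt;
            // the sketch only records the delay so that it stays deterministic.
            backoffsMs.add(backoffMs(n));
        }
        return -1;
    }

    boolean isPendingRpc() { return pendingRpc; }

    public static void main(String[] args) {
        RegistrationRetrySketch sketch = new RegistrationRetrySketch(100, 1_000);
        // Simulate the first two attempts timing out and the third succeeding.
        int used = sketch.register(n -> n < 3 ? Outcome.TIMEOUT : Outcome.SUCCESS, 5);
        System.out.println("registered after " + used + " attempts, backoffs=" + sketch.backoffsMs);
    }
}
```

In the log above, the last message is "waiting for the previous RPC to complete", which is the state this sketch avoids by resetting the flag before deciding whether to retry. Whether that is what actually happens in the 3.9.0 code is not established by the report.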