Daniel Fonai created KAFKA-18875:
------------------------------------

             Summary: KRaft controller does not retry registration if the first 
attempt times out
                 Key: KAFKA-18875
                 URL: https://issues.apache.org/jira/browse/KAFKA-18875
             Project: Kafka
          Issue Type: Bug
            Reporter: Daniel Fonai


KRaft controller registration has a built-in 
[retry mechanism|https://github.com/apache/kafka/blob/3.9.0/core/src/main/scala/kafka/server/ControllerRegistrationManager.scala#L274]
 with exponential backoff. The timeout of the first attempt is 5 s for KRaft 
controllers 
([code|https://github.com/apache/kafka/blob/3.9.0/core/src/main/scala/kafka/server/ControllerServer.scala#L448]),
 and this value is not configurable.
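
For readers unfamiliar with the term, here is a minimal sketch of what exponential backoff between retries looks like in general. The function name {{backoff_ms}} and all parameter values (100 ms initial delay, multiplier 2, 10 s cap) are assumptions made up for this example; they are not the values used by Kafka, and jitter is omitted for brevity.

```python
def backoff_ms(attempt: int,
               initial_ms: int = 100,
               multiplier: int = 2,
               max_ms: int = 10_000) -> int:
    """Return the delay before retry number `attempt` (0-based).

    Hypothetical helper for illustration only: the delay grows
    geometrically with each failed attempt and is capped at max_ms.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(max_ms, initial_ms * multiplier ** attempt)


if __name__ == "__main__":
    # Print the delays that would precede the first eight retries.
    for n in range(8):
        print(f"retry {n}: wait {backoff_ms(n)} ms")
```

The point relevant to this report is that such a schedule only helps if a retry is actually issued after each failure.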

If the controller's first registration request times out for any reason, the 
attempt should be retried. In practice this does not happen, and the 
controller is unable to join the quorum. We see the following in the faulty 
controller's log:
{noformat}
2025-02-21 13:31:46,606 INFO [ControllerRegistrationManager id=3 
incarnation=mEzjHheAQ_eDWejAFquGiw] sendControllerRegistration: attempting to 
send ControllerRegistrationRequestData(controllerId=3, 
incarnationId=mEzjHheAQ_eDWejAFquGiw, zkMigrationReady=true, 
listeners=[Listener(name='CONTROLPLANE-9090', 
host='kraft-rollback-kafka-controller-pool-3.kraft-rollback-kafka-kafka-brokers.csm-op-test-kraft-rollback-631e64ac.svc',
 port=9090, securityProtocol=1)], features=[Feature(name='kraft.version', 
minSupportedVersion=0, maxSupportedVersion=1), Feature(name='metadata.version', 
minSupportedVersion=1, maxSupportedVersion=21)]) 
(kafka.server.ControllerRegistrationManager) 
[controller-3-registration-manager-event-handler]
...
2025-02-21 13:31:51,627 ERROR [ControllerRegistrationManager id=3 
incarnation=mEzjHheAQ_eDWejAFquGiw] RegistrationResponseHandler: channel 
manager timed out before sending the request. 
(kafka.server.ControllerRegistrationManager) 
[controller-3-to-controller-registration-channel-manager]
2025-02-21 13:31:51,726 INFO [ControllerRegistrationManager id=3 
incarnation=mEzjHheAQ_eDWejAFquGiw] maybeSendControllerRegistration: waiting 
for the previous RPC to complete. (kafka.server.ControllerRegistrationManager) 
[controller-3-registration-manager-event-handler]
{noformat}
After this, no further registration attempts appear in the controller's log.
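
The log sequence (a timeout, followed by "waiting for the previous RPC to complete") would be consistent with a "request in flight" flag that is not cleared on the timeout path. The toy model below only illustrates that general failure pattern. The class {{ToyRegistrationManager}}, its fields, and the {{run}} helper are all hypothetical, and this is not a diagnosis of the actual Kafka code.

```python
class ToyRegistrationManager:
    """Toy model only; NOT Kafka's ControllerRegistrationManager."""

    def __init__(self, clear_pending_on_timeout: bool):
        self.clear_pending_on_timeout = clear_pending_on_timeout
        self.pending_rpc = False  # True while a request is considered in flight
        self.sent = 0             # number of requests actually sent
        self.log = []

    def maybe_send(self) -> None:
        # Refuse to send while an earlier request is still marked pending.
        if self.pending_rpc:
            self.log.append("waiting for the previous RPC to complete")
            return
        self.pending_rpc = True
        self.sent += 1
        self.log.append("attempting to send")

    def on_timeout(self) -> None:
        self.log.append("timed out before sending the request")
        if self.clear_pending_on_timeout:
            self.pending_rpc = False
        # The scheduled retry fires (backoff delay omitted in this model).
        self.maybe_send()


def run(clear_pending_on_timeout: bool, timeouts: int = 3) -> int:
    """Send once, then simulate `timeouts` consecutive timeouts.

    Returns how many requests were actually sent.
    """
    manager = ToyRegistrationManager(clear_pending_on_timeout)
    manager.maybe_send()
    for _ in range(timeouts):
        manager.on_timeout()
    return manager.sent


if __name__ == "__main__":
    print("flag left set on timeout:", run(False), "request(s) sent")
    print("flag cleared on timeout: ", run(True), "request(s) sent")
```

In this model, leaving the flag set means every scheduled retry is skipped, so only the initial request is ever sent; clearing it on timeout lets each retry go out.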



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
