[ 
https://issues.apache.org/jira/browse/KAFKA-18874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930751#comment-17930751
 ] 

Ismael Juma commented on KAFKA-18874:
-------------------------------------

Thanks for the report. Can you specify the version you tested?

> KRaft controller does not retry registration if the first attempt times out
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-18874
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18874
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Daniel Fonai
>            Priority: Minor
>
> There is a [retry 
> mechanism|https://github.com/apache/kafka/blob/3.9.0/core/src/main/scala/kafka/server/ControllerRegistrationManager.scala#L274]
>  with exponential backoff built into KRaft controller registration. The 
> timeout of the first attempt is 5 s for KRaft controllers 
> ([code|https://github.com/apache/kafka/blob/3.9.0/core/src/main/scala/kafka/server/ControllerServer.scala#L448])
>  which is not configurable.
> If the controller's first registration request times out for any reason, the 
> attempt should be retried. In practice this does not happen, and the 
> controller is unable to join the quorum. We see the following in the faulty 
> controller's log:
> {noformat}
> 2025-02-21 13:31:46,606 INFO [ControllerRegistrationManager id=3 
> incarnation=mEzjHheAQ_eDWejAFquGiw] sendControllerRegistration: attempting to 
> send ControllerRegistrationRequestData(controllerId=3, 
> incarnationId=mEzjHheAQ_eDWejAFquGiw, zkMigrationReady=true, 
> listeners=[Listener(name='CONTROLPLANE-9090', 
> host='kraft-rollback-kafka-controller-pool-3.kraft-rollback-kafka-kafka-brokers.csm-op-test-kraft-rollback-631e64ac.svc',
>  port=9090, securityProtocol=1)], features=[Feature(name='kraft.version', 
> minSupportedVersion=0, maxSupportedVersion=1), 
> Feature(name='metadata.version', minSupportedVersion=1, 
> maxSupportedVersion=21)]) (kafka.server.ControllerRegistrationManager) 
> [controller-3-registration-manager-event-handler]
> ...
> 2025-02-21 13:31:51,627 ERROR [ControllerRegistrationManager id=3 
> incarnation=mEzjHheAQ_eDWejAFquGiw] RegistrationResponseHandler: channel 
> manager timed out before sending the request. 
> (kafka.server.ControllerRegistrationManager) 
> [controller-3-to-controller-registration-channel-manager]
> 2025-02-21 13:31:51,726 INFO [ControllerRegistrationManager id=3 
> incarnation=mEzjHheAQ_eDWejAFquGiw] maybeSendControllerRegistration: waiting 
> for the previous RPC to complete. 
> (kafka.server.ControllerRegistrationManager) 
> [controller-3-registration-manager-event-handler]
> {noformat}
> After this, we do not see any registration retry from the controller in the log.
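
For reference while reproducing, here is a minimal sketch of the behavior the description expects: a timed-out attempt clears the in-flight marker and schedules a retry after an exponentially growing delay. It is not Kafka's ControllerRegistrationManager and not a diagnosis of this issue. The class name, method names, and backoff constants are all hypothetical and chosen only for illustration.

```java
// Hypothetical sketch of retry-with-exponential-backoff bookkeeping.
// None of these names or values come from the Kafka code base.
class RegistrationRetrySketch {
    // Assumed values, for illustration only.
    static final long INITIAL_BACKOFF_MS = 100L;
    static final long MAX_BACKOFF_MS = 60_000L;

    private boolean rpcInFlight = false;
    private int failures = 0;

    /** Delay before the next attempt after `failures` consecutive failures, capped. */
    static long backoffMs(int failures) {
        if (failures <= 0) {
            return 0L;
        }
        int exponent = Math.min(failures - 1, 30); // bound the shift to avoid overflow
        return Math.min(INITIAL_BACKOFF_MS << exponent, MAX_BACKOFF_MS);
    }

    /** Returns true if a request may be sent now; false while one is outstanding. */
    boolean maybeSend() {
        if (rpcInFlight) {
            return false; // corresponds to "waiting for the previous RPC to complete"
        }
        rpcInFlight = true;
        return true;
    }

    /** On timeout: clear the in-flight marker and return the delay before retrying. */
    long onTimeout() {
        rpcInFlight = false;
        failures++;
        return backoffMs(failures);
    }

    /** On success: clear the in-flight marker and reset the failure count. */
    void onSuccess() {
        rpcInFlight = false;
        failures = 0;
    }
}
```

In this sketch, a second maybeSend() is refused only while a request is outstanding. Once onTimeout() runs, the next maybeSend() succeeds again, which is the retry the description says is missing from the log.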



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
