Daniel Fonai created KAFKA-18875:
------------------------------------
Summary: KRaft controller does not retry registration if the first
attempt times out
Key: KAFKA-18875
URL: https://issues.apache.org/jira/browse/KAFKA-18875
Project: Kafka
Issue Type: Bug
Reporter: Daniel Fonai
KRaft controller registration has a built-in [retry
mechanism|https://github.com/apache/kafka/blob/3.9.0/core/src/main/scala/kafka/server/ControllerRegistrationManager.scala#L274]
with exponential backoff. The timeout of the first attempt is 5 s for KRaft
controllers
([code|https://github.com/apache/kafka/blob/3.9.0/core/src/main/scala/kafka/server/ControllerServer.scala#L448])
and is not configurable.
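For readers unfamiliar with the pattern, here is a minimal sketch of how an exponential backoff delay is typically computed. It is not the Kafka implementation: the class name, the 100 ms initial delay, and the 10 s cap are all hypothetical values chosen for illustration.

```java
// Illustrative only: a generic exponential backoff calculation.
// INITIAL_DELAY_MS and MAX_DELAY_MS are hypothetical, not Kafka's values.
final class BackoffSketch {
    static final long INITIAL_DELAY_MS = 100;
    static final long MAX_DELAY_MS = 10_000;

    /**
     * Delay to wait after the given number of consecutive failures (1-based).
     * The delay doubles with each failure and is capped at MAX_DELAY_MS.
     */
    static long delayMs(int failures) {
        if (failures < 1) {
            throw new IllegalArgumentException("failures must be >= 1");
        }
        // Bound the shift so the multiplication cannot overflow a long.
        int shift = Math.min(failures - 1, 30);
        long delay = INITIAL_DELAY_MS << shift;
        return Math.min(delay, MAX_DELAY_MS);
    }

    public static void main(String[] args) {
        for (int failures = 1; failures <= 9; failures++) {
            System.out.println("failure " + failures + " -> wait " + delayMs(failures) + " ms");
        }
    }
}
```

With these assumed values the wait grows from 100 ms to 6400 ms over seven failures and stays at the 10 s cap afterwards.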
If for some reason the controller's first registration request times out, the
attempt should be retried, but in practice this does not happen and the
controller is unable to join the quorum. We see the following in the faulty
controller's log:
{noformat}
2025-02-21 13:31:46,606 INFO [ControllerRegistrationManager id=3
incarnation=mEzjHheAQ_eDWejAFquGiw] sendControllerRegistration: attempting to
send ControllerRegistrationRequestData(controllerId=3,
incarnationId=mEzjHheAQ_eDWejAFquGiw, zkMigrationReady=true,
listeners=[Listener(name='CONTROLPLANE-9090',
host='kraft-rollback-kafka-controller-pool-3.kraft-rollback-kafka-kafka-brokers.csm-op-test-kraft-rollback-631e64ac.svc',
port=9090, securityProtocol=1)], features=[Feature(name='kraft.version',
minSupportedVersion=0, maxSupportedVersion=1), Feature(name='metadata.version',
minSupportedVersion=1, maxSupportedVersion=21)])
(kafka.server.ControllerRegistrationManager)
[controller-3-registration-manager-event-handler]
...
2025-02-21 13:31:51,627 ERROR [ControllerRegistrationManager id=3
incarnation=mEzjHheAQ_eDWejAFquGiw] RegistrationResponseHandler: channel
manager timed out before sending the request.
(kafka.server.ControllerRegistrationManager)
[controller-3-to-controller-registration-channel-manager]
2025-02-21 13:31:51,726 INFO [ControllerRegistrationManager id=3
incarnation=mEzjHheAQ_eDWejAFquGiw] maybeSendControllerRegistration: waiting
for the previous RPC to complete. (kafka.server.ControllerRegistrationManager)
[controller-3-registration-manager-event-handler]
{noformat}
After this, no further registration attempt by the controller appears in the log.
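One possible reading of the log, offered as an unconfirmed hypothesis rather than a diagnosis: the manager keeps treating the timed-out request as still in flight, so the next send attempt is skipped with the "waiting for the previous RPC to complete" message. The toy model below illustrates that reading only. The class, the `pendingRpc` field, and the `clearPendingOnTimeout` switch are hypothetical names for this sketch and are not taken from the Kafka source.

```java
// Toy model of the suspected behaviour; every name here is hypothetical.
final class RegistrationRetrySketch {
    private final boolean clearPendingOnTimeout;
    private boolean pendingRpc = false;
    private int sendAttempts = 0;

    RegistrationRetrySketch(boolean clearPendingOnTimeout) {
        this.clearPendingOnTimeout = clearPendingOnTimeout;
    }

    /** Sends a request only if none is considered in flight; returns true if sent. */
    boolean maybeSend() {
        if (pendingRpc) {
            // Corresponds to "waiting for the previous RPC to complete".
            return false;
        }
        pendingRpc = true;
        sendAttempts++;
        return true;
    }

    /** Invoked when the request times out before it is sent. */
    void onTimeout() {
        if (clearPendingOnTimeout) {
            pendingRpc = false;
        }
        // If the flag is left set, every later maybeSend() call is skipped.
    }

    int sendAttempts() {
        return sendAttempts;
    }

    public static void main(String[] args) {
        for (boolean clear : new boolean[] {false, true}) {
            RegistrationRetrySketch manager = new RegistrationRetrySketch(clear);
            manager.maybeSend();   // first attempt
            manager.onTimeout();   // first attempt times out
            boolean retried = manager.maybeSend();
            System.out.println("clearPendingOnTimeout=" + clear + " -> retried=" + retried);
        }
    }
}
```

In this model the retry is sent only when the in-flight flag is cleared on timeout; otherwise the manager stays stuck after a single attempt, which matches the symptom in the log.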