José Armando García Sancio created KAFKA-20661:
--------------------------------------------------

             Summary: KRaft unavailability when controllers are misconfigured
                 Key: KAFKA-20661
                 URL: https://issues.apache.org/jira/browse/KAFKA-20661
             Project: Kafka
          Issue Type: Bug
          Components: kraft
    Affects Versions: 3.9.0
            Reporter: José Armando García Sancio
            Assignee: José Armando García Sancio
             Fix For: 4.4.0


A controller quorum can become unavailable and get into an unrecoverable state 
if a controller's advertised listener is misconfigured.

If the user misconfigures a controller's advertised listener to an address 
that is not routable from the other controllers, the controller nodes won't be 
able to reach each other. This causes the controller cluster to become 
unavailable.

Since the controllers' unreachable endpoints are persisted in the cluster 
metadata partition by the active controller (KRaft leader), a new leader cannot 
be established if the active controller loses leadership: controllers need to 
reach each other through VOTE requests to establish leadership.

The solution to this problem is for the leader to test the default listener 
before accepting an UPDATE_VOTER request from the inactive controller. This 
guarantees that at least the current leader is able to reach all of the other 
controllers.
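
The check described above could look roughly like the following. This is only 
an illustrative sketch, not the KRaft implementation: the function names 
(is_reachable, should_accept_voter_update) are hypothetical, and a plain TCP 
connect stands in for whatever probe the leader would actually use.

```python
import socket
from typing import Callable, Tuple

# (host, port) of the voter's advertised default listener.
Endpoint = Tuple[str, int]


def is_reachable(endpoint: Endpoint, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds in time."""
    host, port = endpoint
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def should_accept_voter_update(
    default_endpoint: Endpoint,
    probe: Callable[[Endpoint], bool] = is_reachable,
) -> bool:
    """Hypothetical leader-side gate: accept the update only if the
    leader itself can reach the voter's advertised default listener."""
    return probe(default_endpoint)
```

With a gate like this, an endpoint that the leader cannot reach is rejected 
before it is written to the cluster metadata partition, so the unreachable 
address is never persisted.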



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
