[ https://issues.apache.org/jira/browse/KAFKA-17966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Federico Valeri resolved KAFKA-17966.
-------------------------------------
    Resolution: Feedback Received

> Controller replacement does not support scaling up before scaling down
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-17966
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17966
>             Project: Kafka
>          Issue Type: New Feature
>          Components: kraft
>    Affects Versions: 3.9.0
>            Reporter: Federico Valeri
>            Priority: Major
>
> In KRaft, complex quorum changes are implemented as a series of 
> single-controller changes. When replacing a controller, it is preferable to 
> add the new controller before removing the old one. For example, to replace a 
> controller in a three-controller cluster, adding one controller and then 
> removing the old one allows the system to tolerate one controller failure at 
> all times throughout the whole process. This is currently not possible when 
> the new controller reuses the node ID, as the add step fails with a 
> DuplicateVoterException, so you are forced to do a scale down followed by a 
> scale up. The desired flow is sketched below.
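> A minimal sketch of the desired order, assuming the KIP-853 subcommands of 
> kafka-metadata-quorum.sh; <old-directory-id> is a placeholder for the failed 
> incarnation's directory ID. The example below shows the first step failing 
> today.
> {code}
> # Desired order: add the new voter first, then remove the old one, so the
> # quorum can tolerate one controller failure throughout the process.
> $ bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:8000 \
>   --command-config /opt/kafka/server2/config/server.properties \
>   add-controller
> $ bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:8000 \
>   remove-controller --controller-id 2 --controller-directory-id <old-directory-id>
> {code}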
> Example:
> The operator replaces a failed disk with a new one. The new disk needs to be 
> formatted, which assigns a new directory ID.
> {code}
> $ CLUSTER_ID="$(bin/kafka-cluster.sh cluster-id --bootstrap-server localhost:9092 | awk -F': ' '{print $2}')"
> $ bin/kafka-storage.sh format \
>   --config /opt/kafka/server2/config/server.properties \
>   --cluster-id "$CLUSTER_ID" \
>   --no-initial-controllers \
>   --ignore-formatted
> Formatting metadata directory /opt/kafka/server2/metadata with metadata.version 3.9-IV0.
> {code}
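> The directory ID assigned by the format command can be checked with the 
> storage tool; the output below is a sketch (the exact info layout may vary 
> across versions), reusing the directory ID that appears in the quorum 
> description further down.
> {code}
> $ bin/kafka-storage.sh info --config /opt/kafka/server2/config/server.properties
> Found log directory:
>   /opt/kafka/server2/metadata
> Found metadata: {cluster.id=..., directory.id=wrqMDI1WDsqaooVSOtlgYw, node.id=2, version=1}
> {code}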
> After restarting the controller, the quorum will report two nodes with ID 2: 
> the original incarnation with the failed disk, showing Follower status and an 
> ever-growing lag, plus the new incarnation with a different directory ID and 
> Observer status.
> {code}
> $ bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:8000 describe --replication --human-readable
> NodeId  DirectoryId             LogEndOffset  Lag  LastFetchTimestamp  LastCaughtUpTimestamp  Status
> 0       pbvuBlaTTwKRxS5NLJwRFQ  535           0    6 ms ago            6 ms ago               Leader
> 1       QjRpFtVDTtCa8OLXiSbmmA  535           0    283 ms ago          283 ms ago             Follower
> 2       slcsM5ZAR0SMIF_u__MAeg  407           128  63307 ms ago        63802 ms ago           Follower
> 2       wrqMDI1WDsqaooVSOtlgYw  535           0    281 ms ago          281 ms ago             Observer
> 8       aXLz3ixjqzXhCYqKHRD4WQ  535           0    284 ms ago          284 ms ago             Observer
> 7       KCriHQZm3TlxvEVNgyWKJw  535           0    284 ms ago          284 ms ago             Observer
> 9       v5nnIwK8r0XqjyqlIPW-aw  535           0    284 ms ago          284 ms ago             Observer
> {code}
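> One way to wait for the new incarnation to catch up is to poll until its Lag 
> reaches zero. A minimal polling sketch, assuming the column layout above and 
> matching the new directory ID from the Observer row:
> {code}
> $ until bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:8000 describe --replication \
>     | awk -v dir=wrqMDI1WDsqaooVSOtlgYw '$2 == dir && $4 == 0 { ok = 1 } END { exit !ok }'; do
>   sleep 5
> done
> {code}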
> Once the new controller is in sync with the leader (its Lag is 0), we attempt the scale up.
> {code}
> $ bin/kafka-metadata-quorum.sh \
>   --bootstrap-controller localhost:8000 \
>   --command-config /opt/kafka/server2/config/server.properties \
>   add-controller
> org.apache.kafka.common.errors.DuplicateVoterException: The voter id for ReplicaKey(id=2, directoryId=Optional[u7e_mCmg0VAIz0zuAOcraA]) is already part of the set of voters [ReplicaKey(id=0, directoryId=Optional[PbEthh6mR8iVNizvUTUVFw]), ReplicaKey(id=1, directoryId=Optional[kIpbbU79QaCIIiOLOyCjJg]), ReplicaKey(id=2, directoryId=Optional[2ab0gajpS5aUf5d-2Jw02w])].
> java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.DuplicateVoterException: The voter id for ReplicaKey(id=2, directoryId=Optional[u7e_mCmg0VAIz0zuAOcraA]) is already part of the set of voters [ReplicaKey(id=0, directoryId=Optional[PbEthh6mR8iVNizvUTUVFw]), ReplicaKey(id=1, directoryId=Optional[kIpbbU79QaCIIiOLOyCjJg]), ReplicaKey(id=2, directoryId=Optional[2ab0gajpS5aUf5d-2Jw02w])].
> 	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
> 	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
> 	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
> 	at org.apache.kafka.tools.MetadataQuorumCommand.handleAddController(MetadataQuorumCommand.java:431)
> 	at org.apache.kafka.tools.MetadataQuorumCommand.execute(MetadataQuorumCommand.java:147)
> 	at org.apache.kafka.tools.MetadataQuorumCommand.mainNoExit(MetadataQuorumCommand.java:81)
> 	at org.apache.kafka.tools.MetadataQuorumCommand.main(MetadataQuorumCommand.java:76)
> Caused by: org.apache.kafka.common.errors.DuplicateVoterException: The voter id for ReplicaKey(id=2, directoryId=Optional[u7e_mCmg0VAIz0zuAOcraA]) is already part of the set of voters [ReplicaKey(id=0, directoryId=Optional[PbEthh6mR8iVNizvUTUVFw]), ReplicaKey(id=1, directoryId=Optional[kIpbbU79QaCIIiOLOyCjJg]), ReplicaKey(id=2, directoryId=Optional[2ab0gajpS5aUf5d-2Jw02w])].
> {code}
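> Until scale up before scale down is supported, the operator has to reverse 
> the order: remove the old incarnation first, then add the new one, which 
> leaves the quorum unable to tolerate any controller failure in between. A 
> sketch, assuming the KIP-853 remove-controller flags, with <old-directory-id> 
> standing for the failed incarnation's directory ID as reported in the voter 
> set:
> {code}
> # Scale down: remove the failed incarnation from the voter set.
> $ bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:8000 \
>   remove-controller --controller-id 2 --controller-directory-id <old-directory-id>
> # Scale up: add the new incarnation as a voter.
> $ bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:8000 \
>   --command-config /opt/kafka/server2/config/server.properties \
>   add-controller
> {code}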



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
