Hello!
I sent this message a couple of months ago but did not get a response, so I hope it is OK to try again.

I'm wondering whether there is a right way to add new controllers to an existing cluster without downtime. I've tried the following: I have three controllers joined in a cluster. One by one, I change the configuration to 4 voters, stop the controller, delete quorum-state, and start the controller again. Once that is done for all 3 existing controllers, I start the fourth one.

In most of my experiments there was consensus on who the leader is after the 4th controller came up, but in some cases no leader could be elected and I had to restart each controller again. Even when a leader was elected after the transition, I'm not sure whether the cluster temporarily lost the ability to elect a leader during the transition.

I think this problem should potentially go away if there are at least 5 controllers to begin with. Next, I do the same thing but change the configuration to 5 voters. When I start the fifth controller, it can't seem to figure out who the leader is and keeps calling new elections indefinitely (as shown in the log below). If I then restart all of the controllers one by one, the cluster settles on a leader.

Is this the right way to go about it? I'm concerned about the indefinite elections, and I'd also like the process to be as simple as possible.

Thanks and best regards,
Martin
[2022-03-25 09:16:44,233] INFO [RaftManager nodeId=5] Completed transition to CandidateState(localId=5, epoch=2109, retries=1, electionTimeoutMs=1090) (org.apache.kafka.raft.QuorumState)
[2022-03-25 09:16:44,234] INFO [RaftManager nodeId=5] Vote request VoteRequestData(clusterId='ihf_kq2QSbmIRDQhTLSZYg', topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, candidateEpoch=2109, candidateId=4, lastOffsetEpoch=2004, lastOffset=623)])]) with epoch 2109 is rejected (org.apache.kafka.raft.KafkaRaftClient)
[2022-03-25 09:16:44,292] INFO [RaftManager nodeId=5] Re-elect as candidate after election backoff has completed (org.apache.kafka.raft.KafkaRaftClient)
[2022-03-25 09:16:45,200] INFO [RaftManager nodeId=5] Completed transition to CandidateState(localId=5, epoch=2110, retries=2, electionTimeoutMs=1774) (org.apache.kafka.raft.QuorumState)
[2022-03-25 09:16:45,201] INFO [RaftManager nodeId=5] Vote request VoteRequestData(clusterId='ihf_kq2QSbmIRDQhTLSZYg', topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, candidateEpoch=2110, candidateId=4, lastOffsetEpoch=2004, lastOffset=623)])]) with epoch 2110 is rejected (org.apache.kafka.raft.KafkaRaftClient)
[2022-03-25 09:16:45,305] INFO [RaftManager nodeId=5] Vote request VoteRequestData(clusterId='ihf_kq2QSbmIRDQhTLSZYg', topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, candidateEpoch=2110, candidateId=2, lastOffsetEpoch=2100, lastOffset=627)])]) with epoch 2110 is rejected (org.apache.kafka.raft.KafkaRaftClient)
[2022-03-25 09:16:45,307] INFO [RaftManager nodeId=5] Vote request VoteRequestData(clusterId='ihf_kq2QSbmIRDQhTLSZYg', topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, candidateEpoch=2110, candidateId=3, lastOffsetEpoch=2039, lastOffset=626)])]) with epoch 2110 is rejected (org.apache.kafka.raft.KafkaRaftClient)
[2022-03-25 09:16:45,308] INFO [RaftManager nodeId=5] Insufficient remaining votes to become leader (rejected by [2, 3, 4]). We will backoff before retrying election again (org.apache.kafka.raft.KafkaRaftClient)
[2022-03-25 09:16:45,412] INFO [RaftManager nodeId=5] Vote request VoteRequestData(clusterId='ihf_kq2QSbmIRDQhTLSZYg', topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, candidateEpoch=2110, candidateId=1, lastOffsetEpoch=2079, lastOffset=627)])]) with epoch 2110 is rejected (org.apache.kafka.raft.KafkaRaftClient)
[2022-03-25 09:16:45,607] INFO [RaftManager nodeId=5] Re-elect as candidate after election backoff has completed (org.apache.kafka.raft.KafkaRaftClient)
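
P.S. To make my concern about the transition window concrete, here is a minimal sketch of the majority arithmetic. It assumes a leader needs votes from a strict majority of the configured voters (as in Raft), and it ignores the fact that controllers in mid-transition may disagree about the voter set. The function names `majority` and `can_elect` are made up for illustration; this is not Kafka code.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that is a strict majority of `voters`."""
    return voters // 2 + 1


def can_elect(configured_voters: int, running_voters: int) -> bool:
    """True if the running voters alone could reach a majority.

    Assumption: every running voter shares the same configured voter count.
    """
    return running_voters >= majority(configured_voters)


if __name__ == "__main__":
    # 3 existing controllers, one of them stopped for reconfiguration,
    # so only 2 voters are running at that moment.
    for configured in (3, 4, 5):
        print(configured, majority(configured), can_elect(configured, running_voters=2))
```

Under that assumption, with 4 or 5 configured voters and only 2 controllers running, a majority of 3 is not reachable, which may be the window I'm worried about.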