Jianbin Chen created KAFKA-20104:
------------------------------------

             Summary: Inquiry about migrating from ZooKeeper to KRaft
                 Key: KAFKA-20104
                 URL: https://issues.apache.org/jira/browse/KAFKA-20104
             Project: Kafka
          Issue Type: Wish
          Components: core
    Affects Versions: 3.9.1
         Environment: rocky9.4 kafka 3.9.1
            Reporter: Jianbin Chen


Hi everyone,

I’m trying to migrate a test cluster from ZooKeeper to KRaft in-place (i.e., 
not provisioning three new controller-only nodes first and then pointing the 
existing brokers to them). I hit a problem and would appreciate any pointers.

What I did
- Enabled zookeeper.metadata.migration.enable on each existing broker and set 
the controller quorum settings so each broker acts as a controller+broker 
(process.roles=broker,controller).
- Rolled the three nodes.

Relevant broker configuration (each broker has similar config; example shown):

```
process.roles=broker,controller
node.id=7
broker.id=7
zookeeper.metadata.migration.enable=true
controller.quorum.voters=7@broker1:9093,6@broker2:9093,4@broker3:9093
controller.quorum.bootstrap.servers=broker1:9093,broker2:9093,broker3:9093
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
controller.listener.names=CONTROLLER
group.initial.rebalance.delay.ms=0
listeners=SSL://ip:9092,PLAINTEXT://ip:9192,CONTROLLER://ip:9093
listener.security.protocol.map=SSL:SSL,PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
```
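
As a quick consistency check on the configuration above, here is a minimal Python sketch. It only verifies that `node.id` appears in `controller.quorum.voters` and that the controller listener name is declared in both `listeners` and `listener.security.protocol.map`. The helper names (`parse_properties`, `check_combined_node`) are hypothetical, the parser handles only simple `key=value` lines, and passing these checks does not by itself explain why brokers fail to register.

```python
EXAMPLE_CONFIG = """
process.roles=broker,controller
node.id=7
controller.quorum.voters=7@broker1:9093,6@broker2:9093,4@broker3:9093
controller.listener.names=CONTROLLER
listeners=SSL://ip:9092,PLAINTEXT://ip:9192,CONTROLLER://ip:9093
listener.security.protocol.map=SSL:SSL,PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_combined_node(props):
    """Return a list of inconsistencies found (hypothetical helper)."""
    problems = []

    # Voter entries have the form <id>@<host>:<port>.
    voter_ids = {
        entry.partition("@")[0].strip()
        for entry in props.get("controller.quorum.voters", "").split(",")
        if entry.strip()
    }
    node_id = props.get("node.id")
    if node_id not in voter_ids:
        problems.append(f"node.id={node_id} is not in controller.quorum.voters")

    # Listener names are the part before '://' (listeners) or ':' (protocol map).
    listener_names = {
        item.split("://", 1)[0].strip()
        for item in props.get("listeners", "").split(",")
        if item.strip()
    }
    mapped_names = {
        item.split(":", 1)[0].strip()
        for item in props.get("listener.security.protocol.map", "").split(",")
        if item.strip()
    }
    for name in props.get("controller.listener.names", "").split(","):
        name = name.strip()
        if name and name not in listener_names:
            problems.append(f"controller listener {name} missing from listeners")
        if name and name not in mapped_names:
            problems.append(f"controller listener {name} missing from protocol map")
    return problems


if __name__ == "__main__":
    print(check_combined_node(parse_properties(EXAMPLE_CONFIG)))
```

Running the same check against each node's properties file would at least rule out a mismatch between the three voter entries and the per-node IDs.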

Observed behavior
After rolling restart, the controller logs show the quorum is ready for 
migration, but the controller repeatedly logs that no brokers are known to 
KRaft:

```
[2026-01-27 17:47:15,626] INFO [KRaftMigrationDriver id=7] Controller Quorum is ready for Zk to KRaft migration. Now waiting for ZK brokers. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2026-01-27 17:47:15,627] INFO [KRaftMigrationDriver id=7] 7 transitioning from WAIT_FOR_CONTROLLER_QUORUM to WAIT_FOR_BROKERS state (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2026-01-27 17:47:15,627] INFO [KRaftMigrationDriver id=7] No brokers are known to KRaft, waiting for brokers to register. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2026-01-27 17:47:15,726] INFO [KRaftMigrationDriver id=7] No brokers are known to KRaft, waiting for brokers to register. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2026-01-27 17:47:15,925] INFO [KRaftMigrationDriver id=7] No brokers are known to KRaft, waiting for brokers to register.
```
It appears that the controller quorum becomes ready for migration, but the 
migration driver repeatedly logs "No brokers are known to KRaft, waiting for 
brokers to register." and does not make progress.
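
To make the stall easier to spot in a longer controller log, here is a minimal Python sketch that extracts the migration driver's state transitions and counts the repeated "waiting for brokers" messages. The function name `summarize` and the embedded sample are illustrative only. The regular expression is written against the log wording quoted above and is an assumption about other Kafka versions.

```python
import re

# Excerpt of the controller log quoted above, with the line wrapping
# introduced by the mail client left in place.
SAMPLE_LOG = """
[2026-01-27 17:47:15,627] INFO [KRaftMigrationDriver id=7] 7 transitioning from
WAIT_FOR_CONTROLLER_QUORUM to WAIT_FOR_BROKERS state
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2026-01-27 17:47:15,627] INFO [KRaftMigrationDriver id=7] No brokers are known
to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2026-01-27 17:47:15,726] INFO [KRaftMigrationDriver id=7] No brokers are known
to KRaft, waiting for brokers to register.
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
"""


def summarize(log_text):
    """Return (state transitions, number of 'no brokers known' messages)."""
    # Collapse all whitespace so wrapped log lines match as one string.
    flat = " ".join(log_text.split())
    transitions = re.findall(r"transitioning from (\w+) to (\w+) state", flat)
    waiting = flat.count("No brokers are known to KRaft")
    return transitions, waiting


if __name__ == "__main__":
    transitions, waiting = summarize(SAMPLE_LOG)
    print("transitions:", transitions)
    print("waiting messages:", waiting)
```

If the last transition stays at WAIT_FOR_BROKERS while the waiting count keeps growing, the driver is stalled at the same point described here.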

My current understanding is as follows:
- Migration from ZooKeeper to KRaft normally involves standing up a separate 
controller-only KRaft cluster first, then updating the existing brokers to 
point to that controller cluster (via controller.quorum.bootstrap.servers) and 
enabling zookeeper.metadata.migration.enable.
- Performing an in-place migration (having the existing brokers also act as 
controllers) seems risky because the controller quorum needs a majority of its 
voters running before it can elect a leader. With three voters that means two 
nodes, so two brokers may have to be restarted in the combined role at nearly 
the same time. For topics with replication.factor=2, any partition whose two 
replicas are both on those brokers would be unavailable during the migration.
- Therefore I am unsure whether my understanding is correct (i.e., in-place 
migration is unsafe or unsupported for production-like setups) or whether Kafka 
actually supports in-place migration and I have a configuration error.
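
The availability concern in the list above can be illustrated with a small Python sketch. It computes the majority size for a quorum and lists which partitions lose all replicas when a given set of brokers is down at once. The partition names and the replica assignment are hypothetical (only the broker IDs 7, 6, and 4 come from the configuration), and the sketch covers the arithmetic only, not Kafka's actual election or ISR behavior.

```python
def majority(voters):
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def offline_partitions(assignment, down):
    """Partitions whose replicas are all on brokers in `down`."""
    down = set(down)
    return sorted(p for p, replicas in assignment.items() if set(replicas) <= down)


# Hypothetical RF=2 assignment across the three brokers from the config.
ASSIGNMENT = {
    "t-0": [7, 6],
    "t-1": [6, 4],
    "t-2": [4, 7],
}

if __name__ == "__main__":
    print("majority of 3 voters:", majority(3))
    print("offline with one broker down:", offline_partitions(ASSIGNMENT, {7}))
    print("offline with two brokers down:", offline_partitions(ASSIGNMENT, {7, 6}))
```

Under this assumed assignment, restarting a single broker leaves every partition with one live replica, while taking two brokers down together leaves one partition with none.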

I would greatly appreciate it if you could confirm which is the case. If 
in-place migration is supported, could you please advise what configuration or 
sequence I am missing so that brokers register with KRaft correctly? If 
in-place migration is not recommended, could you recommend the safest procedure 
to migrate a test cluster while minimizing downtime for producers and consumers?

Thank you very much for your time and assistance.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
