[ 
https://issues.apache.org/jira/browse/KAFKA-15330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774491#comment-17774491
 ] 

Roland Sommer commented on KAFKA-15330:
---------------------------------------

I tried the migration in our staging environment today using the final 3.6.0 
release of kafka. The brokers report zkmigrationstate=4 and the controllers 
report zkmigrationstate=2. I'm still seeing the same errors:
{code:java}
2023-10-12 14:51:17[2023-10-12 12:51:17,810] WARN [KRaftMigrationDriver id=67] 
Still waiting for all controller nodes ready to begin the migration. Not ready 
due to: Missing apiVersion from nodes: [87] 
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)

2023-10-12 14:50:36[2023-10-12 12:50:36,385] INFO [BrokerLifecycleManager id=2 
isZkBroker=true] Unable to register the broker because the RPC got timed out 
before it could be sent. (kafka.server.BrokerLifecycleManager)
2023-10-12 14:50:40[2023-10-12 12:50:40,384] INFO [BrokerLifecycleManager id=4 
isZkBroker=true] Unable to register the broker because the RPC got timed out 
before it could be sent. (kafka.server.BrokerLifecycleManager)
2023-10-12 14:50:43[2023-10-12 12:50:43,662] INFO [BrokerLifecycleManager id=3 
isZkBroker=true] Unable to register the broker because the RPC got timed out 
before it could be sent. (kafka.server.BrokerLifecycleManager)
2023-10-12 14:50:58[2023-10-12 12:50:58,483] INFO [BrokerLifecycleManager id=6 
isZkBroker=true] Unable to register the broker because the RPC got timed out 
before it could be sent. (kafka.server.BrokerLifecycleManager)
2023-10-12 14:51:00[2023-10-12 12:51:00,332] INFO [BrokerLifecycleManager id=5 
isZkBroker=true] Unable to register the broker because the RPC got timed out 
before it could be sent. (kafka.server.BrokerLifecycleManager)
2023-10-12 14:51:13[2023-10-12 12:51:13,853] INFO [BrokerLifecycleManager id=1 
isZkBroker=true] Unable to register the broker because the RPC got timed out 
before it could be sent. (kafka.server.BrokerLifecycleManager){code}
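
To see at a glance which controller nodes are reported as missing and which brokers keep timing out, the log excerpts above could be summarized with a short script. This is only a sketch: the function name `summarize` and the two regular expressions are my own, written against the message wording quoted here, and other Kafka versions may phrase these messages differently.

```python
import re
from collections import Counter

# Patterns mirror the two message shapes quoted above (an assumption; the
# wording may differ between Kafka versions).
MISSING_RE = re.compile(
    r"KRaftMigrationDriver id=(\d+)\].*?Missing apiVersion from nodes: \[([\d, ]*)\]"
)
TIMEOUT_RE = re.compile(
    r"BrokerLifecycleManager id=(\d+) isZkBroker=(?:true|false)\] "
    r"Unable to register the broker"
)


def summarize(log_text):
    """Return (sorted missing controller node ids, {broker id: timeout count})."""
    # Collapse hard line wraps so a message split across lines still matches.
    text = " ".join(log_text.split())
    missing = set()
    for _driver_id, nodes in MISSING_RE.findall(text):
        missing.update(int(n) for n in nodes.split(",") if n.strip())
    timeouts = Counter(int(broker_id) for broker_id in TIMEOUT_RE.findall(text))
    return sorted(missing), dict(timeouts)


sample = """
[2023-10-12 12:51:17,810] WARN [KRaftMigrationDriver id=67] Still
waiting for all controller nodes ready to begin the migration. Not ready due
to: Missing apiVersion from nodes: [87]
(org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[2023-10-12 12:50:36,385] INFO [BrokerLifecycleManager id=2
isZkBroker=true] Unable to register the broker because the RPC got timed out
before it could be sent. (kafka.server.BrokerLifecycleManager)
"""
print(summarize(sample))
```

Run against the full controller and broker logs, this shows whether the same node ids stay missing over time or whether the set changes between retries.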

> Migration from ZK to KRaft works with 3.4 but fails from 3.5 upwards
> --------------------------------------------------------------------
>
>                 Key: KAFKA-15330
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15330
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.5.0, 3.5.1
>         Environment: Debian Bookworm/12.1
> kafka 3.4 and 3.5 / scala 2.13
> OpenJDK Runtime Environment (build 17.0.8+7-Debian-1deb12u1)
>            Reporter: Roland Sommer
>            Priority: Major
>
> We recently did some migration testing from our old ZK-based kafka clusters 
> to KRaft while still being on kafka 3.4. The migration tests succeeded on 
> the first try. In the meantime we updated to kafka 3.5/3.5.1, and when we 
> tried to continue our migration work we ran into unexpected problems.
> On the controller we get messages like:
> {code:java}
> Aug 10 06:49:33 kafkactl01 kafka-server-start.sh[48572]: [2023-08-10 
> 06:49:33,072] INFO [KRaftMigrationDriver id=495] Still waiting for all 
> controller nodes ready to begin the migration. due to: Missing apiVersion 
> from nodes: [514, 760] 
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver){code}
> On the broker side, we see:
> {code:java}
> 06:52:56,109] INFO [BrokerLifecycleManager id=6 isZkBroker=true] Unable to 
> register the broker because the RPC got timed out before it could be sent. 
> (kafka.server.BrokerLifecycleManager){code}
> If we reinstall the same development cluster with kafka 3.4, using the exact 
> same steps provided by your migration documentation (the only difference 
> being {{inter.broker.protocol.version=3.4}} instead of 
> {{inter.broker.protocol.version=3.5}}), everything works as expected. 
> Updating to kafka 3.5/3.5.1 again yields the same problems.
> Testing is done on a three-node kafka cluster with a three-node zookeeper 
> ensemble and a three-node controller setup.
> Besides our default configuration containing the active zookeeper hosts etc., 
> this is what was added on the brokers:
> {code:java}
> # Migration
> advertised.listeners=PLAINTEXT://kafka03:9092
> listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
> zookeeper.metadata.migration.enable=true
> controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
> controller.listener.names=CONTROLLER
> {code}
> The main controller config looks like this:
> {code:java}
> process.roles=controller
> node.id=495
> controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
> listeners=CONTROLLER://:9093
> inter.broker.listener.name=PLAINTEXT
> controller.listener.names=CONTROLLER
> listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
> zookeeper.metadata.migration.enable=true
> {code}
> Both configs contain identical {{zookeeper.connect}} settings. Everything 
> is set up automatically, so it should be identical on every run, and we can 
> reliably reproduce migration success on kafka 3.4 and migration failure using 
> the same setup with kafka 3.5.
> There are other issues mentioning problems with ApiVersions, such as 
> KAFKA-15230. We are not sure whether this is a duplicate of the underlying 
> problem there.
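
The broker and controller settings quoted in the issue description can be sanity-checked for consistency with a short script. This is a minimal sketch, not a diagnosis of the bug: the helper names (`parse_properties`, `voter_ids`, `check_migration_configs`) are hypothetical, and the checks encode only my assumptions about what a consistent migration setup looks like (matching `controller.quorum.voters`, migration enabled on both sides, the controller listener present in the protocol map, and the controller's `node.id` listed among the voters).

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def voter_ids(props):
    """Extract the node ids from an id@host:port voter list."""
    voters = props.get("controller.quorum.voters", "")
    return {int(v.split("@", 1)[0]) for v in voters.split(",") if "@" in v}


def check_migration_configs(broker_text, controller_text):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    broker = parse_properties(broker_text)
    controller = parse_properties(controller_text)
    problems = []
    if broker.get("controller.quorum.voters") != controller.get("controller.quorum.voters"):
        problems.append("controller.quorum.voters differs between broker and controller")
    for name, props in (("broker", broker), ("controller", controller)):
        if props.get("zookeeper.metadata.migration.enable") != "true":
            problems.append(f"{name}: zookeeper.metadata.migration.enable is not true")
        protocol_map = props.get("listener.security.protocol.map", "")
        mapped = {e.split(":", 1)[0].strip() for e in protocol_map.split(",") if ":" in e}
        for listener in props.get("controller.listener.names", "").split(","):
            listener = listener.strip()
            if listener and listener not in mapped:
                problems.append(
                    f"{name}: listener {listener} missing from listener.security.protocol.map"
                )
    node_id = controller.get("node.id")
    if node_id is None or int(node_id) not in voter_ids(controller):
        problems.append("controller: node.id is not listed in controller.quorum.voters")
    return problems


# The settings below are copied from the issue description.
BROKER_PROPS = """
advertised.listeners=PLAINTEXT://kafka03:9092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
zookeeper.metadata.migration.enable=true
controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
controller.listener.names=CONTROLLER
"""
CONTROLLER_PROPS = """
process.roles=controller
node.id=495
controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
listeners=CONTROLLER://:9093
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
zookeeper.metadata.migration.enable=true
"""
print(check_migration_configs(BROKER_PROPS, CONTROLLER_PROPS))
```

For the settings as quoted, these particular checks find nothing to flag, which is consistent with the report that the same configuration migrates successfully on kafka 3.4.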



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
