[ 
https://issues.apache.org/jira/browse/KAFKA-15330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776216#comment-17776216
 ] 

David Arthur commented on KAFKA-15330:
--------------------------------------

Can you attach non-sensitive parts of your broker and controller configs to 
this ticket?

One possibility is a mismatch of MetadataVersion (formerly IBP) between the 
controller and the brokers. When registering with the KRaft controller, the ZK 
brokers need to be on the same MetadataVersion that the controller was 
bootstrapped with.

For example, when you bootstrap the controller you get the latest 
MetadataVersion unless you specify otherwise.

{code}
./bin/kafka-storage.sh format --cluster-id sJLl5Bp7QOSZMs37jir-cA --config config/kraft/controller.properties
Formatting /tmp/kraft-controller-logs with metadata.version 3.7-IV0.
{code}

Or with the "--release-version" flag:

{code}
./bin/kafka-storage.sh format --cluster-id sJLl5Bp7QOSZMs37jir-cA --config config/kraft/controller.properties --release-version 3.6
Formatting /tmp/kraft-controller-logs with metadata.version 3.6-IV2.
{code}

This metadata version must match the "inter.broker.protocol.version" property 
used by the brokers.
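
As a rough sketch of what a matching broker setting could look like: if the 
controller was formatted with "--release-version 3.6" as in the second command 
above, the broker config would carry the line below. The value here is only an 
assumption taken from that example output; use whatever metadata.version your 
own format command reported.

{code}
# Broker config (ZK mode). Assumes the controller was formatted with
# metadata.version 3.6-IV2, as in the example output above.
inter.broker.protocol.version=3.6-IV2
{code}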

> Migration from ZK to KRaft works with 3.4 but fails from 3.5 upwards
> --------------------------------------------------------------------
>
>                 Key: KAFKA-15330
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15330
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.5.0, 3.5.1
>         Environment: Debian Bookworm/12.1
> kafka 3.4 and 3.5 / scala 2.13
> OpenJDK Runtime Environment (build 17.0.8+7-Debian-1deb12u1)
>            Reporter: Roland Sommer
>            Priority: Major
>
> We recently did some migration testing from our old ZK-based kafka clusters 
> to KRaft while still being on kafka 3.4. The migration tests succeeded at 
> first try. In the meantime we updated to kafka 3.5/3.5.1 and now we wanted to 
> continue our migration work, which ran into unexpected problems.
> On the controller we get messages like:
> {code:java}
> Aug 10 06:49:33 kafkactl01 kafka-server-start.sh[48572]: [2023-08-10 
> 06:49:33,072] INFO [KRaftMigrationDriver id=495] Still waiting for all 
> controller nodes ready to begin the migration. due to: Missing apiVersion 
> from nodes: [514, 760] 
> (org.apache.kafka.metadata.migration.KRaftMigrationDriver){code}
> On the broker side, we see:
> {code:java}
> 06:52:56,109] INFO [BrokerLifecycleManager id=6 isZkBroker=true] Unable to 
> register the broker because the RPC got timed out before it could be sent. 
> (kafka.server.BrokerLifecycleManager){code}
> If we reinstall the same development cluster with kafka 3.4, using the exact 
> same steps provided by your migration documentation (only difference is using 
> {{inter.broker.protocol.version=3.4}} instead of 
> {{inter.broker.protocol.version=3.5}}), everything works as expected. 
> Updating to kafka 3.5/3.5.1 yields the same problems.
> Testing is done on a three-node kafka cluster with a three-node zookeeper 
> ensemble and a three-node controller setup.
> Besides our default configuration containing the active zookeeper hosts etc., 
> this is what was added on the brokers:
> {code:java}
> # Migration
> advertised.listeners=PLAINTEXT://kafka03:9092
> listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
> zookeeper.metadata.migration.enable=true
> controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
> controller.listener.names=CONTROLLER
> {code}
> The main controller config looks like this:
> {code:java}
> process.roles=controller
> node.id=495
> controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
> listeners=CONTROLLER://:9093
> inter.broker.listener.name=PLAINTEXT
> controller.listener.names=CONTROLLER
> listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
> zookeeper.metadata.migration.enable=true
> {code}
> Both configs contain the identical {{zookeeper.connect}} settings. Everything 
> is set up automatically, so it should be identical on every run, and we can 
> reliably reproduce migration success on kafka 3.4 and migration failure using 
> the same setup with kafka 3.5.
> There are other issues mentioning problems with ApiVersions like KAFKA-15230 
> - not quite sure if this is a duplicate of the underlying problem there.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)