[ 
https://issues.apache.org/jira/browse/KAFKA-17950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-17950.
-----------------------------------------
    Resolution: Invalid

OK, my mistake. It seems that an incorrect voter list configuration in 
controller1.properties caused the issue.
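
For anyone who hits the same exception: a quick way to rule out this kind of 
misconfiguration is to compare the voter list across all controller configs 
before restarting. The sketch below is only an illustration. It assumes the 
voter list is set through the static controller.quorum.voters property and that 
the files are named as in the attachment list; the helper names voters_of and 
check_voters are made up for this example.

```shell
# Hypothetical helper: print the controller.quorum.voters value of one
# properties file (the last occurrence wins, surrounding whitespace is trimmed).
voters_of() {
  sed -n -E 's/^[[:space:]]*controller\.quorum\.voters[[:space:]]*=[[:space:]]*(.*[^[:space:]])[[:space:]]*$/\1/p' "$1" \
    | tail -n 1
}

# Hypothetical helper: succeed only if every given file carries the same,
# non-empty voter list as the first file.
check_voters() {
  reference=$(voters_of "$1")
  if [ -z "$reference" ]; then
    echo "no controller.quorum.voters found in $1" >&2
    return 1
  fi
  status=0
  for f in "$@"; do
    current=$(voters_of "$f")
    if [ "$current" != "$reference" ]; then
      echo "MISMATCH in $f: '$current' (expected '$reference')" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (paths assumed from the attachment names):
#   check_voters repro-conf/controller1.properties \
#                repro-conf/controller2.properties \
#                repro-conf/controller3.properties
```

This is a plain string comparison, so it also flags lists that contain the 
same voters in a different order.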

> The leader requested truncation to below the current high watermark
> -------------------------------------------------------------------
>
>                 Key: KAFKA-17950
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17950
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.9.0, 3.9.1
>            Reporter: Viktor Somogyi-Vass
>            Priority: Blocker
>         Attachments: broker1.log, broker2.log, broker3.log, 
> controller-logs.zip, controller1-migration-enabled.properties, 
> controller1.properties, controller2-migration-enabled.properties, 
> controller2.properties, controller3-migration-enabled.properties, 
> controller3.properties, kraft1.log, kraft2.log, kraft3.log, 
> producer-perf.log, producer.properties, server1-migrated-to-kraft.properties, 
> server1-migration-enabled.properties, server1.properties, 
> server2-migrated-to-kraft.properties, server2-migration-enabled.properties, 
> server2.properties, server3-migrated-to-kraft.properties, 
> server3-migration-enabled.properties, server3.properties, zookeeper.log
>
>
> While testing the migration from 3.9 ZK Kafka to 3.9 KRaft, I found that in 
> the last step (finalization), where I restart the controllers in non-migration 
> mode, the last controller restart causes a fatal failure in the cluster: 
> every node (broker and controller) stops except the controller I restarted.
> The failing nodes all throw the same exception at the same time:
> {noformat}
> [2024-11-06 14:02:13,498] ERROR Encountered fatal fault: Unexpected error in raft IO thread (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> org.apache.kafka.common.KafkaException: The leader requested truncation to offset 484, which is below the current high watermark LogOffsetMetadata(offset=508, metadata=Optional.empty)
>         at org.apache.kafka.raft.KafkaRaftClient.lambda$handleFetchResponse$11(KafkaRaftClient.java:1619)
>         at java.base/java.util.Optional.ifPresent(Optional.java:183)
>         at org.apache.kafka.raft.KafkaRaftClient.handleFetchResponse(KafkaRaftClient.java:1616)
>         at org.apache.kafka.raft.KafkaRaftClient.handleResponse(KafkaRaftClient.java:2457)
>         at org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:2613)
>         at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:3312)
>         at org.apache.kafka.raft.KafkaRaftClientDriver.doWork(KafkaRaftClientDriver.java:64)
>         at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:136)
> {noformat}
> Setup:
> * single Zookeeper node
> * 3 brokers
> * 1 running producer-performance client
> * 3 controllers
> Repro:
> # Start Zookeeper with zookeeper.properties
> {noformat}
> bin/zookeeper-server-start.sh repro-conf/zookeeper.properties
> {noformat}
> # Start brokers with serverX.properties
> {noformat}
> bin/kafka-server-start.sh repro-conf/server1.properties
> bin/kafka-server-start.sh repro-conf/server2.properties
> bin/kafka-server-start.sh repro-conf/server3.properties
> {noformat}
> # Start the producer-performance tool
> {noformat}
> bin/kafka-producer-perf-test.sh --topic test1 --num-records 1000000 \
>   --throughput 100 --record-size 10000 \
>   --producer.config repro-conf/producer.properties
> {noformat}
> # Get the cluster ID and format all controller log dirs
> # Start the controllers in migration mode
> {noformat}
> bin/kafka-server-start.sh repro-conf/controller1-migration-enabled.properties
> bin/kafka-server-start.sh repro-conf/controller2-migration-enabled.properties
> bin/kafka-server-start.sh repro-conf/controller3-migration-enabled.properties
> {noformat}
> # Restart the brokers (rolling) in migration mode with the following configs. 
> (My restart order was 1,2,3.)
> {noformat}
> bin/kafka-server-start.sh repro-conf/server1-migration-enabled.properties
> bin/kafka-server-start.sh repro-conf/server2-migration-enabled.properties
> bin/kafka-server-start.sh repro-conf/server3-migration-enabled.properties
> {noformat}
> # Restart the brokers (rolling) in migrated mode with the following configs 
> (at this point they are connected to the controllers and not ZK). My restart 
> order was 1,2,3.
> {noformat}
> bin/kafka-server-start.sh repro-conf/server1-migrated-to-kraft.properties
> bin/kafka-server-start.sh repro-conf/server2-migrated-to-kraft.properties
> bin/kafka-server-start.sh repro-conf/server3-migrated-to-kraft.properties
> {noformat}
> # At this point all brokers run in KRaft mode. Rolling restart the 
> controllers to finalize. (The order was 3,2,1.)
> {noformat}
> bin/kafka-server-start.sh repro-conf/controller3.properties
> bin/kafka-server-start.sh repro-conf/controller2.properties
> bin/kafka-server-start.sh repro-conf/controller1.properties
> {noformat}
> At the last restart, when controller1 starts up, all other nodes crash at 
> once. All logs and configuration files are attached.
> I was working from the 3.9 branch at commit 4a562cd.
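
The reproduction step "Get the cluster ID and format all controller log dirs" 
lists no commands. A minimal sketch of what it could look like follows, using 
the stock zookeeper-shell.sh and kafka-storage.sh tools. The ZooKeeper address 
localhost:2181, the JSON layout of the /cluster/id znode, and the helper names 
extract_cluster_id and format_controllers are assumptions made for this 
example, not taken from the report.

```shell
# Assumption: ZooKeeper listens on localhost:2181; adjust to match
# repro-conf/zookeeper.properties.
ZK_CONNECT="${ZK_CONNECT:-localhost:2181}"

# Hypothetical helper: extract the "id" field from the JSON stored at
# /cluster/id. Reads zookeeper-shell output on stdin and ignores the
# connection chatter printed around the JSON line.
extract_cluster_id() {
  sed -n 's/.*"id":"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: format the metadata log dir of each controller with
# the given cluster ID, stopping at the first failure.
format_controllers() {
  cluster_id="$1"
  for i in 1 2 3; do
    bin/kafka-storage.sh format -t "$cluster_id" \
      -c "repro-conf/controller${i}-migration-enabled.properties" || return 1
  done
}

# Usage (run from the Kafka installation directory):
#   CLUSTER_ID=$(bin/zookeeper-shell.sh "$ZK_CONNECT" get /cluster/id | extract_cluster_id)
#   format_controllers "$CLUSTER_ID"
```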



--
This message was sent by Atlassian Jira
(v8.20.10#820010)