[ https://issues.apache.org/jira/browse/KAFKA-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932938#comment-17932938 ]
Luke Chen commented on KAFKA-18930:
-----------------------------------

[~davidarthur] [~mumrah], I'd like to hear your thoughts on this issue. Thanks.

> KRaft MigrationEvent won't retry when failing to write data to ZK
> ------------------------------------------------------------------
>
>                 Key: KAFKA-18930
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18930
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.9.0
>            Reporter: Luke Chen
>            Priority: Major
>
> When migrating from ZK to KRaft, the cluster runs in dual-write mode: metadata is written to KRaft first and then written to ZK asynchronously. If an exception occurs during the ZK write, the KRaft MigrationEvent does not retry it, which leaves the metadata in KRaft and ZK inconsistent.
>
> Note:
> 1. In addition, during a clean shutdown of the KRaft controller, we should keep retrying the failed ZK write until a forced shutdown, to make sure the metadata stays consistent.
> 2. During shutdown, [the order of shutdown|https://github.com/apache/kafka/blob/1ec1043d5197c4f807fa5cbc41d875b289443096/core/src/main/scala/kafka/server/ControllerServer.scala#L69-L76] is: close ZK -> close RPC client -> close migration driver. This causes another problem: even if we retry the ZK write, it can never succeed while shutdown is in progress, because the ZK connection has already been closed.
>
> The impact is that when rolling back to ZK mode during the migration, the metadata in ZK is out of date.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
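The two points in the description (retry a failed ZK write instead of dropping it, and close the migration driver before the ZK connection so pending writes can drain) can be illustrated with a small, self-contained Java sketch. It is a hedged illustration under stated assumptions, not Kafka's code: `FakeZkClient`, `MigrationDriverSketch`, `onKRaftCommit`, `drain`, and the fixed `maxAttempts` bound (standing in for the force-shutdown deadline) are all hypothetical names, and a real driver would back off between attempts and talk to ZooKeeper through Kafka's own client classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DualWriteSketch {

    // Hypothetical stand-in for a ZooKeeper client; not Kafka's real ZK client API.
    static class FakeZkClient {
        private int transientFailures;          // writes that fail before ZK "recovers"
        private boolean closed = false;
        final List<String> written = new ArrayList<>();

        FakeZkClient(int transientFailures) {
            this.transientFailures = transientFailures;
        }

        void write(String record) {
            if (closed) {
                throw new IllegalStateException("ZK connection is closed");
            }
            if (transientFailures > 0) {
                transientFailures--;
                throw new RuntimeException("transient ZK error");
            }
            written.add(record);
        }

        void close() {
            closed = true;
        }
    }

    // Hypothetical driver: each record is already committed to KRaft and queued for an async ZK write.
    static class MigrationDriverSketch {
        private final FakeZkClient zk;
        private final int maxAttempts;          // stands in for the force-shutdown deadline
        private final Deque<String> pending = new ArrayDeque<>();

        MigrationDriverSketch(FakeZkClient zk, int maxAttempts) {
            this.zk = zk;
            this.maxAttempts = maxAttempts;
        }

        void onKRaftCommit(String record) {
            pending.addLast(record);
        }

        // Retry a failed ZK write instead of dropping it (no backoff here, to keep the sketch deterministic).
        private boolean writeWithRetry(String record) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    zk.write(record);
                    return true;
                } catch (RuntimeException e) {
                    // Keep the record queued and try again on the next attempt.
                }
            }
            return false;
        }

        // Drain queued writes in order; returns how many records still have not reached ZK.
        int drain() {
            while (!pending.isEmpty()) {
                if (!writeWithRetry(pending.peekFirst())) {
                    return pending.size();
                }
                pending.removeFirst();
            }
            return 0;
        }
    }

    // Shuts down both components in the given order and reports records left unwritten in ZK.
    static int shutdown(boolean closeDriverFirst) {
        FakeZkClient zk = new FakeZkClient(2);
        MigrationDriverSketch driver = new MigrationDriverSketch(zk, 5);
        driver.onKRaftCommit("topic-a");
        driver.onKRaftCommit("topic-b");
        if (closeDriverFirst) {
            int left = driver.drain();   // pending ZK writes drain while ZK is still open
            zk.close();
            return left;
        }
        zk.close();                      // ZK closed first: every retry now fails
        return driver.drain();
    }

    public static void main(String[] args) {
        System.out.println("driver first: " + shutdown(true) + " records left");
        System.out.println("ZK first:     " + shutdown(false) + " records left");
    }
}
```

With the driver closed first, the two transient failures are absorbed by retries and no records are left behind; with ZK closed first (the order described in note 2), both records stay unwritten no matter how many retries are allowed, which is the inconsistency a rollback to ZK mode would then observe.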