gitlw commented on code in PR #12029:
URL: https://github.com/apache/kafka/pull/12029#discussion_r850657561
##########
core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala:
##########
@@ -2615,7 +2615,10 @@ class ReplicaManagerTest {

  @Test
  def testStopReplicaWithDeletePartitionAndExistingPartitionAndNewerLeaderEpochAndIOException(): Unit = {

Review Comment:
@dajac @junrao Below is my understanding of why the KafkaStorageException was triggered before this change:

1. The test explicitly deletes the underlying directory when the throwIOException flag is set:
https://github.com/apache/kafka/blob/184f824cd12ae3e8907ac9066400f2c79fe01d2f/core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala#L2688

2. During async deletion of the replica, re-initializing the leader epoch cache triggers a FileSystemException:
```
<init>:58, IOException (java.io)
<init>:73, FileSystemException (java.nio.file)
translateToIOException:100, UnixException (sun.nio.fs)
rethrowAsIOException:111, UnixException (sun.nio.fs)
rethrowAsIOException:116, UnixException (sun.nio.fs)
newByteChannel:219, UnixFileSystemProvider (sun.nio.fs)
newByteChannel:371, Files (java.nio.file)
createFile:648, Files (java.nio.file)
<init>:104, CheckpointFile (kafka.server.checkpoints)
<init>:68, LeaderEpochCheckpointFile (kafka.server.checkpoints)
newLeaderEpochFileCache$1:2516, Log$ (kafka.log)
maybeCreateLeaderEpochCache:2532, Log$ (kafka.log)
initializeLeaderEpochCache:635, Log (kafka.log)
$anonfun$renameDir$2:758, Log (kafka.log)
renameDir:2699, Log (kafka.log)
asyncDelete:1044, LogManager (kafka.log)
$anonfun$asyncDelete$3:1079, LogManager (kafka.log)
apply:-1, 435995262 (kafka.log.LogManager$$Lambda$842)
foreach:407, Option (scala)
$anonfun$asyncDelete$2$adapted:1077, LogManager (kafka.log)
apply:-1, 1385607480 (kafka.log.LogManager$$Lambda$841)
foreach:79, HashSet (scala.collection.mutable)
asyncDelete:1075, LogManager (kafka.log)
stopPartitions:507, ReplicaManager (kafka.server)
stopReplicas:423, ReplicaManager (kafka.server)
testStopReplicaWithExistingPartition:2378, ReplicaManagerTest (kafka.server)
testStopReplicaWithDeletePartitionAndExistingPartitionAndNewerLeaderEpochAndIOException:2290, ReplicaManagerTest (kafka.server)
```

3. The FileSystemException is converted into a KafkaStorageException:
https://github.com/apache/kafka/blob/eefdf9d6a7fd79a21bb9aea2df25ea642062f28c/core/src/main/scala/kafka/log/LocalLog.scala#L790

Thus, when we disable the re-initialization of the leader epoch cache during async deletion, neither the FileSystemException nor the KafkaStorageException is triggered. That is what I meant by the underlying directory no longer being "needed". I can clarify the comment by indicating that the async deletion code path no longer reads from the underlying directory. Does that sound ok to you?
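
The failure chain in steps 1 to 3 can be reproduced outside Kafka with plain JDK calls. The following is a minimal sketch, not Kafka code: `DeletedDirSketch`, `StorageException`, `reinitializeCache`, and `stopReplica` are hypothetical names, and the checkpoint file name is only illustrative. The only real APIs used are the `java.nio.file` calls that also appear in the stack trace (`Files.createFile` in particular).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;

final class DeletedDirSketch {

    // Hypothetical stand-in for KafkaStorageException; not the real Kafka class.
    static final class StorageException extends RuntimeException {
        StorageException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Mimics step 2: re-initializing the cache creates a checkpoint file in the
    // log directory. Any IOException is wrapped, mimicking step 3.
    static void reinitializeCache(Path logDir) {
        Path checkpoint = logDir.resolve("leader-epoch-checkpoint"); // illustrative name
        try {
            if (!Files.exists(checkpoint)) {
                Files.createFile(checkpoint); // fails when logDir no longer exists
            }
        } catch (IOException e) {
            throw new StorageException("Error while accessing " + logDir, e);
        }
    }

    // Simulates the test: delete the directory first (step 1), then run the
    // deletion path with or without the cache re-initialization.
    static String stopReplica(boolean reinitialize) {
        Path logDir;
        try {
            logDir = Files.createTempDirectory("topic-0");
            Files.delete(logDir); // step 1: the directory is gone before deletion runs
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            if (reinitialize) {
                reinitializeCache(logDir);
            }
            return "no error";
        } catch (StorageException e) {
            boolean fsCause = e.getCause() instanceof FileSystemException;
            return "storage error, FileSystemException cause: " + fsCause;
        }
    }

    public static void main(String[] args) {
        System.out.println("with re-initialization:    " + stopReplica(true));
        System.out.println("without re-initialization: " + stopReplica(false));
    }
}
```

With the re-initialization enabled, the sketch reports a storage error whose cause is a `FileSystemException`. With it disabled, nothing touches the deleted directory and no exception is raised, which mirrors the behavior change discussed in this comment.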