[ https://issues.apache.org/jira/browse/KAFKA-12964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao resolved KAFKA-12964.
-----------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Merged the PR to trunk.

> Corrupt segment recovery can delete new producer state snapshots
> ----------------------------------------------------------------
>
>                 Key: KAFKA-12964
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12964
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.8.0
>            Reporter: Gardner Vickers
>            Assignee: Gardner Vickers
>            Priority: Major
>             Fix For: 3.0.0
>
>
> During log recovery, we may schedule asynchronous deletion in
> deleteSegmentFiles.
> [https://github.com/apache/kafka/blob/fc5245d8c37a6c9d585c5792940a8f9501bedbe1/core/src/main/scala/kafka/log/Log.scala#L2382]
> If we're truncating the log, this may result in deletions for segments with
> matching base offsets to segments which will be written in the future. To
> avoid asynchronously deleting future segments, we rename the segment and
> index files, but we do not do this for producer state snapshot files.
> This leaves us vulnerable to a race condition where we could end up deleting
> snapshot files for segments written after log recovery when async deletion
> runs.
>
> To fix this, we should first remove the `SnapshotFile` from the
> `ProducerStateManager` and rename the file to have a `Log.DeletedFileSuffix`.
> Then we can asynchronously delete the snapshot file later.
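
For readers following the issue, the rename-then-delete ordering described in the report can be sketched roughly as below. This is a minimal illustration in plain Java, not the code from the merged PR: `SnapshotRegistry`, its methods, and the `DELETED_SUFFIX` constant are hypothetical stand-ins for `ProducerStateManager`, `SnapshotFile`, and `Log.DeletedFileSuffix`, and the suffix value and file-name format used here are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical stand-in for snapshot bookkeeping; not Kafka's ProducerStateManager. */
public class SnapshotRegistry {
    // Plays the role of Log.DeletedFileSuffix; the literal value is an assumption.
    static final String DELETED_SUFFIX = ".deleted";

    private final Path dir;
    private final ConcurrentSkipListMap<Long, Path> snapshots = new ConcurrentSkipListMap<>();

    public SnapshotRegistry(Path dir) {
        this.dir = dir;
    }

    /** Writes an empty snapshot file for the given base offset and tracks it. */
    public Path takeSnapshot(long offset) throws IOException {
        Path file = dir.resolve(String.format("%020d.snapshot", offset));
        Files.write(file, new byte[0]);
        snapshots.put(offset, file);
        return file;
    }

    /**
     * Step 1 (synchronous): stop tracking the snapshot and rename it, so its
     * path can no longer collide with a snapshot written later at the same offset.
     */
    public Optional<Path> removeAndMarkForDeletion(long offset) throws IOException {
        Path file = snapshots.remove(offset);
        if (file == null) {
            return Optional.empty();
        }
        Path renamed = file.resolveSibling(file.getFileName() + DELETED_SUFFIX);
        Files.move(file, renamed, StandardCopyOption.ATOMIC_MOVE);
        return Optional.of(renamed);
    }

    /** Step 2 (later, e.g. on a scheduler thread): delete only the renamed file. */
    public static void deleteRenamed(Path renamed) throws IOException {
        Files.deleteIfExists(renamed);
    }

    public boolean isTracked(long offset) {
        return snapshots.containsKey(offset);
    }
}
```

Because the deferred step only ever receives the renamed path, a snapshot created at the same base offset after recovery is left alone when the deletion finally runs, which is the race the report describes.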