hachikuji commented on a change in pull request #9590:
URL: https://github.com/apache/kafka/pull/9590#discussion_r583179618
##########
File path: core/src/main/scala/kafka/log/LogCleaner.scala
##########

```diff
@@ -701,11 +719,17 @@ private[log] class Cleaner(val id: Int,
       // if any messages are to be retained, write them out
       val outputBuffer = result.outputBuffer
       if (outputBuffer.position() > 0) {
+        if (destSegment.isEmpty) {
+          // create a new segment with a suffix appended to the name of the log and indexes
+          destSegment = Some(LogCleaner.createNewCleanedSegment(log, result.minOffset()))
+          transactionMetadata.cleanedIndex = Some(destSegment.get.txnIndex)
```

Review comment:
   Ok, I think I understand. One thing that makes this a little confusing is the need to reverse the collection. Why don't we just build the index in order?

   I am debating whether this is good enough. A potential problem is that we don't have a guarantee on the size of the index. In the common case it should be small, but there is nothing to prevent an entire segment from being full of aborted transaction markers. Currently we wait until `filterInto` returns before creating the new segment, but maybe instead we could do it in `checkBatchRetention`, after the first retained batch is observed?
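   For what it's worth, here is a rough, self-contained sketch of the idea. The `Batch` and `CleanedSegment` types below are stand-ins for illustration, not the real `RecordBatch`/`LogSegment`/`RecordFilter` API, and the retention decision is stubbed out; the point is only where the segment gets created:

   ```scala
   // Sketch only: stand-in types, not the actual LogCleaner/RecordFilter API.
   object LazyCleanedSegmentSketch {

     // Stand-ins for RecordBatch and the cleaned LogSegment.
     final case class Batch(baseOffset: Long, shouldRetain: Boolean)
     final class CleanedSegment(val baseOffset: Long)

     // Mirrors the mutable destSegment state from the diff above.
     private var destSegment: Option[CleanedSegment] = None

     // Analogue of checkBatchRetention: decide retention per batch, and create
     // the destination segment as soon as the first retained batch is seen,
     // instead of waiting for the whole filter pass to finish.
     def checkBatchRetention(batch: Batch): Boolean = {
       val retain = batch.shouldRetain // real retention logic would go here
       if (retain && destSegment.isEmpty)
         destSegment = Some(new CleanedSegment(batch.baseOffset))
       retain
     }

     def main(args: Array[String]): Unit = {
       val batches = Seq(
         Batch(0, shouldRetain = false),
         Batch(10, shouldRetain = true),
         Batch(20, shouldRetain = true))
       batches.foreach(checkBatchRetention)
       // Segment is keyed off the first retained batch's base offset: Some(10)
       println(destSegment.map(_.baseOffset))
     }
   }
   ```

   I think creating the segment (and hence its `txnIndex`) eagerly like this would also address the first point: aborted-transaction entries could be appended to the index in the order they are encountered, so the reversal would go away.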