junrao commented on a change in pull request #8936: URL: https://github.com/apache/kafka/pull/8936#discussion_r451024109
########## File path: core/src/main/scala/kafka/log/LogSegment.scala ########## @@ -87,6 +87,12 @@ class LogSegment private[log] (val log: FileRecords, // we will recover the segments above the recovery point in recoverLog() // in any case so sanity checking them here is redundant. txnIndex.sanityCheck() + // Failing to sanity check the timeIndex can result in a scenario where log segments are + // prematurely deleted (before breaching retention periods) if the index file was not resized + // to disk successfully. + // KAFKA-10207 + timeIndex.sanityCheck() + offsetIndex.sanityCheck() Review comment: @Johnny-Malizia : Thanks for reporting this. A couple of comments on this. (1) It seems that there are cases where the index file is not trimmed properly. Every rolled segment calls LogSegment.onBecomeInactiveSegment(), which does trimming. So, we probably should just fix the trim logic there. (2) The intention of KIP-263 is to avoid unnecessary loading (and the sanity check) of indexes never used. This is mostly achieved through lazy indexes (actual index file only opened lazily on first use). Since the biggest cost of loading an index file is probably the I/O part, and the sanity check is computational only and relatively cheap, we can probably just do the index sanity check on index opening. This way, in the common case, we only pay the index sanity checking cost on active segments. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org