hachikuji commented on a change in pull request #9590:
URL: https://github.com/apache/kafka/pull/9590#discussion_r583148687



##########
File path: core/src/main/scala/kafka/log/LogCleaner.scala
##########
@@ -599,21 +606,30 @@ private[log] class Cleaner(val id: Int,
         }
         currentSegmentOpt = nextSegmentOpt
       }
-
-      cleaned.onBecomeInactiveSegment()
-      // flush new segment to disk before swap
-      cleaned.flush()
-
-      // update the modification date to retain the last modified date of the 
original files
-      val modified = segments.last.lastModified
-      cleaned.lastModified = modified
-
-      // swap in new segment
-      info(s"Swapping in cleaned segment $cleaned for segment(s) $segments in 
log $log")
-      log.replaceSegments(List(cleaned), segments)
+      
+      // Result of cleaning included at least one record.
+      if (cleanedSegment.isDefined) {
+        val cleaned = cleanedSegment.get
+        cleaned.onBecomeInactiveSegment()
+        // flush new segment to disk before swap
+        cleaned.flush()
+
+        // update the modification date to retain the last modified date of 
the original files
+        val modified = segments.last.lastModified
+        cleaned.lastModified = modified
+
+        // swap in new segment
+        info(s"Swapping in cleaned segment $cleaned for segment(s) $segments 
in log $log")
+        log.replaceSegments(List(cleaned), segments)

Review comment:
       There are really only two cases for `replaceSegments`: log compaction, 
and segment splitting. We also call it in `completeSwapOperations`, but that is 
for recovery if the broker crashes in the middle of `replaceSegments`. It 
should be safe in either case to adjust the log start offset. Probably 
something similar to the deletion case which just ensures that the log start 
offset did not fall off the log:
   ```scala
   maybeIncrementLogStartOffset(segments.firstEntry.getValue.baseOffset, 
SegmentDeletion)
   ```
   Ideally `LogCompaction` would be the reason, at least in the log compaction 
case. It's becoming cumbersome to pass through the reason in all cases it is 
used, but I'm not sure I see an alternative. Perhaps we can consider how to 
refactor this in the future. Maybe there should be some kind of context object 
which encapsulates the higher level operation that is being done (such as log 
compaction or retention enforcement).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to