dhruvilshah3 commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464100823
##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:

@ijuma We log one message per deleted segment. This could cause a temporary increase in log volume when DeleteRecords is used or when retention is lowered, for example. Overall, we have a few options with different tradeoffs:

1. Log a common reason per batch being deleted, including the base offsets of the segments in the batch, e.g.
   ```
   Deleting segments due to retention time 999ms breach. BaseOffsets: (0,5,...).
   ```
2. Log a common reason per batch being deleted, including the base offsets and metadata of the segments, e.g.
   ```
   Deleting segments due to retention time 999ms breach: LogSegment(baseOffset=0, size=360, lastModifiedTime=1596387738000, largestRecordTimestamp=Some(1596387737414)),LogSegment(baseOffset=5, size=360, lastModifiedTime=1596387738000, largestRecordTimestamp=Some(1596387737414)),...
   ```
3. Log one message per segment being deleted (the current behavior), e.g.
   ```
   Segment with base offset 0 will be deleted due to retention time 999ms breach based on the largest record timestamp from the segment, which is ...
   Segment with base offset 5 will be deleted due to retention time 999ms breach based on the largest record timestamp from the segment, which is ...
   ...
   ```

Doing (2) may be a reasonable tradeoff: it eliminates some of the redundancy at the cost of making it harder to glean per-segment metadata. Let me know what you think.
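For concreteness, here is a rough, self-contained sketch of what option (2) could look like. Everything in it is a simplified stand-in for illustration, not the PR's actual code: this `LogSegment` case class mirrors only the fields shown in the example output, and the batch-level `logReason` hook and `RetentionMsBreach` class are hypothetical shapes (the diff above uses a per-segment `reasonString(log, segment)` instead).

```scala
// Sketch of option (2): one log line per deletion batch.
// `LogSegment`, `SegmentDeletionReason.logReason`, and `RetentionMsBreach`
// are simplified, assumed shapes — not the types from kafka.log.

final case class LogSegment(baseOffset: Long,
                            size: Int,
                            lastModifiedTime: Long,
                            largestRecordTimestamp: Option[Long]) {
  // Render with field names so the output matches the example above.
  override def toString: String =
    s"LogSegment(baseOffset=$baseOffset, size=$size, " +
      s"lastModifiedTime=$lastModifiedTime, largestRecordTimestamp=$largestRecordTimestamp)"
}

trait SegmentDeletionReason {
  // One message for the whole batch, rather than one message per segment.
  def logReason(toDelete: Seq[LogSegment]): String
}

final case class RetentionMsBreach(retentionMs: Long) extends SegmentDeletionReason {
  override def logReason(toDelete: Seq[LogSegment]): String =
    s"Deleting segments due to retention time ${retentionMs}ms breach: " +
      toDelete.mkString(",")
}

object BatchDeletionLogDemo extends App {
  val segments = Seq(
    LogSegment(0, 360, 1596387738000L, Some(1596387737414L)),
    LogSegment(5, 360, 1596387738000L, Some(1596387737414L))
  )
  // Prints a single line containing the metadata of both segments.
  println(RetentionMsBreach(999).logReason(segments))
}
```

With a batch-level hook along these lines, the per-segment `info(...)` call inside `removeAndDeleteSegments` would presumably be replaced by a single `info(reason.logReason(toDelete))` before the loop.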