Guillaume Mallet created KAFKA-17212:
----------------------------------------
Summary: Segments containing a single message can be incorrectly marked as local only
Key: KAFKA-17212
URL: https://issues.apache.org/jira/browse/KAFKA-17212
Project: Kafka
Issue Type: Bug
Components: Tiered-Storage
Affects Versions: 3.7.1, 3.8.0, 3.9.0
Reporter: Guillaume Mallet

There is an edge case in which a segment containing a single message is considered local only, which skews the deletion process towards deleting more data than intended.

*This is very unlikely to happen in a real scenario but can happen in tests when segments are rolled manually.*

*It could possibly happen when segments are rolled based on time, but even then the skew would be minimal.*

h2. What happens

In order to delete the right amount of data against the byte retention policy, we first count, in the [buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1335] function, all the bytes that are breaching {{retention.bytes}}. To do this, the size of each segment is added to {{onlyLocalLogSegmentsSize}}, the size of the segments present only on disk.

The segments present only on disk are accounted for by the function [onlyLocalLogSegmentSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/scala/kafka/log/UnifiedLog.scala#L1618-L1619], which adds up the size of every segment whose _baseOffset_ is greater than or equal to {{highestOffsetInRemoteStorage}}. {{highestOffsetInRemoteStorage}} is the highest offset that has been successfully sent to the remote store. The _baseOffset_ of a segment is “a [lower bound (*inclusive*) of the offset in the segment”|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java#L115].
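
As a rough illustration of the computation described above, here is a minimal, self-contained sketch. It is not the Kafka code: the {{Segment}} class, the method names and the example sizes are all hypothetical and only mirror the description in this report. It sums the sizes of the segments selected by the inclusive comparison ({{baseOffset >= highestOffsetInRemoteStorage}}) and, for contrast, by a strict comparison, on a log where a single-message segment at offset 10 has already been uploaded.

```java
import java.util.List;

public class LocalOnlySizeSketch {

    // Hypothetical stand-in for a log segment; only the fields this sketch needs.
    public static final class Segment {
        final long baseOffset;   // lower bound (inclusive) of the offsets in the segment
        final long sizeInBytes;

        Segment(long baseOffset, long sizeInBytes) {
            this.baseOffset = baseOffset;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Comparison as described in the report: baseOffset >= highestOffsetInRemoteStorage.
    public static long localOnlySizeInclusive(List<Segment> segments, long highestOffsetInRemoteStorage) {
        return segments.stream()
                .filter(s -> s.baseOffset >= highestOffsetInRemoteStorage)
                .mapToLong(s -> s.sizeInBytes)
                .sum();
    }

    // Strict comparison: baseOffset > highestOffsetInRemoteStorage.
    public static long localOnlySizeStrict(List<Segment> segments, long highestOffsetInRemoteStorage) {
        return segments.stream()
                .filter(s -> s.baseOffset > highestOffsetInRemoteStorage)
                .mapToLong(s -> s.sizeInBytes)
                .sum();
    }

    // Made-up local log: offsets 0..9 (uploaded), a single message at offset 10
    // (uploaded), and a segment starting at offset 11 that exists only on disk.
    public static List<Segment> exampleSegments() {
        return List.of(
                new Segment(0, 1000),
                new Segment(10, 100),
                new Segment(11, 500));
    }

    public static void main(String[] args) {
        long highestOffsetInRemoteStorage = 10; // offset 10 has been uploaded
        List<Segment> segments = exampleSegments();
        // The inclusive comparison also selects the uploaded single-message segment.
        System.out.println(localOnlySizeInclusive(segments, highestOffsetInRemoteStorage)); // prints 600
        System.out.println(localOnlySizeStrict(segments, highestOffsetInRemoteStorage));    // prints 500
    }
}
```

In this made-up example the two results differ by 100 bytes, which is exactly the size of the single-message segment whose {{baseOffset}} equals {{highestOffsetInRemoteStorage}}.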

In the case of a segment with a single message, the _baseOffset_ can be equal to {{highestOffsetInRemoteStorage}}, which means that even though that offset has already been offloaded to remote storage, we count the segment as local only. As a consequence, the size of this segment is counted twice in [buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1155]: once as a segment offloaded to remote storage and once as a local segment when [onlyLocalSegmentSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1361-L1363] is added.

The result is that {{remainingBreachedSize}} will be higher than expected, which can lead to more bytes being deleted than intended, up to the size of the segment that is double counted.

The issue is due to the fact that we are using a greater-than-or-equal comparison rather than a strictly-greater-than one. A segment present only locally will have a {{baseOffset}} strictly greater than {{highestOffsetInRemoteStorage}}.

h2. Reproducing the issue

The problem is highlighted in the 2 tests added in this [commit|https://github.com/apache/kafka/commit/97af351db517d69a2b37c92861e463a6d0c5cb8f].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)