Guillaume Mallet created KAFKA-17212:
----------------------------------------
             Summary: Segments containing a single message can be incorrectly 
marked as local only
                 Key: KAFKA-17212
                 URL: https://issues.apache.org/jira/browse/KAFKA-17212
             Project: Kafka
          Issue Type: Bug
          Components: Tiered-Storage
    Affects Versions: 3.7.1, 3.8.0, 3.9.0
            Reporter: Guillaume Mallet


There is an edge case where a segment containing a single message is 
considered local only, which skews the deletion process towards deleting more 
data.

 

*This is very unlikely to happen in a real scenario but can happen in tests 
when segments are rolled manually.* 
*It could possibly happen when segments are rolled based on time, but even then 
the skew would be minimal.*
h2. What happens

In order to delete the right amount of data against the byte retention policy, 
we first count, in the 
[buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1335]
 function, all the bytes that are breaching {{retention.bytes}}. To do this, the 
size of each segment in remote storage is added to the size of the segments 
present only on disk, {{onlyLocalLogSegmentsSize}}.
The size of the segments present only on disk is computed by the function 
[onlyLocalLogSegmentSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/scala/kafka/log/UnifiedLog.scala#L1618-L1619]
, which adds up the size of each segment that has a _baseOffset_ greater than or 
equal to {{highestOffsetInRemoteStorage}}.

{{highestOffsetInRemoteStorage}} is the highest offset that has been 
successfully sent to the remote store.
The _baseOffset_ of a segment is “a [lower bound ({*}inclusive{*}) of the 
offset in the 
segment”|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java#L115].
 
In the case of a segment with a single message, the _baseOffset_ is also the 
last offset of the segment, so it can be equal to 
{{highestOffsetInRemoteStorage}}. This means that despite the offset being 
offloaded to the RemoteStorage, we would count that segment as local only.
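
To make the edge case concrete, here is a minimal sketch of the comparison described above. It is not the Kafka code: the {{Segment}} class, the {{onlyLocalSize}} method and the offsets and sizes are hypothetical and only model a {{baseOffset >= highestOffsetInRemoteStorage}} check.

```java
import java.util.List;

final class LocalOnlySketch {
    // Hypothetical stand-in for a log segment: only the two fields the check needs.
    static final class Segment {
        final long baseOffset;
        final long sizeInBytes;

        Segment(long baseOffset, long sizeInBytes) {
            this.baseOffset = baseOffset;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Models the comparison described in this issue (greater than or equal).
    static long onlyLocalSize(List<Segment> segments, long highestOffsetInRemoteStorage) {
        long total = 0;
        for (Segment s : segments) {
            if (s.baseOffset >= highestOffsetInRemoteStorage) {
                total += s.sizeInBytes;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // A holds offsets 0..9, B holds only offset 10, C is the active segment starting at 11.
        List<Segment> segments = List.of(
                new Segment(0, 1000), new Segment(10, 100), new Segment(11, 500));
        // A and B are assumed uploaded, so the highest offset in remote storage is 10.
        System.out.println(onlyLocalSize(segments, 10)); // prints 600: B is counted although it is remote
    }
}
```

In this layout only C (500 bytes) is truly local only, but B is included because its base offset equals the highest offset in remote storage.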

This has consequences when counting the bytes to delete, as we will count the 
size of this segment twice in 
[buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1155],
 once as a segment offloaded in the RemoteStorage and once as a local segment 
when 
[onlyLocalSegmentSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1361-L1363]
 is added. 

The result is that {{remainingBreachedSize}} will be higher than expected, which 
can lead to more bytes being deleted than we would expect, up to the size of the 
segment which is double counted.
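
The effect on the breached size can be illustrated with a little arithmetic. The sketch below is a simplified model, not the actual {{buildRetentionSizeData}} logic: it assumes the breached size is the total counted size minus {{retention.bytes}}, and every number in it is made up for the example.

```java
final class DoubleCountSketch {
    // Simplified model: bytes above the retention limit, or 0 when within the limit.
    static long remainingBreachedSize(long remoteSize, long onlyLocalSize, long retentionBytes) {
        long totalSize = remoteSize + onlyLocalSize;
        return Math.max(0L, totalSize - retentionBytes);
    }

    public static void main(String[] args) {
        long retentionBytes = 1500; // hypothetical retention.bytes
        long remoteSize = 1100;     // two uploaded segments: 1000 bytes and 100 bytes (single message)
        long trulyLocalSize = 500;  // the only segment that exists on disk alone
        long doubleCounted = 100;   // the single-message segment, wrongly added to the local size

        long expected = remainingBreachedSize(remoteSize, trulyLocalSize, retentionBytes);
        long skewed = remainingBreachedSize(remoteSize, trulyLocalSize + doubleCounted, retentionBytes);

        System.out.println(expected);          // prints 100
        System.out.println(skewed);            // prints 200
        System.out.println(skewed - expected); // prints 100, the size of the double-counted segment
    }
}
```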

The issue is due to the fact that we are using a greater than or equal 
comparison rather than a strictly greater than comparison. 
A segment present only locally will have a {{baseOffset}} strictly greater than 
{{highestOffsetInRemoteStorage}}.
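
As a rough illustration of the strict comparison, the sketch below filters segments with {{baseOffset > highestOffsetInRemoteStorage}}. The {{Segment}} class, the {{onlyLocalSize}} method and the data are hypothetical; this is a sketch of the idea, not the patch itself.

```java
import java.util.List;

final class StrictComparisonSketch {
    // Hypothetical stand-in for a log segment: only the two fields the check needs.
    static final class Segment {
        final long baseOffset;
        final long sizeInBytes;

        Segment(long baseOffset, long sizeInBytes) {
            this.baseOffset = baseOffset;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Assumed corrected check: a local-only segment starts strictly after the highest remote offset.
    static long onlyLocalSize(List<Segment> segments, long highestOffsetInRemoteStorage) {
        return segments.stream()
                .filter(s -> s.baseOffset > highestOffsetInRemoteStorage)
                .mapToLong(s -> s.sizeInBytes)
                .sum();
    }

    public static void main(String[] args) {
        // Offsets 0..9 (uploaded), a single message at offset 10 (uploaded), active segment from 11.
        List<Segment> segments = List.of(
                new Segment(0, 1000), new Segment(10, 100), new Segment(11, 500));
        System.out.println(onlyLocalSize(segments, 10)); // prints 500: only the active segment is counted
    }
}
```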
h2. Reproducing the issue

The problem is highlighted in the 2 tests added in this [commit 
|https://github.com/apache/kafka/commit/97af351db517d69a2b37c92861e463a6d0c5cb8f]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
