[ 
https://issues.apache.org/jira/browse/KAFKA-17212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guang Zhao reassigned KAFKA-17212:
----------------------------------

    Assignee: Guang Zhao

> Segments containing a single message can be incorrectly marked as local only
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-17212
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17212
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.8.0, 3.7.1, 3.9.0
>            Reporter: Guillaume Mallet
>            Assignee: Guang Zhao
>            Priority: Trivial
>
> There is an edge case where a segment containing a single message can be 
> considered local only even though it has been offloaded, which skews the 
> deletion process towards deleting more data than intended.
>  
> *This is very unlikely to happen in a real scenario but can happen in tests 
> when segments are rolled manually.* 
> *It could possibly happen when segments are rolled based on time, but even 
> then the skew would be minimal.*
> h2. What happens
> In order to delete the right amount of data under the byte retention 
> policy, we first count all the bytes breaching {{retention.bytes}} in the 
> [buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1335]
>  function. To do this, the size of each remote segment is added to the size 
> of the segments present only on disk ({{onlyLocalLogSegmentsSize}}).
> Listing the segments only present on disk is done through the function 
> [onlyLocalLogSegmentsSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/scala/kafka/log/UnifiedLog.scala#L1618-L1619],
>  which adds up the size of each segment whose _baseOffset_ is greater than 
> or equal to {{highestOffsetInRemoteStorage}}.
> {{highestOffsetInRemoteStorage}} is the highest offset that has been 
> successfully sent to the remote store.
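> As a sketch of this accounting (hypothetical, simplified Java; the names 
> mirror the Kafka code but this is not the actual implementation):

```java
// Hypothetical sketch of the size accounting in the byte-retention check
// (simplified; illustrative names, not the actual Kafka API).
public class RetentionSizeSketch {

    // Total bytes counted against retention.bytes: every segment already in
    // remote storage plus the segments that exist only on local disk.
    static long totalRetainedBytes(long remoteSegmentsSizeBytes, long onlyLocalLogSegmentsSize) {
        return remoteSegmentsSizeBytes + onlyLocalLogSegmentsSize;
    }

    // Bytes in breach of the retention.bytes limit, i.e. how much gets deleted.
    static long remainingBreachedSize(long totalRetainedBytes, long retentionBytes) {
        return Math.max(0, totalRetainedBytes - retentionBytes);
    }

    public static void main(String[] args) {
        // If a 100-byte segment is counted both as remote and as local only,
        // remainingBreachedSize grows by 100 and up to 100 extra bytes are deleted.
        System.out.println(remainingBreachedSize(totalRetainedBytes(1124, 612), 1024)); // double-counted
        System.out.println(remainingBreachedSize(totalRetainedBytes(1124, 512), 1024)); // correct
    }
}
```

> Any double-counted segment therefore inflates the breached size directly, 
> byte for byte.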
> The _baseOffset_ of a segment is "a [lower bound ({*}inclusive{*}) of the 
> offset in the 
> segment|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java#L115]".
>  
> In the case of a segment with a single message, the baseOffset can be equal 
> to _highestOffsetInRemoteStorage_, which means that despite the offset 
> having been offloaded to the remote storage, we would count that segment as 
> local only.
> This has consequences when counting the bytes to delete: the size of this 
> segment is counted twice in 
> [buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1155],
>  once as a segment offloaded to the RemoteStorage and once as a local 
> segment when 
> [onlyLocalLogSegmentsSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1361-L1363]
>  is added. 
> The result is that {{remainingBreachedSize}} will be higher than expected, 
> which can lead to more bytes being deleted than initially intended, up to 
> the size of the double-counted segment.
> The issue is that we are using a greater-or-equal comparison rather than a 
> strictly-greater one: a segment present only locally will have a 
> {{baseOffset}} strictly greater than {{highestOffsetInRemoteStorage}}.
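> The off-by-one can be sketched as follows (hypothetical, simplified Java; 
> the names mirror the Kafka code but this is not the actual implementation):

```java
import java.util.List;

// Hypothetical sketch of the ">=" vs ">" off-by-one when classifying a
// segment as local only (illustrative, not the actual Kafka implementation).
public class OnlyLocalSizeSketch {
    // A segment holding offsets [baseOffset, lastOffset] of a given size in bytes.
    record Segment(long baseOffset, long lastOffset, long sizeBytes) {}

    // Buggy variant: ">=" counts a fully-offloaded single-message segment
    // (baseOffset == highestOffsetInRemoteStorage) as local only.
    static long onlyLocalSizeBuggy(List<Segment> segments, long highestOffsetInRemoteStorage) {
        return segments.stream()
                .filter(s -> s.baseOffset() >= highestOffsetInRemoteStorage)
                .mapToLong(Segment::sizeBytes)
                .sum();
    }

    // Fixed variant: a segment is local only iff its baseOffset is strictly
    // greater than the highest offset already in remote storage.
    static long onlyLocalSizeFixed(List<Segment> segments, long highestOffsetInRemoteStorage) {
        return segments.stream()
                .filter(s -> s.baseOffset() > highestOffsetInRemoteStorage)
                .mapToLong(Segment::sizeBytes)
                .sum();
    }

    public static void main(String[] args) {
        // Segment [10, 10] holds a single message and has been offloaded,
        // so highestOffsetInRemoteStorage == 10 == its baseOffset.
        List<Segment> segments = List.of(
                new Segment(0, 9, 1024),   // fully remote
                new Segment(10, 10, 100),  // single message, also remote
                new Segment(11, 20, 512)); // genuinely local only
        long highest = 10;
        System.out.println(onlyLocalSizeBuggy(segments, highest)); // 612: the 100-byte segment is double-counted
        System.out.println(onlyLocalSizeFixed(segments, highest)); // 512
    }
}
```

> With the strict comparison, the single-message segment is no longer counted 
> on both sides of the retention computation.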
> h2. Reproducing the issue
> The problem is highlighted in the two tests added in this 
> [commit|https://github.com/apache/kafka/commit/97af351db517d69a2b37c92861e463a6d0c5cb8f].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
