[ https://issues.apache.org/jira/browse/KAFKA-17212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guang Zhao reassigned KAFKA-17212:
----------------------------------

    Assignee: Guang Zhao

> Segments containing a single message can be incorrectly marked as local only
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-17212
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17212
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.8.0, 3.7.1, 3.9.0
>            Reporter: Guillaume Mallet
>            Assignee: Guang Zhao
>            Priority: Trivial
>
> There is an edge case where a segment containing a single message is
> incorrectly considered local only, which skews the deletion process towards
> deleting more data than intended.
>
> *This is very unlikely to happen in a real scenario but can happen in tests
> when segments are rolled manually. It could possibly happen when segments
> are rolled based on time, but even then the skew would be minimal.*
>
> h2. What happens
>
> To delete the right amount of data against the byte retention policy, the
> [buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1335]
> function first counts all the bytes that breach {{retention.bytes}}. To do
> this, the size of each segment is added to the size of the segments present
> only on disk, {{onlyLocalLogSegmentsSize}}.
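The retention arithmetic described above can be sketched as follows. This is an illustrative simplification, not Kafka's actual RemoteLogManager code; the method and parameter names are stand-ins echoing the identifiers in the ticket:

```java
// Sketch of the retention-size arithmetic described in the ticket.
// Names are illustrative stand-ins, not Kafka's actual API.
public class RetentionSizeSketch {
    static long remainingBreachedSize(long remoteLogSizeBytes,
                                      long onlyLocalLogSegmentsSize,
                                      long retentionBytes) {
        // Total log size = bytes already offloaded to remote storage
        // plus bytes present only on local disk.
        long totalSize = remoteLogSizeBytes + onlyLocalLogSegmentsSize;
        // Bytes breaching retention.bytes, i.e. eligible for deletion.
        return totalSize - retentionBytes;
    }

    public static void main(String[] args) {
        // 10 GiB remote + 2 GiB local-only against retention.bytes = 8 GiB
        // leaves 4 GiB breached.
        System.out.println(remainingBreachedSize(10L << 30, 2L << 30, 8L << 30));
        // If a 1 GiB segment is double counted as both remote and local-only,
        // an extra 1 GiB is wrongly reported as breached.
        System.out.println(remainingBreachedSize(10L << 30, 3L << 30, 8L << 30));
    }
}
```

Any segment that ends up counted in both terms inflates the result, which is exactly the double counting at issue here.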
> Listing the segments only present on disk is done by the function
> [onlyLocalLogSegmentsSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/scala/kafka/log/UnifiedLog.scala#L1618-L1619],
> which adds the size of each segment whose _baseOffset_ is greater than or
> equal to {{highestOffsetInRemoteStorage}}, the highest offset that has been
> successfully sent to the remote store. The _baseOffset_ of a segment is "a
> [lower bound ({*}inclusive{*}) of the offset in the
> segment"|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java#L115].
>
> For a segment with a single message, the _baseOffset_ can therefore be equal
> to {{highestOffsetInRemoteStorage}}, which means that despite the offset
> having been offloaded to remote storage, that segment is counted as local
> only. As a consequence, when counting the bytes to delete, the size of this
> segment is counted twice in
> [buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1155]:
> once as a segment offloaded to remote storage and once as a local segment
> when
> [onlyLocalLogSegmentsSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1361-L1363]
> is added. The result is that {{remainingBreachedSize}} is higher than
> expected, which can lead to more bytes being deleted than initially
> intended, up to the size of the double-counted segment.
>
> The issue is that a greater-or-equal comparison is used where a strictly
> greater one is needed: a segment present only locally has a {{baseOffset}}
> strictly greater than {{highestOffsetInRemoteStorage}}.
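The edge case and the proposed strictly-greater comparison can be sketched as follows. This is a minimal illustration with a hypothetical Segment type and stand-in method names, not Kafka's actual UnifiedLog code:

```java
import java.util.List;

public class LocalOnlySizeSketch {
    // Hypothetical stand-in for a log segment: base offset plus size in bytes.
    record Segment(long baseOffset, long sizeInBytes) {}

    // Mirrors the buggy comparison: ">=" also counts a segment whose
    // baseOffset EQUALS highestOffsetInRemoteStorage, even though that
    // offset has already been offloaded.
    static long onlyLocalSizeBuggy(List<Segment> segments,
                                   long highestOffsetInRemoteStorage) {
        return segments.stream()
                .filter(s -> s.baseOffset() >= highestOffsetInRemoteStorage)
                .mapToLong(Segment::sizeInBytes)
                .sum();
    }

    // The fix suggested by the ticket: a segment present only locally has a
    // baseOffset strictly greater than highestOffsetInRemoteStorage.
    static long onlyLocalSizeFixed(List<Segment> segments,
                                   long highestOffsetInRemoteStorage) {
        return segments.stream()
                .filter(s -> s.baseOffset() > highestOffsetInRemoteStorage)
                .mapToLong(Segment::sizeInBytes)
                .sum();
    }

    public static void main(String[] args) {
        // A single-message segment at offset 100 that has already been
        // offloaded, so highestOffsetInRemoteStorage == baseOffset == 100.
        List<Segment> segments = List.of(new Segment(100L, 1024L));
        System.out.println(onlyLocalSizeBuggy(segments, 100L)); // counted as local only
        System.out.println(onlyLocalSizeFixed(segments, 100L)); // correctly excluded
    }
}
```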
> h2. Reproducing the issue
>
> The problem is highlighted in the two tests added in this
> [commit|https://github.com/apache/kafka/commit/97af351db517d69a2b37c92861e463a6d0c5cb8f].


--
This message was sent by Atlassian Jira
(v8.20.10#820010)