[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781703#comment-17781703
 ] 

Divij Vaidya commented on KAFKA-15388:
--------------------------------------

[~goyarpit] That is a good observation, but I don't think it impacts the 
archive functionality. In tiered storage (TS), we assume that the end offset of 
a segment is nextSegmentBaseOffset - 1. This end offset might not exist in the 
segment because compaction removed it, but that is fine from an archive 
perspective. As long as the entire code base works with the assumption that the 
endOffset stored in RLMM (RemoteLogMetadataManager) may not actually exist in 
the segment, and is just a pointer to where the segment would have ended prior 
to compaction, we should be good. As Christo mentioned in the description, the 
only place in the code where this assumption is violated is the read path. 
Hence, we can treat the line you pointed to as correct and as the one that sets 
the contract, and change the read path to honor that contract.
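To make the contract concrete, here is a minimal, hypothetical model (not Kafka code): SegmentMetadata, describe and offsetExists are illustrative names only. It records endOffset as nextSegmentBaseOffset - 1 for each segment, then shows that for a compacted segment this endOffset need not correspond to a record that still exists. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentEndOffsetContract {

    // Hypothetical stand-in for the metadata recorded in RLMM for an uploaded segment.
    record SegmentMetadata(long baseOffset, long endOffset) { }

    // Build metadata for consecutive segments from their base offsets.
    // nextOffsetAfterLast is the base offset the segment after the last one would
    // have (exclusive upper bound). Compaction may have removed records, but the
    // contract still sets endOffset = nextSegmentBaseOffset - 1.
    static List<SegmentMetadata> describe(long[] baseOffsets, long nextOffsetAfterLast) {
        List<SegmentMetadata> out = new ArrayList<>();
        for (int i = 0; i < baseOffsets.length; i++) {
            long next = (i + 1 < baseOffsets.length) ? baseOffsets[i + 1] : nextOffsetAfterLast;
            out.add(new SegmentMetadata(baseOffsets[i], next - 1));
        }
        return out;
    }

    // True only if a record with this offset physically survives in the segment.
    // retainedOffsets must be sorted ascending.
    static boolean offsetExists(long[] retainedOffsets, long offset) {
        return Arrays.binarySearch(retainedOffsets, offset) >= 0;
    }

    public static void main(String[] args) {
        long[] retainedInFirst = {0, 3, 5}; // offsets left in [0, 10) after compaction
        List<SegmentMetadata> md = describe(new long[] {0, 10}, 20);
        SegmentMetadata first = md.get(0);
        System.out.println(first);                                            // endOffset is 9
        System.out.println(offsetExists(retainedInFirst, first.endOffset())); // false
    }
}
```

The point of the sketch is that the metadata range is a claim about where the segment boundary lies, not a promise that every offset in it is readable; any code that needs an actual record has to tolerate the gap.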

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15388
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15388
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Satish Duggana
>            Assignee: Arpit Goyal
>            Priority: Blocker
>             Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offsets from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offsets after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}
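The read-path fix proposed in (2) above, continuing into subsequent segments when the covering segment has no record at or after the requested offset, can be sketched as follows. This is a hypothetical, in-memory model rather than the actual RemoteLogManager code: RemoteSegment, firstAtOrAfter and resolveFetchOffset are illustrative names, segments are assumed sorted by baseOffset, and each segment's retained offsets are assumed sorted ascending. Requires Java 16+ for records.

```java
import java.util.List;
import java.util.OptionalLong;

public class RemoteReadSketch {

    // Hypothetical remote segment: metadata range (endOffset = nextBaseOffset - 1)
    // plus the offsets that actually survived compaction, sorted ascending.
    record RemoteSegment(long baseOffset, long endOffset, long[] offsets) {

        // First retained offset >= target within this segment, if any.
        OptionalLong firstAtOrAfter(long target) {
            for (long o : offsets) {
                if (o >= target) return OptionalLong.of(o);
            }
            return OptionalLong.empty();
        }
    }

    // Resolve the offset a fetch should really start from. Segments that end before
    // the target are skipped; if the covering segment has nothing at or after the
    // target (compaction removed its tail), keep looking in subsequent segments.
    static OptionalLong resolveFetchOffset(List<RemoteSegment> segments, long target) {
        for (RemoteSegment s : segments) {
            if (s.endOffset() < target) continue; // segment ends before the target
            OptionalLong hit = s.firstAtOrAfter(target);
            if (hit.isPresent()) return hit;
        }
        return OptionalLong.empty(); // nothing at or after target in remote storage
    }

    public static void main(String[] args) {
        List<RemoteSegment> segs = List.of(
            new RemoteSegment(0, 9, new long[] {0, 3, 5}),
            new RemoteSegment(10, 19, new long[] {12, 19}));
        System.out.println(resolveFetchOffset(segs, 4)); // resolves to 5, same segment
        System.out.println(resolveFetchOffset(segs, 7)); // resolves to 12, next segment
    }
}
```

Querying offset 7 is exactly the problematic case from (2): segment [0, 9] covers it by metadata, but its last retained record is 5, so the search has to move on to the next segment and return 12.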



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
