[ https://issues.apache.org/jira/browse/KAFKA-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975464#comment-16975464 ]

ASF GitHub Bot commented on KAFKA-9196:
---------------------------------------

hachikuji commented on pull request #7695: KAFKA-9196; Update high watermark 
metadata after segment roll
URL: https://github.com/apache/kafka/pull/7695
 
 
   When we roll a new segment, the log offset metadata tied to the high 
watermark may need to be updated. This is needed when the high watermark is 
equal to the log end offset at the time of the roll. Otherwise, we risk 
exposing uncommitted data early.
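   For illustration, a minimal sketch of the idea under simplified names (`SimplifiedLog`, `onRoll`, and the `LogOffsetMetadata` fields below are stand-ins for this example, not the actual Kafka internals):
   
   ```scala
   // A simplified model of the cached offset metadata; field names are assumed
   // for this sketch and are not the exact Kafka definitions.
   case class LogOffsetMetadata(messageOffset: Long,
                                segmentBaseOffset: Long,
                                relativePositionInSegment: Int)
   
   class SimplifiedLog {
     @volatile var logEndOffsetMetadata: LogOffsetMetadata = LogOffsetMetadata(0L, 0L, 0)
     @volatile var highWatermarkMetadata: LogOffsetMetadata = LogOffsetMetadata(0L, 0L, 0)
   
     // Hypothetical hook invoked when a new segment is rolled at `newSegmentBaseOffset`.
     def onRoll(newSegmentBaseOffset: Long): Unit = {
       // After the roll, the log end offset lives at position 0 of the new segment.
       val newEndMetadata = LogOffsetMetadata(newSegmentBaseOffset, newSegmentBaseOffset, 0)
   
       // If the high watermark had caught up to the log end offset, its cached
       // segment/position is now stale; refresh it so a fetch at the high watermark
       // cannot resolve to a bogus byte position inside the new segment.
       if (highWatermarkMetadata.messageOffset == newEndMetadata.messageOffset)
         highWatermarkMetadata = newEndMetadata
   
       logEndOffsetMetadata = newEndMetadata
     }
   }
   ```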
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

> Records exposed before advancement of high watermark after segment roll
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-9196
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9196
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Blocker
>             Fix For: 2.4.0
>
>
> We cache the segment position of the high watermark and last stable offset 
> inside `Log`. There is currently no logic to update the cached position when 
> the segment rolls.
> Suppose we have a log with one segment (0.log). We write 5 records and update 
> the high watermark to match the log end offset. The cached segment position 
> will be something like LogOffsetMetadata(offset=5, segment=0, position=100). 
> Now suppose we roll to segment 5.log and write some new data. If a consumer 
> fetches from offset 5, then the current fetch logic will find segment 5.log 
> and incorrectly use position 100 in this segment. The result is that data 
> from the new segment gets exposed prematurely.
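
A self-contained sketch of the scenario above, under the same simplifying assumptions (the names and the read bounding shown here are illustrative, not the real fetch path):

```scala
// The same simplified metadata shape; the numbers mirror the description
// (5 records in 0.log, cached position 100).
case class LogOffsetMetadata(messageOffset: Long,
                             segmentBaseOffset: Long,
                             relativePositionInSegment: Int)

object StaleHighWatermarkScenario extends App {
  // Cached before the roll: high watermark at offset 5, recorded as living in
  // segment 0 (0.log) at byte position 100.
  var highWatermark = LogOffsetMetadata(5L, 0L, 100)

  // After the roll, offset 5 is the base offset of the new segment (5.log) and
  // uncommitted records are appended to it.
  val activeSegmentBaseOffset = 5L

  // A fetch at offset 5 finds 5.log, but bounding the read with the cached
  // position (100) lets up to 100 bytes of the new segment's uncommitted data
  // escape to consumers.
  println(s"stale bound in 5.log: byte ${highWatermark.relativePositionInSegment}")

  // With the fix, the roll refreshes the metadata when the high watermark equals
  // the log end offset, pointing it at position 0 of the new segment.
  highWatermark = LogOffsetMetadata(5L, activeSegmentBaseOffset, 0)
  println(s"after refresh: $highWatermark")
}
```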


