A. Sophie Blee-Goldman created KAFKA-10664:
----------------------------------------------

             Summary: Streams fails to overwrite corrupted offsets leading to 
infinite OffsetOutOfRangeException loop
                 Key: KAFKA-10664
                 URL: https://issues.apache.org/jira/browse/KAFKA-10664
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.7.0
            Reporter: A. Sophie Blee-Goldman
            Assignee: A. Sophie Blee-Goldman
             Fix For: 2.7.0


In KAFKA-10391 we fixed an issue where Streams could get stuck in an infinite 
loop of OffsetOutOfRangeException/TaskCorruptedException due to 
re-initializing the corrupted offsets from the checkpoint after each revival. 
The fix we applied was to remove the corrupted offsets from the state manager 
and then force it to write a new checkpoint file without those offsets during 
revival.

Unfortunately we missed that there's an optimization in OffsetCheckpoint#write 
that just returns without writing anything when there are no offsets. So if 
_all_ of a task's offsets are corrupted, nothing is left to write once they 
are removed, and the write is skipped, leaving the corrupted checkpoint in 
place.
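
The failure mode can be sketched roughly as follows. This is only an 
illustration: NaiveCheckpoint, its String keys and its one-entry-per-line file 
format are hypothetical stand-ins, not the actual OffsetCheckpoint code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a checkpoint file; NOT the real OffsetCheckpoint.
class NaiveCheckpoint {
    private final Path file;

    NaiveCheckpoint(Path file) {
        this.file = file;
    }

    // Mirrors the optimization described above: with nothing to write,
    // return early and leave whatever is already on disk untouched.
    void write(Map<String, Long> offsets) throws IOException {
        if (offsets.isEmpty()) {
            return; // a stale (possibly corrupted) file survives this call
        }
        List<String> lines = offsets.entrySet().stream()
            .map(e -> e.getKey() + " " + e.getValue())
            .collect(Collectors.toList());
        Files.write(file, lines); // creates or truncates the file by default
    }
}
```

Writing a map with one entry and then an empty map leaves the first entry on 
disk, which is how the corrupted offsets would get re-read on the next 
revival.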

Probably we should just fix the optimization in OffsetCheckpoint so that it 
deletes the current checkpoint when there are no offsets to write.
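
A minimal sketch of that idea, under the same caveats: DeletingCheckpoint, its 
String keys and its file format are hypothetical, and the sketch leaves out 
any atomic-write or fsync handling a real implementation would need.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of the proposal; NOT the actual OffsetCheckpoint code.
class DeletingCheckpoint {
    private final Path file;

    DeletingCheckpoint(Path file) {
        this.file = file;
    }

    void write(Map<String, Long> offsets) throws IOException {
        if (offsets.isEmpty()) {
            // No offsets to write: delete any existing checkpoint so that
            // stale offsets cannot be re-read from it later.
            Files.deleteIfExists(file);
            return;
        }
        List<String> lines = offsets.entrySet().stream()
            .map(e -> e.getKey() + " " + e.getValue())
            .collect(Collectors.toList());
        Files.write(file, lines); // creates or truncates the file by default
    }
}
```

With this variant, a write with an empty map after a non-empty one removes the 
file, so nothing stale is left to initialize from.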



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
