Matthias J. Sax created KAFKA-13350:
---------------------------------------

             Summary: Handle task corrupted exception on a per state store basis
                 Key: KAFKA-13350
                 URL: https://issues.apache.org/jira/browse/KAFKA-13350
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Matthias J. Sax


When we hit an `OffsetOutOfRangeException` during restore, we close the task as 
dirty and retry the restore process from scratch. In this case, we wipe out 
all of the task's state stores.

If a task has multiple state stores, we also wipe out state that is actually 
clean and thus need to redo work for no reason. Instead of wiping out all state 
stores, we should only wipe out the single state store that corresponds to the 
changelog topic partition that hit the `OffsetOutOfRangeException`, and 
preserve the restore progress for all other state stores.
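
As a rough illustration of the idea (not the actual Kafka Streams code), the 
selection could look like the sketch below. All class and method names here are 
hypothetical, and changelog partitions are modelled as plain strings purely to 
keep the example self-contained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names are Kafka Streams APIs.
class PerStoreWipeSketch {

    // Maps the changelog partitions that hit OffsetOutOfRangeException to the
    // state stores they back. Only these stores need to be wiped.
    static Set<String> storesToWipe(Map<String, String> storeByChangelogPartition,
                                    Set<String> corruptedPartitions) {
        Set<String> result = new HashSet<>();
        for (String partition : corruptedPartitions) {
            String store = storeByChangelogPartition.get(partition);
            if (store != null) {
                result.add(store);
            }
        }
        return result;
    }

    // Drops the restore progress of the wiped stores only; the restored
    // offsets of all other stores are kept as they are.
    static Map<String, Long> offsetsAfterWipe(Map<String, Long> restoredOffsetByStore,
                                              Set<String> wipedStores) {
        Map<String, Long> result = new HashMap<>(restoredOffsetByStore);
        result.keySet().removeAll(wipedStores);
        return result;
    }
}
```

With two stores and one corrupted changelog partition, only the store backed by 
that partition loses its restored offset; the other store resumes from where it 
left off.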

We need to consider persistent and in-memory stores: for persistent stores, it 
would be fine to close the unaffected stores cleanly and also write the 
checkpoint file. For in-memory stores, however, we should not close the store, 
to avoid dropping the in-memory data.
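
The distinction above could be captured as a small decision function. This is 
only a sketch under stated assumptions: the `Action` enum and `planFor` are 
made-up names, and the `persistent` flag is assumed to be available for each 
store (for example from `StateStore#persistent()`).

```java
// Hypothetical sketch: the enum and method below are illustrative only.
class StoreRecoveryPlanSketch {

    enum Action {
        WIPE_AND_RESTORE,           // store whose changelog partition was corrupted
        CLOSE_CLEAN_AND_CHECKPOINT, // unaffected persistent store
        KEEP_OPEN                   // unaffected in-memory store
    }

    // Decides how to treat a single store when its task hits a corruption.
    static Action planFor(boolean corrupted, boolean persistent) {
        if (corrupted) {
            // The affected store is wiped regardless of its type.
            return Action.WIPE_AND_RESTORE;
        }
        // Unaffected stores keep their state: persistent ones can be closed
        // cleanly with a checkpoint, in-memory ones must stay open.
        return persistent ? Action.CLOSE_CLEAN_AND_CHECKPOINT : Action.KEEP_OPEN;
    }
}
```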



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
