[ https://issues.apache.org/jira/browse/KAFKA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013159#comment-16013159 ]
Guozhang Wang commented on KAFKA-5256:
--------------------------------------

Thanks for the explanation. I agree that ideally we should remove the local files before replaying the changelog topic from scratch, since the state of the local files is "unknown".

Regarding the general issue of an application being down for longer than the tombstone retention period: that is an interesting question. Generally speaking, I think log compaction should not go beyond the smallest of the corresponding checkpoints (note that different instances may fetch from the changelog at different times due to task migration). I think this issue itself is worth further discussion on how to resolve it.

> Non-checkpointed state stores should be deleted before restore
> --------------------------------------------------------------
>
>                 Key: KAFKA-5256
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5256
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1
>            Reporter: Tommy Becker
>
> Currently, Kafka Streams will re-use an existing state store even if there is
> no checkpoint for it. This seems both inefficient (because duplicate inserts
> can be made on restore) and incorrect (records which have been deleted from
> the backing topic may still exist in the store). Since the contents of a
> store with no checkpoint are unknown, the best way to proceed would be to
> delete the store and recreate it before restoring.
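To make the problem concrete, here is a minimal in-memory sketch contrasting the current behaviour (reuse the store even without a checkpoint) with the proposed one (wipe it first). It assumes a store modelled as a `Map`, a changelog modelled as a list of offset/key/value records where a `null` value is a tombstone, and a checkpoint modelled as an `OptionalLong` holding the offset from which replay should resume. All names (`RestoreSketch`, `Rec`, `restoreReusingStore`, `restoreWipingIfUnchecked`, `compactedChangelog`) are hypothetical and are not the Kafka Streams API; the actual Streams code deals with on-disk store directories and checkpoint files.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

public class RestoreSketch {

    // Hypothetical changelog record; a null value is a tombstone (key deleted).
    static final class Rec {
        final long offset;
        final String key;
        final String value;

        Rec(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    // Applies changelog records with offset >= from to the store, in log order.
    private static void replay(Map<String, String> store, List<Rec> changelog, long from) {
        for (Rec r : changelog) {
            if (r.offset < from) {
                continue;
            }
            if (r.value == null) {
                store.remove(r.key);
            } else {
                store.put(r.key, r.value);
            }
        }
    }

    // Behaviour described in the issue: the local store is reused even when
    // there is no checkpoint, and the changelog is replayed from offset 0 on top of it.
    static Map<String, String> restoreReusingStore(Map<String, String> local,
                                                   OptionalLong checkpoint,
                                                   List<Rec> changelog) {
        Map<String, String> store = new HashMap<>(local);
        replay(store, changelog, checkpoint.orElse(0L));
        return store;
    }

    // Proposed behaviour: with no checkpoint the local contents are unknown,
    // so the store is recreated empty before replaying from scratch.
    static Map<String, String> restoreWipingIfUnchecked(Map<String, String> local,
                                                        OptionalLong checkpoint,
                                                        List<Rec> changelog) {
        Map<String, String> store;
        if (checkpoint.isPresent()) {
            store = new HashMap<>(local);   // trusted state: replay only the tail
        } else {
            store = new HashMap<>();        // unknown state: discard and rebuild
        }
        replay(store, changelog, checkpoint.orElse(0L));
        return store;
    }

    // Changelog after compaction: originally put(a,1)@0, put(b,2)@1, delete(a)@2.
    // Once the tombstone's retention has expired, only put(b,2)@1 survives.
    static List<Rec> compactedChangelog() {
        return List.of(new Rec(1, "b", "2"));
    }

    public static void main(String[] args) {
        // A local store that applied offset 0 but was never checkpointed.
        Map<String, String> local = new HashMap<>();
        local.put("a", "1");
        List<Rec> log = compactedChangelog();

        System.out.println("reuse: " + restoreReusingStore(local, OptionalLong.empty(), log));
        System.out.println("wipe:  " + restoreWipingIfUnchecked(local, OptionalLong.empty(), log));
    }
}
```

In this sketch the deleted key `a` survives the "reuse" restore but not the "wipe" restore. The scenario also shows why the retention question in the comment matters: if the tombstone at offset 2 were still in the log, replaying it would delete `a` even without a wipe, so the stale entry only appears once compaction has removed the tombstone.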