[ https://issues.apache.org/jira/browse/KAFKA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012738#comment-16012738 ]
Tommy Becker commented on KAFKA-5256:
-------------------------------------

I noticed this originally when my state store directories were much larger than the topics backing them, and discovered it was because the data was being duplicated. The scenario I described above notwithstanding, this doesn't produce incorrect results, but it wastes both disk space and CPU cycles as RocksDB compacts the duplicate data.

> Non-checkpointed state stores should be deleted before restore
> --------------------------------------------------------------
>
>                 Key: KAFKA-5256
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5256
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1
>            Reporter: Tommy Becker
>
> Currently, Kafka Streams will re-use an existing state store even if there is
> no checkpoint for it. This seems both inefficient (because duplicate inserts
> can be made on restore) and incorrect (records which have been deleted from
> the backing topic may still exist in the store). Since the contents of a
> store with no checkpoint are unknown, the best way to proceed would be to
> delete the store and recreate before restoring.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
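
For illustration only, the behaviour proposed in the issue description (wipe a store that has no checkpoint, then restore from scratch) could be sketched roughly as follows. This is not the Kafka Streams implementation: `StoreCleaner` and `wipeIfNotCheckpointed` are hypothetical names, and the sketch assumes one `.checkpoint` file per task directory with the store's files in a directory of their own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kafka Streams.
final class StoreCleaner {
    // Assumed checkpoint file name inside the task directory.
    static final String CHECKPOINT_FILE = ".checkpoint";

    /**
     * Deletes storeDir recursively when taskDir holds no checkpoint file.
     * Returns true if the store was wiped, false if it was left alone.
     */
    static boolean wipeIfNotCheckpointed(Path taskDir, Path storeDir) throws IOException {
        boolean checkpointed = Files.exists(taskDir.resolve(CHECKPOINT_FILE));
        if (checkpointed || !Files.exists(storeDir)) {
            return false; // contents are known, or there is nothing to delete
        }
        List<Path> deepestFirst;
        try (Stream<Path> paths = Files.walk(storeDir)) {
            // Reverse order puts children before their parent directories.
            deepestFirst = paths.sorted(Comparator.reverseOrder())
                                .collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            Files.delete(p);
        }
        return true;
    }
}
```

A caller would run this check before opening the store for restore, so that restoring from the beginning of the changelog starts from an empty store instead of re-inserting records on top of stale ones.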