[ https://issues.apache.org/jira/browse/KAFKA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007804#comment-16007804 ]
ASF GitHub Bot commented on KAFKA-4317:
---------------------------------------

GitHub user dguy closed the pull request at:

    https://github.com/apache/kafka/pull/3024

> RocksDB checkpoint files lost on kill -9
> ----------------------------------------
>
>                 Key: KAFKA-4317
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4317
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.0.1
>            Reporter: Greg Fodor
>            Assignee: Damian Guy
>            Priority: Critical
>              Labels: architecture, needs-kip, user-experience
>             Fix For: 0.11.0.0
>
>
> Right now, the checkpoint files for logged RocksDB stores are written during
> a graceful shutdown and removed upon restoration. Unfortunately, this means
> that when the process is forcibly killed, the checkpoint files are not there,
> so all RocksDB stores are rematerialized from scratch on the next launch.
>
> In a way, this is good, because it simulates bootstrapping a new node (for
> example, it's a good way to see how much I/O is used to rematerialize the
> stores). However, it leads to longer recovery times when a non-graceful
> shutdown occurs and we want to get the job up and running again.
>
> There seem to be two possible things to consider:
> - Simply do not remove checkpoint files on restoring. This way, a kill -9
>   will result in only repeating the restoration of the data generated in the
>   source topics since the last graceful shutdown.
> - Continually update the checkpoint files (perhaps on commit). This would
>   result in the least overhead and latency when restarting, but the
>   additional complexity may not be worth it.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-116%3A+Add+State+Store+Checkpoint+Interval+Configuration

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
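
The second option in the quoted description, updating the checkpoint file on each commit, could look roughly like the sketch below. It is an illustration under assumptions, not the code from the pull request or Kafka's actual checkpoint class: the class name `CheckpointSketch`, the one-entry-per-line file format, and the changelog partition names are all hypothetical. The idea is to write the offsets to a temporary file, force it to disk, and then rename it over the old checkpoint, so that a kill -9 leaves either the previous checkpoint or the new one, never a partial file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not Kafka's implementation.
public class CheckpointSketch {
    private final Path file;

    public CheckpointSketch(Path file) {
        this.file = file;
    }

    // Intended to be called on every commit: persist the latest restored/written
    // changelog offsets so a hard kill loses at most one commit interval.
    public void write(Map<String, Long> offsets) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Long> e : offsets.entrySet()) {
                // Assumed format: "<changelog-partition> <offset>" per line.
                w.write(e.getKey() + " " + e.getValue());
                w.newLine();
            }
        }
        // Force the temp file's contents to disk before it becomes visible.
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
        // Atomic rename; whether an existing target is replaced is
        // implementation specific per the JDK docs (POSIX rename replaces it).
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    // Called on startup: an empty map means "no checkpoint, restore from scratch".
    public Map<String, Long> read() throws IOException {
        Map<String, Long> offsets = new LinkedHashMap<>();
        if (!Files.exists(file)) {
            return offsets;
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) {
                continue;
            }
            int sep = line.lastIndexOf(' ');
            offsets.put(line.substring(0, sep), Long.parseLong(line.substring(sep + 1)));
        }
        return offsets;
    }
}
```

Under this scheme the checkpoint would no longer be deleted on restoration; it is simply overwritten at the next commit, which also covers the first option in the description.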