Hi, you are right. Currently there is no incremental checkpointing and therefore, at each checkpoint, we essentially copy the whole RocksDB database to HDFS (or whatever filesystem you chose as a backup location). As far as I know, Stephan will start working on adding support for incremental snapshots this week or next week.
Cheers, Aljoscha On Thu, 7 Apr 2016 at 09:55 Krzysztof Zarzycki <k.zarzy...@gmail.com> wrote: > Hi, > I saw the documentation and source code of the state management with > RocksDB and before I use it, I'm concerned of one thing: Am I right that > currently when state is being checkpointed, the whole RocksDB state is > snapshotted? There is no incremental, diff snapshotting, is it? If so, this > seems to be unfeasible for keeping state counted in tens or hundreds of GBs > (and you reach that size of a state, when you want to keep an embedded > state of the streaming application instead of going out to Cassandra/Hbase > or other DB). It will just cost too much to do snapshots of such large > state. > > Samza as a good example to compare, writes every state change to Kafka > topic, considering it a snapshot in the shape of changelog. Of course in > the moment of app restart, recovering the state from the changelog would be > too costly, that is why the changelog topic is compacted. Plus, I think > Samza does a state snapshot from time to time anyway (but I'm not sure of > that). > > Thanks for answering my doubts, > Krzysztof > >