Hi,
you are right. Currently there is no incremental checkpointing and
therefore, at each checkpoint, we essentially copy the whole RocksDB
database to HDFS (or whatever filesystem you chose as a backup location).
As far as I know, Stephan will start working on adding support for
incremental snapshots this week or next week.

Cheers,
Aljoscha

On Thu, 7 Apr 2016 at 09:55 Krzysztof Zarzycki <k.zarzy...@gmail.com> wrote:

> Hi,
> I saw the documentation and source code of the state management with
> RocksDB and before I use it, I'm concerned of one thing: Am I right that
> currently when state is being checkpointed, the whole RocksDB state is
> snapshotted? There is no incremental, diff snapshotting, is it? If so, this
> seems to be unfeasible for keeping state counted in tens or hundreds of GBs
> (and you reach that size of a state, when you want to keep an embedded
> state of the streaming application instead of going out to Cassandra/Hbase
> or other DB). It will just cost too much to do snapshots of such large
> state.
>
> Samza as a good example to compare, writes every state change to Kafka
> topic, considering it a snapshot in the shape of changelog. Of course in
> the moment of app restart, recovering the state from the changelog would be
> too costly, that is why the changelog topic is compacted. Plus, I think
> Samza does a state snapshot from time to time anyway (but I'm not sure of
> that).
>
> Thanks for answering my doubts,
> Krzysztof
>
>

Reply via email to