If you have configured RocksDB as the state backend, Flink typically has
multiple RocksDB instances per job: one for each parallel operator instance
with keyed state. Those RocksDB instances are local to their corresponding
operator instances. The parameter state.backend.rocksdb.checkpointdir
configures the working directory of those instances. RocksDB stores its files
in the working directory during operation, so the directory should mainly
allow for fast access, e.g. reside on a local disk filesystem. In contrast,
state.backend.fs.checkpointdir specifies where checkpoint data is stored. Think
of this as a backup directory, where the most important properties are
availability and fault tolerance. It would typically be located on a
distributed file system like HDFS that is accessible from every node, so
that operators can be recovered on different machines in case of machine
failures.
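
As a rough sketch, this is how the four keys discussed in this thread could
sit together in flink-conf.yaml. The key names are the ones from this thread;
all paths are made-up placeholders and need to be adjusted to your cluster.

```yaml
# Use RocksDB as the state backend.
state.backend: rocksdb

# Working directory of the local RocksDB instances (not a checkpoint
# location, despite the name). Hypothetical path on a fast local disk.
state.backend.rocksdb.checkpointdir: /data/local/flink/rocksdb

# Where the actual checkpointed state data is written. Hypothetical path on
# a distributed file system that is reachable from every node.
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints/data

# Where the (small) checkpoint metadata files are written. Hypothetical path.
state.checkpoints.dir: hdfs:///flink/checkpoints/metadata
```

The point of the split is that only the first path needs to be fast and
local, while the two checkpoint paths need to survive the loss of a machine.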

> On 03.02.2017 at 20:55, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> 
> I thought RocksDB is used as a store backend. If that is the case, then why 
> are there 2 configuration parameters? Or in other words, what is the 
> behavior if both state.backend.fs.checkpointdir and state.backend.rocksdb are 
> set?
> 
> On Fri, Feb 3, 2017 at 1:47 AM, Stefan Richter <s.rich...@data-artisans.com> wrote:
> Hi,
> 
> the purpose of the configuration parameters is described in the documentation 
> under 
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/config.html.
> In a nutshell, state.checkpoints.dir contains the (small) metadata files 
> for checkpoints, which typically contain pointers to the files that contain 
> the actual state snapshot data. The state.backend.fs.checkpointdir is the 
> directory into which the actual state from the backends is written. Finally, 
> state.backend.rocksdb.checkpointdir is a poorly named key for the directory 
> of the RocksDB instance data and has in fact nothing to do with checkpoints.
> 
> Best,
> Stefan
> 
>> On 03.02.2017 at 01:45, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>> 
>> Trying to understand these 4 parameters:
>> 
>> state.backend
>> state.backend.fs.checkpointdir
>> state.backend.rocksdb.checkpointdir
>> state.checkpoints.dir
>> 
>> As I understand it, the stream of data and the state of operators are 2 
>> different concepts, and both need to be checkpointed. I am a bit confused 
>> about the purpose of these parameters and their applicability.
> 
> 
