Hi,

If the question is whether there are certain requirements for the file system
that you use with the state backends, then I think there might be a small
misconception. Currently, all state backends in Flink operate local to the
task, i.e. the working state lives either in memory (e.g. FsStateBackend) or
on the local file system (RocksDBStateBackend) of the machine that runs the
task. The choice of distributed file system only affects checkpoints and
savepoints, and should not have a significant impact on your job’s regular
processing performance. It can, of course, have an impact on checkpoint and
restore durations.

Checkpoints and savepoints must be written to a stable store that offers
fault tolerance, such as HDFS. Writes and reads for checkpoints are sequential
bulk operations and (within reasonable bounds) are not very sensitive to
latency. It is more important that your stable store offers a useful
consistency model.
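
To make the split between local working state and checkpoint storage more
concrete, here is a toy model in Python. It is only a sketch and not Flink
code: the class LocalKeyedState, its methods, and the chk-<id>.json file
naming are all made up for illustration, and a temporary local directory
stands in for the distributed file system. The write-then-rename step is an
assumption of mine to hint at why the consistency model of the store matters;
it is not a description of how Flink writes checkpoints.

```python
import json
import os
import tempfile


class LocalKeyedState:
    """Toy stand-in for task-local working state (hypothetical, not Flink's API)."""

    def __init__(self):
        # Working state lives in the task's own memory.
        self._counts = {}

    def add(self, key, n=1):
        # Frequent small updates only touch local memory, never the stable store.
        self._counts[key] = self._counts.get(key, 0) + n

    def get(self, key):
        return self._counts.get(key, 0)

    def snapshot(self, checkpoint_dir, checkpoint_id):
        # One sequential bulk write per checkpoint. The data goes to a
        # temporary file first and is renamed afterwards, so a reader never
        # sees a half-written checkpoint (assumes the store has atomic rename).
        final_path = os.path.join(checkpoint_dir, f"chk-{checkpoint_id}.json")
        tmp_path = final_path + ".inprogress"
        with open(tmp_path, "w") as f:
            json.dump(self._counts, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, final_path)
        return final_path

    @classmethod
    def restore(cls, path):
        # One sequential bulk read on recovery.
        state = cls()
        with open(path) as f:
            state._counts = json.load(f)
        return state


def demo():
    checkpoint_dir = tempfile.mkdtemp()  # stand-in for e.g. an HDFS directory
    state = LocalKeyedState()
    for word in ["a", "b", "a"]:
        state.add(word)
    path = state.snapshot(checkpoint_dir, 1)
    state.add("a")  # update after the checkpoint, not part of the snapshot
    restored = LocalKeyedState.restore(path)
    return state.get("a"), restored.get("a"), restored.get("b")
```

In this sketch, the latency of the checkpoint directory only shows up in
snapshot() and restore(); add() and get() never touch it. A restore brings
back the state as of the last completed snapshot, so the update made after
the snapshot is not in the restored copy.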

Best,
Stefan

> On 03.08.2017 at 16:45, Vijay Srinivasaraghavan <vijikar...@yahoo.com> wrote:
> 
> Hello,
> 
> I would like to know if we have any latency requirements for choosing 
> an appropriate state backend? 
> 
> For example, if an HCFS implementation is used as Flink state backend 
> (instead of stock HDFS), are there any implications that one needs to know 
> with respect to the performance?
> 
> - Frequency of read/write operations, random vs sequential reads
> - Load/Usage pattern (Frequent small updates vs bulk operation)
> - RocksDB->HCFS (Is this kind of recommended option to mitigate some of the 
> challenges outlined above)
> - S3 Vs HDFS any performance numbers?
> 
> Appreciate any inputs on this.
> 
> Regards
> Vijay
> 
> 
