Re: Using RocksDB as State Backend over a Distributed File System

2018-04-26 Thread Chirag Dewan
Thanks a lot, Stefan. This clarifies everything. Regards, Chirag On Thursday, 26 April, 2018, 7:16:52 PM IST, Stefan Richter wrote: Adding one thing, the format of the non-incremental is similar but unfortunately not (yet) identical with the FS backend. This is because of some interna

Re: Using RocksDB as State Backend over a Distributed File System

2018-04-26 Thread Stefan Richter
Adding one thing: the format of the non-incremental checkpoints is similar to, but unfortunately not (yet) identical with, that of the FS backend. This is because of some internal implementation details that allow the FS checkpoints to be slightly more concise in the file format, but we might "de-optimize" this minor di

Re: Using RocksDB as State Backend over a Distributed File System

2018-04-26 Thread Stefan Richter
On the local disk you have the normal RocksDB working directory consisting mainly of the SSTable files. In the checkpoint directory on distributed storage it depends on whether or not you are using incremental checkpoints. For incremental checkpoints, the files are essentially the SSTables uploa
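
As a rough illustration of the incremental case (a toy model, not Flink's actual code): because SSTable files are immutable once written, a checkpoint only needs to upload the files that durable storage does not already hold. The class name `IncrementalUploader`, the in-memory dictionaries standing in for the distributed file system, and the file names are all assumptions made for this sketch.

```python
from typing import Dict, List


class IncrementalUploader:
    """Toy model: upload only SSTable files not yet present in durable storage."""

    def __init__(self) -> None:
        # Stands in for the checkpoint directory on distributed storage.
        self.durable: Dict[str, bytes] = {}
        # Checkpoint id -> names of all files that checkpoint refers to.
        self.manifests: Dict[int, List[str]] = {}

    def checkpoint(self, checkpoint_id: int, local_files: Dict[str, bytes]) -> List[str]:
        # SSTables are immutable, so a file name already in durable storage
        # is assumed to have identical contents and is not uploaded again.
        new_files = sorted(name for name in local_files if name not in self.durable)
        for name in new_files:
            self.durable[name] = local_files[name]
        # The checkpoint still references every file in the working directory.
        self.manifests[checkpoint_id] = sorted(local_files)
        return new_files


uploader = IncrementalUploader()
first = uploader.checkpoint(1, {"000001.sst": b"a", "000002.sst": b"b"})
# Between checkpoints, 000001.sst was compacted away and 000003.sst was written.
second = uploader.checkpoint(2, {"000002.sst": b"b", "000003.sst": b"c"})
```

In this sketch the first checkpoint uploads both files, while the second uploads only `000003.sst` yet still lists `000002.sst` in its manifest. A non-incremental checkpoint would instead write the full state every time.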

Re: Using RocksDB as State Backend over a Distributed File System

2018-04-26 Thread Chirag Dewan
Wow, I never considered it that way. Thanks a lot for clarifying, Stefan. This gives rise to another question: what is the format of this data? Is it the same format that is used to store checkpoints when the FS state backend is used? Regards, Chirag On Thu, 26 Apr 201

Re: Using RocksDB as State Backend over a Distributed File System

2018-04-26 Thread Marvin777
Hi, I agree with Stefan. I think you can look at this document: Apache Flink 1.4 Documentation: Checkpointing. Best, Qingxiang Ma. 2018-04-26 20:00 GMT+08:00 Stefan Richter: > Hi, > >

Re: Using RocksDB as State Backend over a Distributed File System

2018-04-26 Thread Stefan Richter
Hi, I think there is a misunderstanding. The RocksDB state backend always operates on the local disk of the node that runs your task, to give you optimal performance. You can think of this as a transient working area that does not require any durability. Durability always happens through checkpoints (or
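
To make the split between the transient working area and durable checkpoints concrete, here is a minimal sketch in plain Python (not Flink code). Temporary directories stand in for the node's local disk and for the distributed file system; the function names, the `node-a`/`node-b` paths, and the `chk-1` directory name are assumptions chosen for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def checkpoint(working_dir: Path, durable_dir: Path, checkpoint_id: int) -> Path:
    """Copy the whole local working directory to durable storage (a full snapshot)."""
    target = durable_dir / f"chk-{checkpoint_id}"
    shutil.copytree(working_dir, target)
    return target


def restore(snapshot: Path, new_working_dir: Path) -> None:
    """Rebuild a fresh local working directory from a durable snapshot."""
    shutil.copytree(snapshot, new_working_dir)


def demo() -> str:
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        local = base / "node-a" / "rocksdb"     # hypothetical local working area
        durable = base / "dfs" / "checkpoints"  # stands in for distributed storage
        local.mkdir(parents=True)
        durable.mkdir(parents=True)

        # State is written to local disk only.
        (local / "000001.sst").write_text("key=1")
        snapshot = checkpoint(local, durable, 1)

        # Simulated node failure: the local working area is lost entirely.
        shutil.rmtree(base / "node-a")

        # Recovery on another node reads only from durable storage.
        recovered = base / "node-b" / "rocksdb"
        restore(snapshot, recovered)
        return (recovered / "000001.sst").read_text()


result = demo()
```

Losing the local directory costs nothing here because recovery never reads from it, which is why the working area itself needs no durability.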