Thanks a lot for the details, Stefan. -- Christophe
On Mon, Feb 5, 2018 at 11:31 AM, Stefan Richter <s.rich...@data-artisans.com> wrote:

> Hi,
>
> you are correct that RocksDB has a "working directory" on local disk, and
> checkpoints + savepoints go to a distributed filesystem.
>
> - if I have 3 TaskManagers, should I expect to find more or less (depending
> on how the tasks are balanced) a third of my overall state stored on disk
> on each of these TaskManager nodes?
>
> This question is not so much about RocksDB, but more about Flink's keyBy
> partitioning, i.e. how work is distributed between the parallel instances
> of an operator. The answer is that Flink applies hash partitioning based on
> your event keys to distribute the keys (and their state) between your 3
> nodes. If your key space is very skewed, or there are heavy-hitter keys
> with much larger state than most other keys, this can lead to some
> imbalance. If your keys are not skewed and have similar state sizes, every
> node should hold roughly the same amount of state.
>
> - if the local node/disk fails, I will get the state back from the
> distributed disk, things will start again, and all is fine. However, what
> happens if the distributed disk fails? Will Flink continue processing,
> waiting for me to mount a new distributed disk? Or will it stop? Might I
> lose data/reprocess things under that condition?
>
> Starting from Flink 1.5, this is configurable, please see
> https://issues.apache.org/jira/browse/FLINK-4809 and
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/checkpointing.html
> in the section "fail/continue task on checkpoint errors". If you tolerate
> checkpoint failures, you will not lose data: if your job fails, it can
> recover from the latest successful checkpoint once your DFS is available
> again. If the job does not fail, it will eventually make another checkpoint
> once the DFS is back. If you do not tolerate checkpoint failures, your job
> will simply fail, restart from the last successful checkpoint, and recover
> once the DFS is back.
>
> Best,
> Stefan
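
To make the two-disk setup concrete, here is a minimal sketch (Java,
Flink 1.5-era API) of configuring the RocksDB state backend with a
checkpoint URI on a distributed filesystem and a working directory on
local disk. The HDFS and local paths are placeholders, not recommendations.

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDBBackendSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoints and savepoints go to the distributed filesystem;
            // the HDFS URI is a placeholder for your own cluster.
            RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs://namenode:8020/flink/checkpoints");

            // RocksDB's working directory lives on each TaskManager's local
            // disk; if unset, Flink falls back to the TaskManager temp dirs.
            backend.setDbStoragePath("/mnt/local-ssd/flink/rocksdb");

            env.setStateBackend(backend);
            // ... define and execute the job as usual ...
        }
    }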
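
As an illustration of the keyBy partitioning Stefan describes, a small
self-contained sketch: keyed state declared in a RichMapFunction lives in
the RocksDB instance of whichever parallel subtask the key hashes to, so
with 3 TaskManagers each node ends up holding the state of roughly a third
of the keys, absent skew. The sample tuples and the "count" state name are
invented for the example.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class KeyByStateSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements(Tuple2.of("user-a", 1L),
                             Tuple2.of("user-b", 1L),
                             Tuple2.of("user-a", 1L))
                // a hash of the key decides which parallel subtask (and hence
                // which TaskManager's RocksDB instance) owns the key's state
                .keyBy(0)
                .map(new RichMapFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
                    private transient ValueState<Long> count;

                    @Override
                    public void open(Configuration parameters) {
                        count = getRuntimeContext().getState(
                            new ValueStateDescriptor<>("count", Long.class));
                    }

                    @Override
                    public Tuple2<String, Long> map(Tuple2<String, Long> in) throws Exception {
                        Long current = count.value();
                        long updated = (current == null ? 0L : current) + in.f1;
                        // stored in the local RocksDB of the owning subtask
                        count.update(updated);
                        return Tuple2.of(in.f0, updated);
                    }
                })
                .print();

            env.execute("keyBy state partitioning sketch");
        }
    }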
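
And a sketch of the Flink 1.5 setting Stefan points to: with
setFailOnCheckpointingErrors(false) on the CheckpointConfig, a checkpoint
that fails (for instance because the DFS is unreachable) is dropped instead
of failing the job, and the next scheduled checkpoint succeeds once the DFS
is back. The checkpoint interval below is an arbitrary example value; the
default behavior is to fail the task on checkpoint errors.

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class TolerateCheckpointFailures {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // checkpoint every 10 seconds (example value)
            env.enableCheckpointing(10_000);

            // Flink 1.5+ (FLINK-4809): do not fail the task when a checkpoint
            // cannot be completed; the job keeps running and later checkpoints
            // succeed once the DFS is reachable again.
            env.getCheckpointConfig().setFailOnCheckpointingErrors(false);

            // ... define and execute the job as usual ...
        }
    }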
> On 03.02.2018 at 17:45, Christophe Jolif <cjo...@gmail.com> wrote:
>
> Thanks for sharing, Kien. It sounds like the logical behavior, but it is
> good to hear it confirmed by your experience.
>
> --
> Christophe
>
> On Sat, Feb 3, 2018 at 7:25 AM, Kien Truong <duckientru...@gmail.com> wrote:
>
>> On Feb 3, 2018, at 10:48, Kien Truong <duckientru...@gmail.com> wrote:
>>>
>>> Hi,
>>> Speaking from my experience, if the distributed disk fails, the
>>> checkpoint will fail as well, but the job will continue running. The
>>> checkpoint scheduler will keep running, so the first scheduled checkpoint
>>> after you repair your disk should succeed.
>>>
>>> Of course, if you also write to the distributed disk inside your job,
>>> then your job may crash too, but that is unrelated to the checkpoint
>>> process.
>>>
>>> Best regards,
>>> Kien
>>>
>>> On Feb 2, 2018, at 23:30, Christophe Jolif <cjo...@gmail.com> wrote:
>>>>
>>>> If I understand well, RocksDB is using two disks: the TaskManager local
>>>> disk for "local storage" of the state, and the distributed disk for
>>>> checkpointing.
>>>>
>>>> Two questions:
>>>>
>>>> - if I have 3 TaskManagers, should I expect to find more or less
>>>> (depending on how the tasks are balanced) a third of my overall state
>>>> stored on disk on each of these TaskManager nodes?
>>>>
>>>> - if the local node/disk fails, I will get the state back from the
>>>> distributed disk, things will start again, and all is fine. However,
>>>> what happens if the distributed disk fails? Will Flink continue
>>>> processing, waiting for me to mount a new distributed disk? Or will it
>>>> stop? Might I lose data/reprocess things under that condition?
>>>>
>>>> --
>>>> Christophe Jolif