Hey Niels, as Gabor wrote, this feature has been merged to the master branch recently.
The docs are online here: https://ci.apache.org/projects/flink/flink-docs-master/apis/savepoints.html

Feel free to report back on your experience with it if you give it a try.

– Ufuk

> On 14 Jan 2016, at 11:09, Gábor Gévay <gga...@gmail.com> wrote:
>
> Hello,
>
> You are probably looking for this feature:
> https://issues.apache.org/jira/browse/FLINK-2976
>
> Best,
> Gábor
>
> 2016-01-14 11:05 GMT+01:00 Niels Basjes <ni...@basjes.nl>:
>> Hi,
>>
>> I'm working on a streaming application using Flink.
>> Several steps in the processing are stateful (I use custom windows and
>> stateful operators).
>>
>> Now if a worker fails during a normal run, the checkpointing system will be
>> used to recover.
>>
>> But what if the entire application is stopped (deliberately) or stops/fails
>> because of a problem?
>>
>> At this moment I have three main reasons/causes for doing this:
>> 1) The application just dies because of a bug on my side or a problem like
>> this one (which I'm actually confronted with): Failed to Update
>> HDFS Delegation Token for long running application in HA mode
>> https://issues.apache.org/jira/browse/HDFS-9276
>> 2) I need to rebalance my application (i.e. stop, change parallelism, start)
>> 3) I need to deploy a new version of my software (i.e. I fixed a bug,
>> changed the topology, and need to continue)
>>
>> I assume the solution will in part be specific to my application.
>> The question is: what features exist in Flink to support such a clean
>> "continue where I left off" scenario?
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes