Hi,

I'm working on a streaming application using Flink.
Several steps in the processing are state-full (I use custom Windows and
state-full operators ).

Now if during a normal run an worker fails the checkpointing system will be
used to recover.

But what if the entire application is stopped (deliberately) or stops/fails
because of a problem?

At this moment I have three main reasons/causes for doing this:
1) The application just dies because of a bug on my side or a problem like
for example this (which I'm actually confronted with):  *Failed to Update
HDFS Delegation Token for long running application in HA mode *
https://issues.apache.org/jira/browse/HDFS-9276
2) I need to rebalance my application (i.e. stop, change parallelism, start)
3) I need a new version of my software to be deployed. (i.e. I fixed a bug,
changed the topology and need to continue)

I assume the solution will be in some part be specific for my application.
The question is what features exist in Flink to support such a clean
"continue where I left of" scenario?

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Reply via email to