Hi James,

Sure! The basic idea of checkpoints is that they are fully owned by the running job and are used for failure recovery. Thus, by default, when you stop the job its checkpoints are removed. If you want to stop a job and later resume from the exact point where it stopped, you most likely want to use savepoints [1]. You can stop the job with a savepoint and later start another job from that savepoint.
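
Since you are running from an IDE rather than bin/flink, here is a minimal sketch of how the stop-and-resume cycle might be driven from Java. It assumes the Flink 1.14 API, where JobClient#stopWithSavepoint takes two arguments (the signature may differ in other releases), and it assumes the local environment honours the execution.savepoint.path option, which I have not verified against your setup. The class name, the savepoint directory, the placeholder pipeline, and the convention of passing the savepoint path as the first program argument are all hypothetical.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SavepointRestartSketch {

    // Hypothetical target directory for savepoints; adjust to your setup.
    private static final String SAVEPOINT_DIR = "file:///tmp/flink-savepoints";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical convention: first program argument = savepoint to resume from.
        // With no argument the job starts with empty state.
        if (args.length > 0) {
            conf.setString("execution.savepoint.path", args[0]);
        }

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.enableCheckpointing(10_000L); // checkpoint every 10 seconds

        // Placeholder pipeline; replace with your stateful operators.
        // Stable uid(...) values on stateful operators help map state on restore.
        env.fromSequence(0L, Long.MAX_VALUE).print();

        // executeAsync returns a handle to the running job instead of blocking.
        JobClient client = env.executeAsync("savepoint-restart-sketch");

        // Block until the user presses Enter, then stop gracefully.
        System.out.println("Press Enter to stop the job with a savepoint...");
        System.in.read();

        String savepointPath = stopWithSavepoint(client);
        System.out.println("Savepoint written to: " + savepointPath);
    }

    // Stops the job and returns the path of the savepoint that was written.
    static String stopWithSavepoint(JobClient client) throws Exception {
        // false = do not advance event time to the end before stopping.
        return client.stopWithSavepoint(false, SAVEPOINT_DIR).get();
    }
}
```

Running the program again with the printed savepoint path as its argument should resume from that state, while running it without an argument starts completely fresh. Because the savepoint path is passed explicitly, the job id of the earlier run does not matter.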
Regarding externalised checkpoints: technically you could use them in a similar way, but there is no command like "take a checkpoint and stop the job". Nevertheless, you might consider enabling them, as this allows you to manually cancel the job if it enters an endless failure/recovery loop, fix the underlying issue, and restart the job from the externalised checkpoint.

Best,
Piotrek

[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/savepoints/

On Wed, 16 Feb 2022 at 16:44, James Sandys-Lumsdaine <jas...@hotmail.com> wrote:

> Hi all,
>
> I have a 1.14 Flink streaming workflow with many stateful functions that
> has a FsStateBackend and checkpointing enabled, although I haven't set a
> location for the checkpointed state.
>
> I've really struggled to understand how I can stop my Flink job and
> restart it and ensure it carries on exactly where it left off by using the
> state or checkpoints or savepoints. This is not clearly explained in the
> book or the web documentation.
>
> Since I have no control over my Flink job id I assume I cannot force
> Flink to pick up the state recorded under the jobId directory for the
> FsStateBackend. Therefore I *think* Flink should read back in the last
> checkpointed data but I don't understand how to force my program to read
> this in? Do I use retained checkpoints or not? How can I force my program
> to either use the last checkpointed state (e.g. when running from my IDE,
> starting and stopping the program) or maybe force it *not* to read in the
> state and start completely fresh?
>
> The web documentation talks about bin/flink but I am running from my IDE
> so I want my Java code to control this progress using the Flink API in Java.
>
> Can anyone give me some basic pointers as I'm obviously missing something
> fundamental on how to allow my program to be stopped and started without
> losing all the state.
>
> Many thanks,
>
> James.