Thanks for your suggestions. In the end I've made altering flink-conf.yaml part of job submission, which we always do via some custom Ansible scripts. This way each job has its own directory for external checkpoints.
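For anyone wanting to do something similar, here is a minimal sketch of the idea in Python. It is not our actual Ansible code. It assumes the key being changed is state.checkpoints.dir (the one from my original question), and the function name, base directory and job name are made up for illustration.

```python
def with_job_checkpoint_dir(conf_text: str, base_dir: str, job_name: str) -> str:
    """Return flink-conf.yaml text with state.checkpoints.dir set to a per-job directory.

    Hypothetical helper: the naming scheme <base_dir>/<job_name> is an assumption.
    """
    key = "state.checkpoints.dir"
    new_line = f"{key}: {base_dir.rstrip('/')}/{job_name}"

    out = []
    replaced = False
    for line in conf_text.splitlines():
        # Match only an active "key: value" entry, not a commented-out one.
        if line.split(":", 1)[0].strip() == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            # Any further duplicate entries for the key are dropped.
        else:
            out.append(line)

    # Append the entry if the config did not define the key at all.
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    conf = (
        "jobmanager.rpc.port: 6123\n"
        "state.checkpoints.dir: hdfs:///flink/ext-checkpoints\n"
    )
    print(with_job_checkpoint_dir(conf, "hdfs:///flink/ext-checkpoints", "my-job"))
```

The rewritten file is then the one used when that particular job is submitted, so its external checkpoint metadata ends up under its own directory.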

Best,
Dawid

> On 20 Feb 2018, at 17:21, Chesnay Schepler <ches...@apache.org> wrote:
>
> There is the "lastCheckpointExternalPath" metric that is scoped by job. You
> could access this via JMX.
>
> On 20.02.2018 17:17, Aljoscha Krettek wrote:
>> Hi,
>>
>> I think there is currently no easy way of doing this. Things that come to
>> mind are:
>> - looking at the JM log
>> - polling the JM REST interface for completed externalised checkpoints
>>
>> The good news is that Flink 1.5 will rework how externalised checkpoints
>> work a bit: basically, all checkpoints can now be considered externalised
>> and the metadata will be stored in the root directory of the checkpoint, not
>> in one global directory for all jobs. This way, the metadata for
>> externalised checkpoints resides in the checkpoint directory of each job and
>> it should be reasonably simple to restore from that.
>>
>> Best,
>> Aljoscha
>>
>>
>>> On 15. Feb 2018, at 10:55, Dawid Wysakowicz <wysakowicz.da...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> We are running few jobs on yarn and in case of some failure (that the job
>>> could not recover from on its own) we want to use last successful external
>>> checkpoint to restore the job from manually. The problem is that the
>>> ${state.checkpoints.dir} contains checkpoint directories for all jobs that
>>> we are running. How can we find out the last successful external checkpoint
>>> for some particular job? Will be grateful for any pointers.
>>>
>>> Regards,
>>> Dawid
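
As a footnote to the REST polling option mentioned in the thread, a rough sketch in Python follows. It assumes the JobManager serves checkpoint statistics at /jobs/<jobid>/checkpoints and that the response carries the path under latest.completed.external_path. Both should be checked against the REST documentation of the Flink version in use. The host, port and job id are placeholders.

```python
import json
import urllib.request
from typing import Optional


def latest_external_path(checkpoints_json: str) -> Optional[str]:
    """Extract the external path of the latest completed checkpoint, if any.

    Assumes a response shaped like {"latest": {"completed": {"external_path": ...}}}.
    """
    stats = json.loads(checkpoints_json)
    # "latest" or "completed" may be missing or null when nothing has completed yet.
    completed = (stats.get("latest") or {}).get("completed") or {}
    return completed.get("external_path")


def fetch_checkpoint_stats(base_url: str, job_id: str) -> str:
    """Fetch raw checkpoint statistics for one job (endpoint path is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}/checkpoints") as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder address and job id; replace with real values.
    body = fetch_checkpoint_stats("http://jobmanager:8081", "<jobid>")
    print(latest_external_path(body))
```

Polling this periodically and recording the last non-empty result would give a per-job pointer to restore from, as long as the JobManager was still reachable shortly before the failure.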