Hi Ufuk,

I see, but in my case the failure caused the YARN application to move into a finished/failed state, so the application itself is no longer running. How can I restart the application (or start a new YARN application) and ensure that it uses the checkpoint pointer stored in ZooKeeper?

Thanks,
Josh

On Fri, Nov 4, 2016 at 1:52 PM, Ufuk Celebi <u...@apache.org> wrote:
> No you don't need to manually trigger a savepoint. With HA checkpoints
> are persisted externally and store a pointer in ZooKeeper to recover
> them after a JobManager failure.
>
> On Fri, Nov 4, 2016 at 2:27 PM, Josh <jof...@gmail.com> wrote:
> > I have a follow up question to this - if I'm running a job in
> > 'yarn-cluster' mode with HA and then at some point the YARN application
> > fails due to some hardware failure (i.e. the YARN application moves to
> > "FINISHED"/"FAILED" state), how can I restore the job from the most
> > recent checkpoint?
> >
> > I can use `flink run -m yarn-cluster -s s3://my-savepoints/id .....` to
> > restore from a savepoint, but what if I haven't manually taken a
> > savepoint recently?
> >
> > Thanks,
> > Josh
> >
> > On Fri, Nov 4, 2016 at 10:06 AM, Maximilian Michels <m...@apache.org> wrote:
> >>
> >> Hi Anchit,
> >>
> >> The documentation mentions that you need Zookeeper in addition to
> >> setting the application attempts. Zookeeper is needed to retrieve the
> >> current leader for the client and to filter out old leaders in case
> >> multiple exist (old processes could even stay alive in Yarn). Moreover,
> >> it is needed to persist the state of the application.
> >>
> >> -Max
> >>
> >> On Thu, Nov 3, 2016 at 7:43 PM, Anchit Jatana
> >> <development.anc...@gmail.com> wrote:
> >> > Hi Maximilian,
> >> >
> >> > Thanks for you response. Since, I'm running the application on YARN
> >> > cluster using 'yarn-cluster' mode i.e. using 'flink run -m
> >> > yarn-cluster ..' command. Is there anything more that I need to
> >> > configure apart from setting up 'yarn.application-attempts: 10'
> >> > property inside conf/flink-conf.yaml.
> >> >
> >> > Just wished to confirm if there is anything more that I need to
> >> > configure to set up HA on 'yarn-cluster' mode.
> >> >
> >> > Thank you
> >> >
> >> > Regards,
> >> > Anchit
> >> >
> >> > --
> >> > View this message in context:
> >> > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Application-on-YARN-failed-on-losing-Job-Manager-No-recovery-Need-help-debug-the-cause-from-los-tp9839p9887.html
> >> > Sent from the Apache Flink User Mailing List archive. mailing list
> >> > archive at Nabble.com.