I understand this. What I am not sure about is the sequence: it will come back and restore from ZK/HDFS, and then it will try to start the job specified in the class, right? That would (potentially) overwrite the restored job. How does it know not to start the job defined in the class once the previous one has been restored?

Boris Lublinsky
FDP Architect
boris.lublin...@lightbend.com
https://www.lightbend.com/

> On May 25, 2019, at 2:45 AM, bastien dine <bastien.d...@gmail.com> wrote:
>
> Hello Boris,
>
> I think you are confused by the name of the shell script "standalone-job.sh",
> which basically means that we start a "standalone job manager", as stated in
> the first comment of
> https://github.com/apache/flink/blob/release-1.8/flink-dist/src/main/flink-bin/bin/standalone-job.sh
>
> This is another version of: flink-dist/src/main/flink-bin/bin/jobmanager.sh
>
> It's not related to a job.
>
> When you configure HA on a Flink cluster and you submit a job, Flink (i.e.
> the jobmanager) stores the state of the job in ZooKeeper / HDFS.
> So when it crashes and comes back (with this entrypoint), it will read from
> ZK / HDFS and restore the previous execution.
>
> Regards,
> Bastien
>
> ------------------
>
> Bastien DINE
> Data Architect / Software Engineer / Sysadmin
> bastiendine.io
>
> On Fri, May 24, 2019 at 11:22 PM, Boris Lublinsky <boris.lublin...@lightbend.com> wrote:
> > Hi,
> > I was experimenting with HA lately and see that it successfully recovers
> > the job when the jobmanager restarts.
> > Now my question is whether it will work for the job cluster.
> > Based on the instructions in
> > https://github.com/apache/flink/blob/release-1.8/flink-container/docker/README.md
> > I can see in
> > https://github.com/apache/flink/blob/release-1.8/flink-container/docker/docker-entrypoint.sh
> > that in this case the following command is invoked:
> >
> > exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
> >
> > Which means that if a jobManager restarts, the following is going to happen:
> >
> > 1. It will use HA to restore the job that was running.
> > 2. A new job will be submitted, overwriting the restored job and bypassing
> >    checkpoint restore.
> >
> > Am I missing something here?
> >
> > Boris Lublinsky
> > FDP Architect
> > boris.lublin...@lightbend.com
> > https://www.lightbend.com/