So here are my questions:

1. What environment do you run Flink in? Is it locally, on Yarn or Mesos?
2. How do you trigger "restart a Job Master"?

Thank you~
Xintong Song

On Tue, Jun 4, 2019 at 10:35 AM Boris Lublinsky <boris.lublin...@lightbend.com> wrote:

> Thanks,
> That's what I thought initially.
> The issue is that because of this, during restart, it does not know which
> job was running before (it is obtained from the submitted job graph store).
> Because this is empty, there are no restarted jobs and the cluster does not
> even try to restore checkpoints.
> I can see that checkpoints are stored correctly, but they are never
> accessed.
>
> Boris Lublinsky
> FDP Architect
> boris.lublin...@lightbend.com
> https://www.lightbend.com/
>
> On Jun 3, 2019, at 9:23 PM, Xintong Song <tonysong...@gmail.com> wrote:
>
> Hi Boris,
>
> I think what you described, that putJobGraph is not invoked in a Flink job
> cluster, is by design and should not cause job recovery to fail. For
> a Flink job cluster, there is only one job graph to execute. Instead of
> uploading the job graph to an already running cluster (like in a session
> cluster), the job graph in a Flink job cluster is uploaded before the
> cluster is started, together with the Flink framework jars. Please refer to
> MiniDispatcher and SingleJobSubmittedJobGraphStore for the details.
>
> I think we need more information to find the root cause of your problem.
> For example, can you explain what detailed steps you perform
> when you say "trying to restart a Job Master"?
>
> Thank you~
> Xintong Song
>
> On Mon, Jun 3, 2019 at 10:05 PM Boris Lublinsky <boris.lublin...@lightbend.com> wrote:
>
>> I am trying to experiment with Flink Job server with HA and I am
>> noticing that in this case the
>> method putJobGraph in the class SubmittedJobGraphStore is never invoked.
>> (I can see that it is invoked in the case of a session cluster when a job is
>> added.)
>> As a result, when I am trying to restart a Job Master, it finds no
>> running jobs and is not trying to restore it.
>> Am I missing something?
>>
>> Boris Lublinsky
>> FDP Architect
>> boris.lublin...@lightbend.com
>> https://www.lightbend.com/
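
The difference between a session cluster and a job cluster described in the thread can be illustrated with a small, self-contained sketch. This is only a simplified model under stated assumptions, not Flink's actual code: the names JobStoreSketch, JobGraph, JobGraphStore, SessionJobStore and SingleJobStore are hypothetical stand-ins, and no HA service or checkpoint storage is involved. It only shows why a store that is handed its single job graph at construction time can report that job on recovery without putJobGraph ever being called.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model for illustration; these are NOT Flink's classes.
class JobStoreSketch {

    /** Minimal stand-in for a job graph: only an ID and a name. */
    static final class JobGraph {
        final String jobId;
        final String name;

        JobGraph(String jobId, String name) {
            this.jobId = jobId;
            this.name = name;
        }
    }

    /** Assumed store contract, loosely following the method names in the thread. */
    interface JobGraphStore {
        void putJobGraph(JobGraph graph);
        JobGraph recoverJobGraph(String jobId);
        Collection<String> getJobIds();
    }

    /** Session-style store: starts empty and is filled on each job submission. */
    static final class SessionJobStore implements JobGraphStore {
        private final Map<String, JobGraph> graphs = new ConcurrentHashMap<>();

        @Override
        public void putJobGraph(JobGraph graph) {
            graphs.put(graph.jobId, graph);
        }

        @Override
        public JobGraph recoverJobGraph(String jobId) {
            return graphs.get(jobId); // null if the job was never submitted
        }

        @Override
        public Collection<String> getJobIds() {
            return new ArrayList<>(graphs.keySet());
        }
    }

    /** Job-cluster-style store: the single graph is supplied before startup. */
    static final class SingleJobStore implements JobGraphStore {
        private final JobGraph graph;

        SingleJobStore(JobGraph graph) {
            this.graph = graph;
        }

        @Override
        public void putJobGraph(JobGraph other) {
            // Nothing to store; only reject a graph that is not the packaged one.
            if (!graph.jobId.equals(other.jobId)) {
                throw new IllegalArgumentException("Unexpected job: " + other.jobId);
            }
        }

        @Override
        public JobGraph recoverJobGraph(String jobId) {
            if (!graph.jobId.equals(jobId)) {
                throw new IllegalArgumentException("Unknown job: " + jobId);
            }
            return graph;
        }

        @Override
        public Collection<String> getJobIds() {
            return Collections.singleton(graph.jobId);
        }
    }

    public static void main(String[] args) {
        JobGraph graph = new JobGraph("job-1", "demo");

        JobGraphStore session = new SessionJobStore();
        System.out.println("session before submit: " + session.getJobIds());
        session.putJobGraph(graph);
        System.out.println("session after submit:  " + session.getJobIds());

        // No putJobGraph call here, yet the job is visible for recovery.
        JobGraphStore single = new SingleJobStore(graph);
        System.out.println("single-job store:      " + single.getJobIds());
    }
}
```

In this model the single-job store always lists its job, so an empty list of recovered jobs after a restart would point to something other than the missing putJobGraph call, which is why the environment and the exact restart steps matter.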