So here are my questions:
1. What environment do you run Flink in? Is it locally, on Yarn or Mesos?
2. How do you trigger "restart a Job Master"?

Thank you~

Xintong Song



On Tue, Jun 4, 2019 at 10:35 AM Boris Lublinsky <
boris.lublin...@lightbend.com> wrote:

> Thanks,
> Thats what I thought initially.
> The issue is that because of this, during restart, it does not know which
> job was running before (it is obtained from submitted job graph store).
> Because this is empty, there is no restarted jobs and the cluster does not
> even try to restore checkpoints.
> I can see that checkpoints are stored correctly, but they are never
> accessed.
>
> Boris Lublinsky
> FDP Architect
> boris.lublin...@lightbend.com
> https://www.lightbend.com/
>
> On Jun 3, 2019, at 9:23 PM, Xintong Song <tonysong...@gmail.com> wrote:
>
> Hi Boris,
>
> I think what you described that putJobGraph is not invoked in Flink job
> cluster is by design and should not cause a failure of job recovering. For
> a Flink job cluster, there is only one job graph to execute. Instead of
> uploading job graph to an already running cluster (like in a session
> cluster), the job graph in a Flink job cluster is uploaded before the
> cluster is started, together with the Flink framework jars. Please refer to
> MiniDispatcher and SingleJobSubmittedJobGraphStore for the details.
>
> I think we need more information to find the root cause of your problem.
> For example, can you explain what are the detailed operation steps do you
> perform when you say "trying to restart a Job Master".
>
> Thank you~
> Xintong Song
>
>
>
> On Mon, Jun 3, 2019 at 10:05 PM Boris Lublinsky <
> boris.lublin...@lightbend.com> wrote:
>
>> I am trying to experiment with Flink Job server with HA and I am
>> noticing, that in this case
>> method putJobGraph in the class SubmittedJobGraphStore Is never invoked.
>> (I can see that it is invoked in the case of session cluster when a job is
>> added)
>> As a result, when I am trying to restart a Job Master, it finds no
>> running jobs and is not trying to restore it.
>> Am I missing something?
>>
>>
>>
>> Boris Lublinsky
>> FDP Architect
>> boris.lublin...@lightbend.com
>> https://www.lightbend.com/
>>
>>
>

Reply via email to