Hi Juho,

This problem does exist. As a temporary workaround, I suggest splitting
the operation into two separate steps:
1) Trigger the savepoint on its own;
2) Execute the cancel command.
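
For reference, the two steps above could be sketched roughly as follows.
The job ID, YARN application ID, Flink path and the "-m yarn-cluster -yid"
flags are simply carried over from your original command. The DRY_RUN
switch and the run helper are hypothetical additions of mine so that the
script only prints the commands by default. I have not verified this
against your cluster, so please treat it as a sketch, not a tested recipe.

```shell
#!/usr/bin/env bash
# Sketch: take the savepoint and cancel the job as two separate CLI calls.
# All values below are copied from the original command in this thread.
set -euo pipefail

FLINK_BIN="${FLINK_BIN:-flink-1.5.2/bin/flink}"
JOB_ID="${JOB_ID:-b7c7d19d25e16a952d3afa32841024e5}"
YARN_APP_ID="${YARN_APP_ID:-application_1533676784032_0001}"
# Hypothetical safety switch: 1 = only print the commands, 0 = run them.
DRY_RUN="${DRY_RUN:-1}"

# Helper (not part of Flink): print the command in dry-run mode, else run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1) Trigger the savepoint on its own.
run "$FLINK_BIN" savepoint "$JOB_ID" -m yarn-cluster -yid "$YARN_APP_ID"

# 2) Cancel the job; with "set -e" this only runs if step 1 succeeded.
run "$FLINK_BIN" cancel "$JOB_ID" -m yarn-cluster -yid "$YARN_APP_ID"
```

Run it once as-is to check the printed commands, then with DRY_RUN=0 to
actually execute them.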

Hi Till, Chesnay:

We have hit similar problems in our internal environment, and so have
several users on the mailing list.

In our environment, the JM appears to report the savepoint as complete and
then shuts itself down, but the client still tries to connect to the old JM
and reports a timeout exception.

Thanks, vino.


Juho Autio <juho.au...@rovio.com> wrote on Wed, Aug 8, 2018 at 9:18 PM:

> I was trying to cancel a job with savepoint, but the CLI command failed
> with "akka.pattern.AskTimeoutException: Ask timed out".
>
> The stack trace reveals that ask timeout is 10 seconds:
>
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
> [Actor[akka://flink/user/jobmanager_0#106635280]] after [10000 ms].
> Sender[null] sent message of type
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>
> Indeed, the documented default value for akka.ask.timeout is "10 s",
> see:
>
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#distributed-coordination-via-akka
>
> Behind the scenes the savepoint creation & job cancellation succeeded,
> which was to be expected, kind of. So my problem is just getting a proper
> response back from the CLI call instead of it timing out so eagerly.
>
> To be exact, what I ran was:
>
> flink-1.5.2/bin/flink cancel b7c7d19d25e16a952d3afa32841024e5 -m
> yarn-cluster -yid application_1533676784032_0001 --withSavepoint
>
> Should I change akka.ask.timeout to a longer value? If yes, can I
> override it just for the CLI call somehow? It might have undesired
> side effects if set globally for the actual Flink jobs to use.
>
> What about akka.client.timeout? The default for it is also rather low: "60
> s". Should it also be increased accordingly if I want to accept longer than
> 60 s for savepoint creation?
>
> Finally, that default timeout is so low that I would expect this to be a
> common problem. I would say that the Flink CLI should have a higher default
> timeout for cancel and savepoint creation ops.
>
> Thanks!
>
