I was trying to cancel a job with a savepoint, but the CLI command failed
with "akka.pattern.AskTimeoutException: Ask timed out".

The stack trace reveals that the ask timeout is 10 seconds:

Caused by: akka.pattern.AskTimeoutException: Ask timed out on
[Actor[akka://flink/user/jobmanager_0#106635280]] after [10000 ms].
Sender[null] sent message of type
"org.apache.flink.runtime.rpc.messages.LocalFencedMessage".

Indeed, the documentation lists the default value of akka.ask.timeout as
"10 s":
https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#distributed-coordination-via-akka

Behind the scenes, the savepoint creation and job cancellation succeeded,
which was more or less to be expected. So my only problem is getting a
proper response back from the CLI call instead of having it time out so
eagerly.

To be exact, what I ran was:

flink-1.5.2/bin/flink cancel b7c7d19d25e16a952d3afa32841024e5 -m yarn-cluster -yid application_1533676784032_0001 --withSavepoint

Should I increase akka.ask.timeout? If yes, can I override it just for the
CLI call somehow? Could it have undesired side effects on the actual Flink
jobs if it is set globally?

What about akka.client.timeout? Its default is also rather low: "60 s".
Should it also be increased accordingly if I want to allow more than 60 s
for savepoint creation?

Finally, the default timeout is so low that I would expect this to be a
common problem. I would say that the Flink CLI should have a higher default
timeout for cancel and savepoint creation operations.

Thanks!
