Hello all,

We are trying to run a Flink job in standalone mode on k8s using the official
Docker image. Following this documentation
<https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#advanced-customization>,
we have created a custom Docker image that extends the official image
and performs some pre-start actions. It then runs `exec
/docker-entrypoint.sh standalone-job "$1"` to start the job manager. We have
verified that flink-conf.yaml is present at the expected path,
i.e. $FLINK_HOME/conf/flink-conf.yaml, and have set
JOB_MANAGER_RPC_ADDRESS to the pod IP.

We submit our job for execution from the application's main thread using
`StreamExecutionEnvironment#executeAsync`. However, the submission
consistently fails with an AskTimeoutException from
DispatcherGateway#submitJob (see logs below).

Based on previous answers on the mailing lists and in issues, we tried
increasing "web.timeout" and "akka.ask.timeout", but neither helped.
It seems that the timeout used for this particular future is
hardcoded somewhere in the code. It would be great if someone could provide
some help or pointers on what we are missing or what we should check.
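
For reference, overrides of this kind are set in flink-conf.yaml roughly as in the sketch below. The values are illustrative assumptions, not the exact ones from our deployment. As far as we understand, `web.timeout` is given in milliseconds and `akka.ask.timeout` as a duration string.

```yaml
# flink-conf.yaml (illustrative values only, not our actual settings)
# Timeout for asynchronous operations of the web monitor, in milliseconds.
web.timeout: 300000
# Timeout used for Akka ask calls, as a duration string.
akka.ask.timeout: 300 s
```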

Error logs:

Caused by: java.util.concurrent.TimeoutException: Invocation of public abstract java.util.concurrent.CompletableFuture org.apache.flink.runtime.dispatcher.DispatcherGateway.submitJob(org.apache.flink.runtime.jobgraph.JobGraph,org.apache.flink.api.common.time.Time) timed out.
    at org.apache.flink.runtime.rpc.akka.$Proxy31.submitJob(Unknown Source) ~[?:1.13.2]
    at org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$submitJob$6(EmbeddedExecutor.java:183) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:?]
    at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:237) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?
    ...

Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/rpc/dispatcher_1#2019478781]] after [60000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
    at akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:635) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:650) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328) ~[flink.jar:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:279) ~[flink.jar:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:283) ~[flink.jar:?]


-
Dhanesh Arole
