I'm having a problem with an Akka timeout when starting my cluster.  The error
is "Ask timed out after 10000 ms."  I have changed the akka.ask.timeout
config setting to 300000 ms, but it still times out and fails after 10
seconds.  I confirmed that the config is properly set both by checking the
Job Manager configuration tab (it shows 300000 ms) and by logging the
output of AkkaUtils.getTimeout(configuration), which also shows 300000 ms.
It seems something is not honoring that configuration value.
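
For reference, a rough sketch of the relevant flink-conf.yaml entries. Only
akka.ask.timeout comes from the setup described above. The web.timeout line
is an assumption added for illustration: it is Flink's timeout for
asynchronous operations of the web monitor / REST handlers, given in
milliseconds, and its default of 10000 happens to match the figure in the
error. Whether it is the setting actually involved here is unconfirmed.

```
# Set as described above; shows up as 300000 ms in the Job Manager UI.
akka.ask.timeout: 300000 ms

# ASSUMPTION (not part of the original setup): REST / web monitor timeout
# in milliseconds. Defaults to 10000 when not set.
web.timeout: 300000
```

Since the exception in the log below is raised from a REST handler
(JobExecutionResultHandler), this second value may be worth checking as well.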

I did find a different thread that discussed the fact that the
LocalStreamEnvironment will not honor this setting, but that is not my
case.  I am running on a cluster (AWS EMR) using the regular
StreamExecutionEnvironment.  This is Flink 1.5.2.

Any ideas?

~~~~~

2018-08-31 17:37:55 INFO  org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Received new token for : ip-10-213-139-66.ec2.internal:8041
2018-08-31 17:37:55 INFO  org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Received new token for : ip-10-213-136-25.ec2.internal:8041
2018-08-31 17:38:34 ERROR o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - Implementation error: Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-219618710]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
2018-08-31 17:38:41 INFO  org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Waiting for application to be successfully unregistered.
2018-08-31 17:38:41 INFO  o.a.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl  - Interrupted while waiting for queue
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:323)
2018-08-31 17:38:42 WARN  akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-81 - Association with remote system [akka.tcp://flink@ip-10-213-142-102.ec2.internal:42027] has failed, address is now gated for [50] ms. Reason: [Disassociated]
