Hi Greg,

Can you describe the steps to reproduce the problem, or attach the full JobManager logs? Because JobExecutionResultHandler appears in your log, I assume that you are starting a job cluster on YARN. Without seeing the complete logs, I cannot be sure what exactly happens.

For now, you can try setting the config option web.timeout to a higher value.
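As a minimal sketch, assuming the 10-second limit in the log comes from the REST handler timeout (web.timeout, which is given in milliseconds) rather than from akka.ask.timeout, the setting could go into flink-conf.yaml as shown here. The value 300000 is only an illustration that mirrors the akka.ask.timeout already in use, not a recommended setting:

```yaml
# flink-conf.yaml
# Timeout for asynchronous operations of the web/REST handlers, in milliseconds.
# 300000 is an assumed example value, chosen to match the existing akka.ask.timeout.
web.timeout: 300000

# Already set according to the original message; kept here for context.
akka.ask.timeout: 300000 ms
```

The cluster has to be restarted (or the job cluster resubmitted) for the changed value to be picked up.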
Best,
Gary

On Fri, Aug 31, 2018 at 8:01 PM, Greg Finch <finchgreg...@gmail.com> wrote:

> I'm having a problem with akka timeout when starting my cluster. The
> error is "Ask timed out after 10000 ms.". I have changed the
> akka.ask.timeout config setting to be 300000 ms, but it still times out and
> fails after 10 seconds. I confirmed that the config is properly set by
> both checking the Job Manager configuration tab (it shows 300000 ms) as
> well logging the output of AkkaUtils.getTimeout(configuration) which also
> shows 300000ms. It seems something is not honoring that configuration
> value.
>
> I did find a different thread that discussed the fact that the
> LocalStreamEnvironment will not honor this setting, but that is not my
> case. I am running on a cluster (AWS EMR) using the regular
> StreamExecutionEnvironment. This is Flink 1.5.2.
>
> Any ideas?
>
> ~~~~~
>
> 2018-08-31 17:37:55 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : ip-10-213-139-66.ec2.internal:8041
> 2018-08-31 17:37:55 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : ip-10-213-136-25.ec2.internal:8041
> 2018-08-31 17:38:34 ERROR o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler - Implementation error: Unhandled exception.
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-219618710]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>     at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>     at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>     at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-08-31 17:38:41 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
> 2018-08-31 17:38:41 INFO o.a.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
> java.lang.InterruptedException: null
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:323)
> 2018-08-31 17:38:42 WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-81 - Association with remote system [akka.tcp://flink@ip-10-213-142-102.ec2.internal:42027] has failed, address is now gated for [50] ms. Reason: [Disassociated]