These logs confirm that it is indeed a timeout issue. In our scenario, it was caused by task deployment taking a long time. You can check whether the time tasks take to go from SCHEDULED to DEPLOYING in the log is greater than 10s. This step is processed in the main thread and blocks the processing of requests from the UI.
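A minimal sketch of that check, assuming the JobManager log uses the default timestamp layout (yyyy-MM-dd HH:mm:ss,SSS) and that task transitions are logged as "... switched from SCHEDULED to DEPLOYING." The function name deploy_window_seconds and the regex are illustrative and not part of Flink. It measures the time from the first task switching to SCHEDULED until the last task switching to DEPLOYING, which is one way to read the check described above.

```python
import re
from datetime import datetime

# Assumed line shape; adjust the regex if your log pattern differs:
# 2020-02-14 11:50:21,000 INFO  ...ExecutionGraph - Map (1/2) (<attempt id>) switched from SCHEDULED to DEPLOYING.
TRANSITION = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*switched from (\w+) to (\w+)"
)


def deploy_window_seconds(lines):
    """Seconds from the first switch to SCHEDULED to the last switch to DEPLOYING.

    Returns None if either transition is missing from the given lines.
    """
    first_scheduled = None
    last_deploying = None
    for line in lines:
        match = TRANSITION.match(line)
        if not match:
            continue  # not a task state transition
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        target = match.group(3)
        if target == "SCHEDULED" and first_scheduled is None:
            first_scheduled = stamp
        elif target == "DEPLOYING":
            last_deploying = stamp
    if first_scheduled is None or last_deploying is None:
        return None
    return (last_deploying - first_scheduled).total_seconds()
```

To use it, pass the lines of the JobManager log (for example, the open file object) and compare the result with the 10s ask timeout. If the log contains several job submissions, restrict the input to the lines of the one you are investigating.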
For now, you can increase 'akka.ask.timeout' to avoid this. I have created a JIRA issue to improve this:
https://issues.apache.org/jira/browse/FLINK-16069

Best
Weihua Hu

> On Feb 15, 2020, at 01:54, Richard Moorhead <richard.moorh...@gmail.com> wrote:
>
> 2020-02-14 11:50:35,402 ERROR org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler - Unhandled exception.
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#1293527273]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
>     at akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:635)
>     at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:650)
>     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
>     at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109)
>     at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
>     at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:279)
>     at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:283)
>     at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:235)
>     at java.lang.Thread.run(Thread.java:748)
>
> On Wed, Feb 12, 2020 at 11:30 PM HuWeihua <huweihua....@gmail.com> wrote:
> Hi, Richard
>
> Most likely the REST API has timed out; you can try to find some evidence in the jobmanager log.
>
> You can provide the full log to help us find the root cause.
>
> Best
> Weihua Hu
>
>> On Feb 13, 2020, at 09:40, Richard Moorhead <richard.moorh...@gmail.com> wrote:
>>
>> When I submit a job to a Flink session with parallelism higher than 128, the job is submitted and renders in the UI, but when I view the job itself the UI starts to rapidly emit errors in the upper right:
>>
>> Server Response:
>> Unable to load requested file /bad-request.
>>
>> Is this a known issue? Is there a fix? Does this indicate underlying stability issues?