We are also experiencing this! Thanks for speaking up! It's relieving to
know we're not alone :)

We tried adding `akka.ask.timeout: 1 min` to our `flink-conf.yaml`, but it
did not seem to have any effect. I also tried raising every other related
akka, RPC, etc. timeout, and we still encounter these errors. I believe they
may also affect our ability to deploy, since we get a timeout when
submitting the job programmatically. I'd love to see a solution to this if
one exists!
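For reference, here is a minimal sketch of the kind of programmatic REST submission we mean. It uses plain Java 11 `java.net.http` (no Flink client library) and assumes the jar has already been uploaded and that the job is resumed from an externalized checkpoint through the `savepointPath` query parameter of `POST /jars/:jarid/run`. The host, jar id, and checkpoint path are hypothetical placeholders. The client-side timeout set here only controls how long the HTTP client waits. The 10000 ms in the quoted trace is enforced inside the JobManager's REST handler, and because it matches the default of `web.timeout` (10000 ms), that setting rather than `akka.ask.timeout` may be the one that applies. This is an assumption we have not confirmed.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper: resubmits an already-uploaded jar through the Flink
// REST API, restoring state from an externalized checkpoint.
final class FlinkJarRun {

    // Builds POST {baseUrl}/jars/{jarId}/run?savepointPath=<encoded path>.
    // clientTimeout only bounds how long this HTTP client waits; it does not
    // change any timeout on the JobManager side (e.g. the 10000 ms ask timeout).
    static HttpRequest buildRunRequest(String baseUrl, String jarId,
                                       String checkpointPath, Duration clientTimeout) {
        String query = "savepointPath="
                + URLEncoder.encode(checkpointPath, StandardCharsets.UTF_8);
        URI uri = URI.create(baseUrl + "/jars/" + jarId + "/run?" + query);
        return HttpRequest.newBuilder(uri)
                .timeout(clientTimeout)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values (hypothetical): use your JobManager address,
        // the jar id returned by the upload, and your checkpoint path.
        HttpRequest request = buildRunRequest(
                "http://localhost:8081",
                "example.jar",
                "s3://bucket/checkpoints/chk-42",
                Duration.ofMinutes(2));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Print the status code and raw JSON body returned by the REST endpoint.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```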

Best,

Aaron Levin

On Thu, Jan 10, 2019 at 2:58 PM Steven Wu <stevenz...@gmail.com> wrote:

> We are trying out Flink 1.7.0. We always get this exception when
> submitting a job with an external checkpoint via REST. Job parallelism is
> 1,600, and state size is probably in the range of 1-5 TB. The job actually
> starts; only the REST API call returns this failure.
>
> If we submit the job without an external checkpoint, everything works
> fine.
>
> Has anyone else seen this problem with 1.7? I'd appreciate your help!
>
> Thanks,
> Steven
>
> org.apache.flink.runtime.rest.handler.RestHandlerException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-641142843]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>         at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$4(JarRunHandler.java:114)
>         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>         at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>         at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
>         at akka.dispatch.OnComplete.internal(Future.scala:258)
>         at akka.dispatch.OnComplete.internal(Future.scala:256)
>         at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>         at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>         at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
>         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>         at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>         at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>         at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>         at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>         at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>         at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-641142843]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>         at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>         at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>         at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
>         at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
>         ... 21 more
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-641142843]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>         at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>         ... 9 more
>
