[ https://issues.apache.org/jira/browse/FLINK-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055566#comment-17055566 ]
Zili Chen edited comment on FLINK-16018 at 3/10/20, 4:24 AM:
-------------------------------------------------------------

A rough thought: we respect the {{timeout}} parameter in {{Dispatcher#submitJob}}, keep a field that helps determine the submission's progress, and complete the future on timeout with that field (stringified in a {{JobSubmissionException}}).

was (Author: tison):
A general thought is we respect {{timeout}} parameter in {{Dispatcher#submitJob}}, having a field that helps determine the progress, and complete the future on Timeout with that field(stringified in {{JobSubmissionException}}).

> Improve error reporting when submitting batch job (instead of AskTimeoutException)
> ----------------------------------------------------------------------------------
>
> Key: FLINK-16018
> URL: https://issues.apache.org/jira/browse/FLINK-16018
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.2, 1.10.0
> Reporter: Robert Metzger
> Priority: Blocker
> Fix For: 1.10.1, 1.11.0
>
>
> While debugging the {{Shaded Hadoop S3A end-to-end test (minio)}} pre-commit
> test, I noticed that the JobSubmission is not producing very helpful error
> messages.
> Environment:
> - A simple batch wordcount job
> - An unavailable minio s3 filesystem service
> What happens from a user's perspective:
> - The job submission fails after 10 seconds with an AskTimeoutException:
> {code}
> 2020-02-07T11:38:27.1189393Z akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-939201095]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
> 2020-02-07T11:38:27.1189538Z     at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
> 2020-02-07T11:38:27.1189616Z     at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
> 2020-02-07T11:38:27.1189713Z     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
> 2020-02-07T11:38:27.1189789Z     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
> 2020-02-07T11:38:27.1189883Z     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> 2020-02-07T11:38:27.1189973Z     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> 2020-02-07T11:38:27.1190067Z     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> 2020-02-07T11:38:27.1190159Z     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
> 2020-02-07T11:38:27.1190267Z     at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
> 2020-02-07T11:38:27.1190358Z     at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
> 2020-02-07T11:38:27.1190465Z     at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
> 2020-02-07T11:38:27.1190540Z     at java.lang.Thread.run(Thread.java:748)
> {code}
> What a user would expect:
> - An error message indicating why the job submission failed.
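
For illustration, the idea in Zili Chen's comment (respect the timeout, keep a progress field, and surface that field in the exception when the timeout fires) might look roughly like the following. This is only a sketch under stated assumptions, not Flink's {{Dispatcher}} code: {{SubmissionTimeoutSketch}}, the {{Stage}} values, {{submitJob(Runnable, long)}} and {{SubmissionTimeoutException}} (a stand-in for {{JobSubmissionException}}) are all hypothetical names chosen for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class SubmissionTimeoutSketch {

    // Hypothetical progress markers; these are not Flink's actual submission stages.
    enum Stage { RECEIVED, PERSISTING_JOB_GRAPH, STARTING_JOB_MANAGER, DONE }

    // Stand-in for Flink's JobSubmissionException, so the sketch has no dependencies.
    static class SubmissionTimeoutException extends Exception {
        SubmissionTimeoutException(String message) {
            super(message);
        }
    }

    // Daemon timer thread, so the sketch never keeps the JVM alive.
    static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "submission-timeout");
                t.setDaemon(true);
                return t;
            });

    // persistStep models a potentially slow step, e.g. writing to an unreachable filesystem.
    static CompletableFuture<String> submitJob(Runnable persistStep, long timeoutMillis) {
        // The "field that helps determine the progress".
        AtomicReference<Stage> progress = new AtomicReference<>(Stage.RECEIVED);
        CompletableFuture<String> result = new CompletableFuture<>();

        // On timeout, fail the future with the last recorded stage in the message.
        ScheduledFuture<?> timeoutTask = TIMER.schedule(
                () -> result.completeExceptionally(new SubmissionTimeoutException(
                        "Job submission timed out after " + timeoutMillis
                                + " ms; last stage reached: " + progress.get())),
                timeoutMillis, TimeUnit.MILLISECONDS);
        // Cancel the pending timeout once the future completes either way.
        result.whenComplete((value, error) -> timeoutTask.cancel(false));

        CompletableFuture.runAsync(() -> {
            try {
                progress.set(Stage.PERSISTING_JOB_GRAPH);
                persistStep.run();
                progress.set(Stage.STARTING_JOB_MANAGER);
                progress.set(Stage.DONE);
                result.complete("ACK");
            } catch (Throwable t) {
                result.completeExceptionally(t);
            }
        });
        return result;
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // A step that takes far longer than the timeout, like the unavailable S3 service.
        String outcome = submitJob(() -> sleepQuietly(2_000), 200)
                .handle((ack, error) -> error == null ? ack : error.getMessage())
                .join();
        System.out.println(outcome);
    }
}
```

With something along these lines, the client would see which stage the submission was stuck in rather than a bare {{AskTimeoutException}}. How the progress field would be threaded through the real submission path is left open here.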