Hi Satya,

Maybe the job has been submitted to the Dispatcher successfully, but the internal job submission takes too long (more than 10 s), so the request fails with a timeout. Could you please set `web.timeout: 30000` and run again?
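For example, a minimal sketch of what I mean, assuming a default standalone setup where the cluster reads conf/flink-conf.yaml on startup:

    # conf/flink-conf.yaml on the JobManager machine
    # web.timeout is the timeout, in milliseconds, for asynchronous operations
    # of the REST handlers / web monitor; the default is 10000 (10 s).
    web.timeout: 30000

Note that the new value is only picked up when the JobManager process is restarted (e.g. bin/stop-cluster.sh followed by bin/start-cluster.sh), so please restart the cluster before resubmitting the job.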
Best,
Yang

On Mon, Jan 20, 2020 at 4:34 PM satya brat <bratsatya...@gmail.com> wrote:

> I am using a standalone cluster of Flink with 1 JobManager and n
> TaskManagers. When I try to submit a job via the command line, the job
> submission fails with the error message
> org.apache.flink.client.program.ProgramInvocationException: Could not
> submit job (JobID: f839aefee74aa4483ce8f8fd2e49b69e).
>
> On the JobManager instance, everything works fine until the job is switched
> from DEPLOYING to RUNNING. After that, once the akka timeout expires, I see
> the following stacktrace:
>
> akka.pattern.AskTimeoutException: Ask timed out on
> [Actor[akka://flink/user/dispatcher#-177004106]] after [100000 ms].
> Sender[null] sent message of type
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>     at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>     at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>     at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>     at java.lang.Thread.run(Thread.java:745)
>
> I went through the Flink code on GitHub and all the steps required to
> execute a job seem to run fine. However, when the JobManager has to
> acknowledge the job submission to the Flink client that triggered the job,
> the JobSubmitHandler times out on the akka dispatcher which, as per my
> understanding, takes care of communicating with the job client.
>
> The Flink job consists of 1 source (Kafka), 2 operators and 1 sink (custom
> sink). The following link shows the JobManager logs:
> https://pastebin.com/raw/3GaTtNrG
>
> Once the dispatcher times out, all other Flink UI calls also time out with
> the same exception.
>
> Following are the logs of the Flink client that was used to submit the job
> via the command line.
>
> 2019-09-28 19:34:21,321 INFO  org.apache.flink.client.cli.CliFrontend - --------------------------------------------------------------------------------
> 2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend - Starting Command Line Client (Version: 1.8.0, Rev:<unknown>, Date:<unknown>)
> 2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend - OS current user: root
> 2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend - Current Hadoop/Kerberos user: <no hadoop dependency found>
> 2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.5-b02
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend - Maximum heap size: 2677 MiBytes
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend - JAVA_HOME: (not set)
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend - No Hadoop Dependency available
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend - JVM Options:
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend -     -Dlog.file=/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/log/flink-root-client-fulfillment-stream-processor-flink-task-manager-2-8047357.log
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend -     -Dlog4j.configuration=file:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/conf/log4j-cli.properties
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend -     -Dlogback.configurationFile=file:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/conf/logback.xml
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend - Program Arguments:
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend -     run
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend -     -d
> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend -     -c
> 2019-09-28 19:34:21,324 INFO  org.apache.flink.client.cli.CliFrontend -     /home/fse/flink-kafka-relayer-0.2.jar
> 2019-09-28 19:34:21,324 INFO  org.apache.flink.client.cli.CliFrontend - Classpath: /var/lib/fulfillment-stream-processor/flink-executables/flink-executables/lib/log4j-1.2.17.jar:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/lib/slf4j-log4j12-1.7.15.jar:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/lib/flink-dist_2.11-1.8.0.jar:::
> 2019-09-28 19:34:21,324 INFO  org.apache.flink.client.cli.CliFrontend - --------------------------------------------------------------------------------
> 2019-09-28 19:34:21,328 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, <job-manager-ip>
> 2019-09-28 19:34:21,328 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
> 2019-09-28 19:34:21,328 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
> 2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
> 2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
> 2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
> 2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.jmx.class, org.apache.flink.metrics.jmx.JMXReporter
> 2019-09-28 19:34:21,329 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.jmx.port, 8789
> 2019-09-28 19:34:21,333 WARN  org.apache.flink.client.cli.CliFrontend - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
> java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:259)
>     at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1230)
>     at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1190)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1115)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 5 more
> 2019-09-28 19:34:21,343 INFO  org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
> 2019-09-28 19:34:21,545 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
> 2019-09-28 19:34:21,560 INFO  org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
> 2019-09-28 19:34:21,561 INFO  org.apache.flink.client.cli.CliFrontend - Running 'run' command.
> 2019-09-28 19:34:21,566 INFO  org.apache.flink.client.cli.CliFrontend - Building program from JAR file
> 2019-09-28 19:34:21,744 INFO  org.apache.flink.configuration.Configuration - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
> 2019-09-28 19:34:21,896 INFO  org.apache.flink.runtime.rest.RestClient - Rest client endpoint started.
> 2019-09-28 19:34:21,898 INFO  org.apache.flink.client.cli.CliFrontend - Starting execution of program
> 2019-09-28 19:34:21,898 INFO  org.apache.flink.client.program.rest.RestClusterClient - Starting program in interactive mode (detached: true)
> 2019-09-28 19:34:22,594 WARN  org.apache.flink.streaming.api.environment.StreamContextEnvironment - Job was executed in detached mode, the results will be available on completion.
> 2019-09-28 19:34:22,632 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, <job-manager-ip>
> 2019-09-28 19:34:22,632 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
> 2019-09-28 19:34:22,632 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
> 2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
> 2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
> 2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
> 2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.jmx.class, org.apache.flink.metrics.jmx.JMXReporter
> 2019-09-28 19:34:22,633 INFO  org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.jmx.port, 8789
> 2019-09-28 19:34:22,635 INFO  org.apache.flink.client.program.rest.RestClusterClient - Submitting job f839aefee74aa4483ce8f8fd2e49b69e (detached: true).
> 2019-09-28 19:36:04,341 INFO  org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.
> 2019-09-28 19:36:04,343 INFO  org.apache.flink.runtime.rest.RestClient - Rest endpoint shutdown complete.
> 2019-09-28 19:36:04,343 ERROR org.apache.flink.client.cli.CliFrontend - Error while running the command.
> org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: f839aefee74aa4483ce8f8fd2e49b69e)
>     at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:250)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:483)
>     at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:429)
>     at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
>     at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
>     at org.apache.flink.client.cli.CliFrontend$$Lambda$5/1971851377.call(Unknown Source)
>     at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
>     at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:388)
>     at org.apache.flink.client.program.rest.RestClusterClient$$Lambda$17/788892554.apply(Unknown Source)
>     at java.util.concurrent.CompletableFuture$ExceptionCompletion.run(CompletableFuture.java:1246)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
>     at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
>     at java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:723)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
>     at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
>     at java.util.concurrent.CompletableFuture$ThenCopy.run(CompletableFuture.java:1333)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2361)
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:207)
>     at org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$34/1092254958.accept(Unknown Source)
>     at java.util.concurrent.CompletableFuture$WhenCompleteCompletion.run(CompletableFuture.java:1298)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
>     at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
>     at java.util.concurrent.CompletableFuture$AsyncCompose.exec(CompletableFuture.java:626)
>     at java.util.concurrent.CompletableFuture$Async.run(CompletableFuture.java:428)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error., <Exception on server side:
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-177004106]] after [100000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>     at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>     at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>     at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>     at java.lang.Thread.run(Thread.java:745)
>
> End of exception on server side>]
>     at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:389)
>     at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:373)
>     at org.apache.flink.runtime.rest.RestClient$$Lambda$33/1155836850.apply(Unknown Source)
>     at java.util.concurrent.CompletableFuture$AsyncCompose.exec(CompletableFuture.java:604)
>     ... 4 more
>
> I have turned on debug logs for Flink, Akka and Kafka but am still not able
> to figure out what is going wrong; I have only a very basic understanding of
> Akka. Can someone help me with this? I am running Flink 1.8.0.
>