Hi Robert, Yes there is an OversizedPayloadException in the job manager log: --------------- 2017-05-22 15:39:18,942 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Upload jar files to job manager akka.tcp://flink@localhost:6123/user/jobmanager. 2017-05-22 15:39:18,957 INFO org.apache.flink.runtime.blob.BlobClient - Blob client connecting to akka.tcp://flink@localhost:6123/user/jobmanager 2017-05-22 15:39:19,451 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Submit job to the job manager akka.tcp://flink@localhost:6123/user/jobmanager. 2017-05-22 15:39:19,632 ERROR akka.remote.EndpointWriter - Transient association error (association remains live) akka.remote.OversizedPayloadException: Discarding oversized payload sent to Actor[akka.tcp://flink@localhost:6123/user/jobmanager#1736150911]: max allowed size 10485760 bytes, actual size of encoded class org.apache.flink.runtime.messages.JobManagerMessages$LeaderSessionMessage was 24555177 bytes. 2017-05-22 15:41:19,472 ERROR org.apache.flink.client.CliFrontend - Error while running the command. org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Couldn't retrieve the JobExecutionResult from the JobManager. at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400) at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) at fwdnxt.Sonar.main(Sonar.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073) at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117) at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116) Caused by: org.apache.flink.runtime.client.JobExecutionException: Couldn't retrieve the JobExecutionResult from the JobManager. at org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:294) at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:382) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423) ... 22 more Caused by: org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: Job submission to the JobManager timed out. You may increase 'akka.client.timeout' in case the JobManager needs more time to configure and confirm the job submission. ------------- Does this help in explaining why it fails only in the command line client? How can I fix it, what is "JobManagerMessages$LeaderSessionMessage was 24555177 bytes”?
Regards, -Rami On 12 May 2017, at 19:21, Robert Metzger <rmetz...@apache.org<mailto:rmetz...@apache.org>> wrote: Hi, did you check the jobmanager log for any incoming messages? I would be interesting to see if the JM failed after the initial akka message, or if there's any kind of hiccup ? On Thu, May 11, 2017 at 5:07 PM, Rami Al-Isawi <rami.al-is...@comptel.com<mailto:rami.al-is...@comptel.com>> wrote: Hi, The same exact jar on the same machine is being deployed just fine in couple of seconds using the web interface. On the other hand, if I used the command line, I get: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Couldn't retrieve the JobExecutionResult from the JobManager. at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400) at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) … Caused by: org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: Job submission to the JobManager timed out. You may increase 'akka.client.timeout' in case the JobManager needs more time to configure and confirm the job submission. at org.apache.flink.runtime.client.JobSubmissionClientActor.handleCustomMessage(JobSubmissionClientActor.java:119) at org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:239) at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:88) at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68) at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167) at akka.actor.Actor$class.aroundReceive(Actor.scala:467) at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) I did increase the timeout, but it fails the same way. I assume that submission method should not be relevant, so what is the difference between command-line and web submission? I tested with taking out some subtasks and that made the command-line successfully submit the job, but then how come it worked fine using the web interface with all the subtasks included? Regards, -Rami Disclaimer: This message and any attachments thereto are intended solely for the addressed recipient(s) and may contain confidential information. If you are not the intended recipient, please notify the sender by reply e-mail and delete the e-mail (including any attachments thereto) without producing, distributing or retaining any copies thereof. Any review, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient(s) is prohibited. Thank you. Disclaimer: This message and any attachments thereto are intended solely for the addressed recipient(s) and may contain confidential information. If you are not the intended recipient, please notify the sender by reply e-mail and delete the e-mail (including any attachments thereto) without producing, distributing or retaining any copies thereof. Any review, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient(s) is prohibited. Thank you.