Hi Rami, I think the problem is that when submitting your job through the web interface, Akka will not use remoting (= it will not send your message to a remote machine). When you submit your message from the client, it'll go through akka's network stack (=remoting). Akka rejects messages above a certain size. It looks like your job exceeds that size.
Here, you'll find the config keys to increase the size: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#distributed-coordination-via-akka On Mon, May 22, 2017 at 2:56 PM, Rami Al-Isawi <rami.al-is...@comptel.com> wrote: > Hi Robert, > > Yes there is an OversizedPayloadException in the job manager log: > --------------- > 2017-05-22 15:39:18,942 INFO > org.apache.flink.runtime.client.JobSubmissionClientActor > - Upload jar files to job manager akka.tcp://flink@localhost: > 6123/user/jobmanager. > 2017-05-22 15:39:18,957 INFO org.apache.flink.runtime.blob.BlobClient > - Blob client connecting to akka.tcp://flink@localhost: > 6123/user/jobmanager > 2017-05-22 15:39:19,451 INFO > org.apache.flink.runtime.client.JobSubmissionClientActor > - Submit job to the job manager akka.tcp://flink@localhost: > 6123/user/jobmanager. > 2017-05-22 15:39:19,632 ERROR akka.remote.EndpointWriter > - Transient association error (association remains live) > akka.remote.OversizedPayloadException: Discarding oversized payload sent > to Actor[akka.tcp://flink@localhost:6123/user/jobmanager#1736150911]: max > allowed size 10485760 bytes, actual size of encoded class > org.apache.flink.runtime.messages.JobManagerMessages$LeaderSessionMessage > was 24555177 bytes. > 2017-05-22 15:41:19,472 ERROR org.apache.flink.client.CliFrontend > - Error while running the command. > org.apache.flink.client.program.ProgramInvocationException: The program > execution failed: Couldn't retrieve the JobExecutionResult from the > JobManager. > at org.apache.flink.client.program.ClusterClient.run( > ClusterClient.java:427) > at org.apache.flink.client.program.StandaloneClusterClient.submitJob( > StandaloneClusterClient.java:101) > at org.apache.flink.client.program.ClusterClient.run( > ClusterClient.java:400) > at org.apache.flink.streaming.api.environment.StreamContextEnvironment. > execute(StreamContextEnvironment.java:66) > at fwdnxt.Sonar.main(Sonar.java:162) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke( > NativeMethodAccessorImpl.java:62) > at sun.reflect.DelegatingMethodAccessorImpl.invoke( > DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.flink.client.program.PackagedProgram.callMainMethod( > PackagedProgram.java:528) > at org.apache.flink.client.program.PackagedProgram. > invokeInteractiveModeForExecution(PackagedProgram.java:419) > at org.apache.flink.client.program.ClusterClient.run( > ClusterClient.java:339) > at org.apache.flink.client.CliFrontend.executeProgram( > CliFrontend.java:831) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256) > at org.apache.flink.client.CliFrontend.parseParameters( > CliFrontend.java:1073) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117) > at org.apache.flink.runtime.security.HadoopSecurityContext$1.run( > HadoopSecurityContext.java:43) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at org.apache.hadoop.security.UserGroupInformation.doAs( > UserGroupInformation.java:1548) > at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured( > HadoopSecurityContext.java:40) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116) > Caused by: org.apache.flink.runtime.client.JobExecutionException: > Couldn't retrieve the JobExecutionResult from the JobManager. > at org.apache.flink.runtime.client.JobClient. > awaitJobResult(JobClient.java:294) > at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient. > java:382) > at org.apache.flink.client.program.ClusterClient.run( > ClusterClient.java:423) > ... 22 more > Caused by: > org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: > Job submission to the JobManager timed out. You may increase > 'akka.client.timeout' in case the JobManager needs more time to configure > and confirm the job submission. > ------------- > Does this help in explaining why it fails only in the command line client? > How can I fix it, what is "JobManagerMessages$LeaderSessionMessage was > 24555177 bytes”? > > > Regards, > -Rami > > On 12 May 2017, at 19:21, Robert Metzger <rmetz...@apache.org> wrote: > > Hi, > did you check the jobmanager log for any incoming messages? I would be > interesting to see if the JM failed after the initial akka message, or if > there's any kind of hiccup ? > > On Thu, May 11, 2017 at 5:07 PM, Rami Al-Isawi <rami.al-is...@comptel.com> > wrote: > >> Hi, >> >> The same exact jar on the same machine is being deployed just fine in >> couple of seconds using the web interface. On the other hand, if I used the >> command line, I get: >> >> org.apache.flink.client.program.ProgramInvocationException: The program >> execution failed: Couldn't retrieve the JobExecutionResult from the >> JobManager. >> at org.apache.flink.client.program.ClusterClient.run(ClusterCli >> ent.java:427) >> at org.apache.flink.client.program.StandaloneClusterClient.subm >> itJob(StandaloneClusterClient.java:101) >> at org.apache.flink.client.program.ClusterClient.run(ClusterCli >> ent.java:400) >> at org.apache.flink.streaming.api.environment.StreamContextEnvi >> ronment.execute(StreamContextEnvironment.java:66) >> … >> Caused by: >> org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: >> Job submission to the JobManager timed out. You may increase >> 'akka.client.timeout' in case the JobManager needs more time to configure >> and confirm the job submission. >> at org.apache.flink.runtime.client.JobSubmissionClientActor.han >> dleCustomMessage(JobSubmissionClientActor.java:119) >> at org.apache.flink.runtime.client.JobClientActor.handleMessage >> (JobClientActor.java:239) >> at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeader >> SessionID(FlinkUntypedActor.java:88) >> at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(Fl >> inkUntypedActor.java:68) >> at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(Untyp >> edActor.scala:167) >> at akka.actor.Actor$class.aroundReceive(Actor.scala:467) >> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97) >> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) >> at akka.actor.ActorCell.invoke(ActorCell.scala:487) >> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) >> at akka.dispatch.Mailbox.run(Mailbox.scala:220) >> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask. >> exec(AbstractDispatcher.scala:397) >> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask( >> ForkJoinPool.java:1339) >> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPoo >> l.java:1979) >> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinW >> orkerThread.java:107) >> >> >> I did increase the timeout, but it fails the same way. >> >> I assume that submission method should not be relevant, so what is the >> difference between command-line and web submission? >> >> I tested with taking out some subtasks and that made the command-line >> successfully submit the job, but then how come it worked fine using the web >> interface with all the subtasks included? >> >> Regards, >> -Rami >> Disclaimer: This message and any attachments thereto are intended solely >> for the addressed recipient(s) and may contain confidential information. If >> you are not the intended recipient, please notify the sender by reply >> e-mail and delete the e-mail (including any attachments thereto) without >> producing, distributing or retaining any copies thereof. Any review, >> dissemination or other use of, or taking of any action in reliance upon, >> this information by persons or entities other than the intended >> recipient(s) is prohibited. Thank you. >> > > > Disclaimer: This message and any attachments thereto are intended solely > for the addressed recipient(s) and may contain confidential information. If > you are not the intended recipient, please notify the sender by reply > e-mail and delete the e-mail (including any attachments thereto) without > producing, distributing or retaining any copies thereof. Any review, > dissemination or other use of, or taking of any action in reliance upon, > this information by persons or entities other than the intended > recipient(s) is prohibited. Thank you. >