Hi Florian, Thanks for following up. Does it consistently work for you if -ytm and -yjm are set to 2 GB?
Can you enable DEBUG level logging, submit with 1GB of memory again, and also send all TaskManager logs? The output of yarn logs -applicationId <appid> should suffice. The Flink version that is logged should be read from META-INF/MANIFEST.MF in the flink-dist jar. However, the value looks correct in the Hadoop-free Flink 1.5.2 binary distribution. Can you tell us which Hadoop distribution (+ version) you are using? It would help to reproduce the issues that you have found.

Best,
Gary

On Tue, Aug 7, 2018 at 8:19 PM, Florian Simond <florian.sim...@hotmail.fr> wrote:
> Hi Gary,
>
> Good intuition... yarn.scheduler.minimum-allocation-mb is set to 2048 :)
>
> I specified -ytm 2048 and -yjm 2048 and the job started right away; I will also try again later to make sure it's not luck. Thanks a lot!
>
> Regarding the version, it is still 0.1, and I have no clue why... I downloaded 1.5.2 from this link: https://www.apache.org/dyn/closer.lua/flink/flink-1.5.2/flink-1.5.2-bin-scala_2.11.tgz
>
> It should be the official build... It seems to be correct after all; I see "1a9b648" there too: https://github.com/apache/flink/releases
>
> But I don't know why it's written version 0.1...
> ------------------------------
> *From:* Gary Yao <g...@data-artisans.com>
> *Sent:* Tuesday, August 7, 2018 19:30
> *To:* Florian Simond
> *Cc:* user@flink.apache.org
> *Subject:* Re: Could not build the program from JAR file.
>
> Hi Florian,
>
> Thank you for the logs. They do look strange, but I cannot reproduce this behavior. From the logs I can see that the ResourceManager is requesting containers with different resource profiles (2048mb or 1024mb memory):
>
> Requesting new TaskExecutor container with resources <memory:2048, vCores:1>. Number pending requests 1.
> Requesting new TaskExecutor container with resources <memory:1024, vCores:1>. Number pending requests 1.
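[Editor's note] The version string Gary refers to is read from the manifest of the flink-dist jar, so it can be inspected directly. Below is a minimal, hedged sketch: it builds a toy jar to stay self-contained, and the `Implementation-Version` key is an assumption about which manifest entry carries the version; in practice you would point `read_manifest_version` at `lib/flink-dist_2.11-1.5.2.jar` inside your Flink distribution.

```python
import zipfile

def read_manifest_version(jar_path):
    """Return the Implementation-Version entry from a jar's MANIFEST.MF, or None."""
    with zipfile.ZipFile(jar_path) as jar:
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    for line in manifest.splitlines():
        if line.startswith("Implementation-Version:"):
            return line.split(":", 1)[1].strip()
    return None

# Build a toy jar to demonstrate; in practice point read_manifest_version
# at the real flink-dist jar in the lib/ directory of the distribution.
with zipfile.ZipFile("demo.jar", "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF",
                 "Manifest-Version: 1.0\nImplementation-Version: 1.5.2\n")

print(read_manifest_version("demo.jar"))  # 1.5.2
```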
>
> At the moment I am not exactly sure how this happens, and if this is problematic at all. It would be helpful to know if you are configuring YARN with a yarn.scheduler.minimum-allocation-mb of 2048.
>
> Here are some other things to try out for troubleshooting:
>
> Can you try raising the TM and JM memory to 2048mb (-ytm 2048 -yjm 2048)? You are setting -ytm to 1024, which results in a heap size of only 424mb.
>
> Is the deployment also slow if you are running examples/streaming/WordCount.jar?
>
> The version in the log shows up as:
>
> "Version: 0.1, Rev:1a9b648, Date:25.07.2018 @ 17:10:13 GMT".
>
> This does not seem to be an official Flink 1.5.2 distribution. Is there a reason for that, and can you rule out that changes were made? Maybe try out the official build?
>
> Best,
> Gary
>
> On Tue, Aug 7, 2018 at 2:44 PM, Florian Simond <florian.sim...@hotmail.fr> wrote:
> Hi Gary,
>
> No, I am not starting multiple "per-job clusters".
>
> I didn't configure anything regarding the number of slots per TM, so I guess the default value (1 then).
>
> But on the YARN UI I see that the number of "running containers" varies a lot (13 then 1 then 8 then 2 then 27 then 6 etc...)
>
> Here is the full jobmanager log:
> https://paste.ee/p/m7hCH
>
> This time it took longer to start (10 minutes) and completed on this line:
>
> 2018-08-07 14:31:11,852 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Sink: Unnamed (1/1) (655509c673d8ae19aac195276ad2c3e6) switched from DEPLOYING to RUNNING.
>
> Thanks a lot for your help and your time,
> Florian
>
> ------------------------------
> *From:* Gary Yao <g...@data-artisans.com>
> *Sent:* Tuesday, August 7, 2018 14:15
> *To:* Florian Simond
> *Cc:* vino yang; user@flink.apache.org
> *Subject:* Re: Could not build the program from JAR file.
>
> Hi Florian,
>
> 5 minutes sounds too slow.
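[Editor's note] The two memory numbers in this exchange can both be reproduced with simple arithmetic: YARN rounds every container request up to a multiple of yarn.scheduler.minimum-allocation-mb, and Flink subtracts a "containerized heap cutoff" from the container size before sizing the TM heap. The sketch below uses the Flink 1.5 defaults (containerized.heap-cutoff-ratio = 0.25, containerized.heap-cutoff-min = 600 MB); treat the formulas as an approximation of what the code does, not the exact implementation.

```python
import math

def yarn_rounded(request_mb, min_allocation_mb=2048):
    """YARN rounds container requests up to a multiple of the minimum allocation."""
    return math.ceil(request_mb / min_allocation_mb) * min_allocation_mb

def tm_heap_mb(container_mb, cutoff_ratio=0.25, cutoff_min_mb=600):
    """Approximate TM heap: container memory minus the containerized heap cutoff."""
    cutoff = max(cutoff_min_mb, int(container_mb * cutoff_ratio))
    return container_mb - cutoff

print(yarn_rounded(1024))  # a 1024mb request becomes a 2048mb container
print(tm_heap_mb(1024))    # -ytm 1024 leaves only 424mb of heap
```

This matches both observations in the thread: the 1024mb profile is rounded up to a 2048mb container, and -ytm 1024 yields the 424mb heap Gary mentions.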
Are you starting multiple "per-job clusters" at the same time? How many slots do you configure per TM? After you submit the job, how many resources do you have left in your YARN cluster?
>
> It might be that you are affected by FLINK-9455 [1]: Flink requests unnecessary resources from YARN and blocks the execution of other jobs temporarily. The workaround is to configure only one slot per TM.
>
> If the above does not help, can you attach the full ClusterEntrypoint (JobManager) logs?
>
> Best,
> Gary
>
> [1] https://issues.apache.org/jira/browse/FLINK-9455
>
> On Tue, Aug 7, 2018 at 12:34 PM, Florian Simond <florian.sim...@hotmail.fr> wrote:
> Thank you!
>
> So is it now normal that it takes around 5 minutes to start processing? The job is reading from kafka and writing back into another kafka topic. When I start the job, it takes roughly 5 minutes before I get something in the output topic.
>
> I see a lot of
>
> 2018-08-07 12:20:34,672 INFO org.apache.flink.yarn.YarnResourceManager - Received new container: XXX - Remaining pending container requests: 0
> 2018-08-07 12:20:34,672 INFO org.apache.flink.yarn.YarnResourceManager - Returning excess container XXX.
>
> I see a lot of those lines during the first five minutes.
>
> I'm not sure I need to have a static set of TMs, but as we have a limited set of nodes and several jobs, it could be harder to make sure they do not interfere with each other...
>
> ------------------------------
> *From:* Gary Yao <g...@data-artisans.com>
> *Sent:* Tuesday, August 7, 2018 12:27
> *To:* Florian Simond
> *Cc:* vino yang; user@flink.apache.org
> *Subject:* Re: Could not build the program from JAR file.
>
> You can find more information about the re-worked deployment model here:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
>
> TaskManagers are started and shut down according to the slot requirements of the jobs.
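[Editor's note] The FLINK-9455 workaround Gary describes is a one-line change in flink-conf.yaml; the key below is the standard option name, shown here as a sketch to verify against your own configuration:

```yaml
# Work around FLINK-9455: request only one slot per TaskManager
taskmanager.numberOfTaskSlots: 1
```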
It is possible to return to the old behavior by setting
>
> mode: old
>
> in flink-conf.yaml. However, this mode is deprecated and will be removed soon. Can you explain why you need to have a static set of TMs?
>
> On Tue, Aug 7, 2018 at 12:07 PM, Florian Simond <florian.sim...@hotmail.fr> wrote:
> Indeed, that's the solution.
>
> It was done automatically before with 1.4.2; that's why I missed that part...
>
> Do you have any pointer about the dynamic number of TaskManagers? I'm curious to know how it works. Is it possible to still fix it?
>
> Thank you,
> Florian
>
> ------------------------------
> *From:* Gary Yao <g...@data-artisans.com>
> *Sent:* Tuesday, August 7, 2018 11:55
> *To:* Florian Simond
> *Cc:* vino yang; user@flink.apache.org
> *Subject:* Re: Could not build the program from JAR file.
>
> Hi Florian,
>
> Can you run
>
> export HADOOP_CLASSPATH=`hadoop classpath`
>
> before submitting the job [1]?
>
> Moreover, you should not use the -yn parameter. Beginning with Flink 1.5, the number of TaskManagers is not fixed anymore.
>
> Best,
> Gary
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/hadoop.html#configuring-flink-with-hadoop-classpaths
>
> On Tue, Aug 7, 2018 at 9:22 AM, Florian Simond <florian.sim...@hotmail.fr> wrote:
> In the log, I can see that:
>
> The first exception is a warning, not sure if it is important.
>
> The second one seems to be the relevant one. It tries to find the file "-yn"???
>
> 2018-08-07 09:16:04,776 WARN org.apache.flink.client.cli.CliFrontend - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
> java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:264)
> at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1208)
> at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1164)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1090)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 5 more
> 2018-08-07 09:16:04,789 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
> 2018-08-07 09:16:04,967 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
> 2018-08-07 09:16:04,991 INFO org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
> 2018-08-07 09:16:05,041 INFO org.apache.flink.client.cli.CliFrontend - Running 'run' command.
> 2018-08-07 09:16:05,046 INFO org.apache.flink.client.cli.CliFrontend - Building program from JAR file
> 2018-08-07 09:16:05,046 ERROR org.apache.flink.client.cli.CliFrontend - Invalid command line arguments.
> org.apache.flink.client.cli.CliArgsException: Could not build the program from JAR file.
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:208)
> at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
> at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
> at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
> Caused by: java.io.FileNotFoundException: JAR file does not exist: -yn
> at org.apache.flink.client.cli.CliFrontend.buildProgram(CliFrontend.java:828)
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
> ... 4 more
>
> ------------------------------
> *From:* vino yang <yanghua1...@gmail.com>
> *Sent:* Tuesday, August 7, 2018 09:01
> *To:* Gary Yao
> *Cc:* Florian Simond; user@flink.apache.org
> *Subject:* Re: Could not build the program from JAR file.
>
> Hi Florian,
>
> The error message is caused by a FileNotFoundException; see here [1]. Is there any more information about the exception? Are you sure the jar exists?
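[Editor's note] The two log excerpts explain each other: once FlinkYarnSessionCli fails to load (the NoClassDefFoundError caused by the missing Hadoop classpath), -yn is no longer a recognized option, so the CLI falls through to treating it as the positional JAR-path argument. The toy sketch below illustrates that failure mode; it is not Flink's actual parser, and the flag names and logic are purely illustrative.

```python
def parse_run_args(args, known_flags):
    """Toy parser: consume known '-x value' flag pairs and return the
    first leftover token as the JAR path -- mimicking how an
    unrecognized '-yn' ends up interpreted as a file name."""
    i = 0
    while i < len(args):
        token = args[i]
        if token in known_flags:
            i += 2  # skip the flag and its value
        else:
            return token  # first non-flag token is taken as the JAR path
    return None

args = ["-m", "yarn-cluster", "-yn", "4", "-yjm", "1024", "-ytm", "4096", "WordCount.jar"]

# With the YARN CLI loaded, -yn is a known flag and the JAR is found:
print(parse_run_args(args, {"-m", "-yn", "-yjm", "-ytm"}))  # WordCount.jar
# Without it, -yn is the first unknown token and becomes the "JAR file":
print(parse_run_args(args, {"-m"}))  # -yn
```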
> [1]: https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L209
>
> Thanks, vino.
>
> 2018-08-07 14:28 GMT+08:00 Gary Yao <g...@data-artisans.com>:
> Hi Florian,
>
> You write that Flink 1.4.2 works, but what version is not working for you?
>
> Best,
> Gary
>
> On Tue, Aug 7, 2018 at 8:25 AM, Florian Simond <florian.sim...@hotmail.fr> wrote:
> Hi all,
>
> I'm trying to run the WordCount example on my YARN cluster and it is not working. I get the error message specified in the title: Could not build the program from JAR file.
>
> $ ./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 ./examples/batch/WordCount.jar
> Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> Could not build the program from JAR file.
>
> Use the help option (-h or --help) to get help on the command.
>
> I also have the same problem with a custom JAR...
>
> With Flink 1.4.2, I have no problem at all. Both the WordCount example and my custom JAR are working...
>
> What am I doing wrong?