Hi Florian,

Thanks for following up. Does it consistently work for you if -ytm and -yjm
are set to 2 GB?

Can you enable DEBUG level logging, submit with 1GB of memory again, and
send
all TaskManager logs in addition? The output of yarn logs -applicationId
<appid> should suffice.
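For the DEBUG step, a minimal change (assuming the default log4j setup that
ships in conf/log4j.properties) would be:

```
# conf/log4j.properties -- raise the root logger from INFO to DEBUG
log4j.rootLogger=DEBUG, file
```

after which yarn logs -applicationId <appid> will collect the more verbose
output from all containers.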

The Flink version that is logged should be read from META-INF/MANIFEST.MF in
the flink-dist jar. However, the value looks correct in the Hadoop-free Flink
1.5.2 binary distribution. Can you tell us what Hadoop distribution (+
version) you are using? It would help to reproduce the issues that you have
found.
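As an aside, the manifest can be inspected by hand. Below is a rough sketch of
pulling an attribute out of a jar's META-INF/MANIFEST.MF; the jar path and
attribute name are illustrative, not Flink's actual lookup code, and the
sketch ignores the continuation lines that large manifests may contain:

```python
import zipfile

def read_manifest_attr(jar_path, key):
    """Return the value of `key` from META-INF/MANIFEST.MF inside a jar, or None."""
    with zipfile.ZipFile(jar_path) as jar:
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    for line in manifest.splitlines():
        if line.startswith(key + ":"):
            return line.split(":", 1)[1].strip()
    return None
```

For example, read_manifest_attr("lib/flink-dist_2.11-1.5.2.jar",
"Implementation-Version") -- the path is an assumption about where the jar
sits in your unpacked distribution.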

Best,
Gary



On Tue, Aug 7, 2018 at 8:19 PM, Florian Simond <florian.sim...@hotmail.fr>
wrote:

> Hi Gary,
>
>
> Good intuition... yarn.scheduler.minimum-allocation-mb is set to 2048 :)
>
>
> I specified -ytm 2048 and -yjm 2048 and the job started right away. I will
> also try again later to make sure it wasn't just luck. Thanks a lot!
>
>
> Regarding the version, it is still 0.1, and I have no clue why... I
> downloaded 1.5.2 from this link:
> https://www.apache.org/dyn/closer.lua/flink/flink-1.5.2/flink-1.5.2-bin-scala_2.11.tgz
>
>
> It should be the official build... It seems to be correct after all, I see
> "1a9b648" there too: https://github.com/apache/flink/releases
>
>
> But I don't know why it says version 0.1...
> ------------------------------
> *De :* Gary Yao <g...@data-artisans.com>
> *Envoyé :* mardi 7 août 2018 19:30
> *À :* Florian Simond
> *Cc :* user@flink.apache.org
>
> *Objet :* Re: Could not build the program from JAR file.
>
> Hi Florian,
>
> Thank you for the logs. They do look strange indeed, but I cannot reproduce
> this behavior. From the logs I can see that the ResourceManager is
> requesting containers with different resource profiles (2048mb or 1024mb
> memory):
>
>     Requesting new TaskExecutor container with resources <memory:2048,
> vCores:1>. Number pending requests 1.
>     Requesting new TaskExecutor container with resources <memory:1024,
> vCores:1>. Number pending requests 1.
>
> At the moment I am not exactly sure how this happens, and if this is
> problematic at all. It would be helpful to know if you are configuring YARN
> with a yarn.scheduler.minimum-allocation-mb of 2048.
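That setting would explain the mixed profiles: YARN rounds each container
request up to a multiple of yarn.scheduler.minimum-allocation-mb. A rough
sketch of that rounding (assumed normalization behavior, not YARN's actual
code):

```python
import math

def normalize_container_mb(requested_mb, min_allocation_mb):
    """Round a container memory request up to the next multiple of the
    scheduler's minimum allocation (assumed YARN normalization behavior)."""
    return max(min_allocation_mb,
               math.ceil(requested_mb / min_allocation_mb) * min_allocation_mb)
```

With a 2048mb minimum allocation, a 1024mb request would come back as a
2048mb container.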
>
> Here are some other things to try out for troubleshooting:
>
> Can you try raising the TM and JM memory to 2048mb (-ytm 2048 -yjm 2048)?
> You are setting -ytm to 1024, which results in a heap size of only 424mb.
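The 424mb figure follows from Flink's containerized heap cutoff; a sketch of
the arithmetic, assuming the Flink 1.5 defaults of
containerized.heap-cutoff-ratio = 0.25 and containerized.heap-cutoff-min =
600 (mb):

```python
def tm_heap_mb(container_mb, cutoff_ratio=0.25, cutoff_min_mb=600):
    """Approximate TaskManager heap: container memory minus the off-heap
    cutoff, where the cutoff is the larger of the ratio share and the
    configured minimum. Sketch only, not Flink's exact accounting."""
    cutoff_mb = max(cutoff_min_mb, int(container_mb * cutoff_ratio))
    return container_mb - cutoff_mb
```

tm_heap_mb(1024) gives the 424mb heap mentioned above, while tm_heap_mb(2048)
leaves roughly 1448mb.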
>
> Is the deployment also slow if you are running
> examples/streaming/WordCount.jar?
>
> The version in the log shows up as:
>
>     "Version: 0.1, Rev:1a9b648, Date:25.07.2018 @ 17:10:13 GMT".
>
> This does not seem to be an official Flink 1.5.2 distribution. Is there a
> reason for that, and can you rule out that any changes were made? Maybe try
> out the official build?
>
> Best,
> Gary
>
>
> On Tue, Aug 7, 2018 at 2:44 PM, Florian Simond <florian.sim...@hotmail.fr>
> wrote:
>
> Hi Gary,
>
>
> No, I am not starting multiple "per-job clusters".
>
>
> I didn't configure anything regarding the number of slots per TM, so I
> guess it's the default value (1, then).
>
>
> But on the YARN UI I see that the number of "running containers" varies a
> lot (13 then 1 then 8 then 2 then 27 then 6 etc...)
>
>
> Here is the full jobmanager log:
>
> https://paste.ee/p/m7hCH
>
>
> This time it took longer to start (10 minutes)
>
> And completed on this line:
>
> 2018-08-07 14:31:11,852 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> Sink: Unnamed (1/1) (655509c673d8ae19aac195276ad2c3e6) switched from DEPLOYING to RUNNING.
>
>
>
> Thanks a lot for your help and your time,
>
> Florian
>
>
>
>
> ------------------------------
> *De :* Gary Yao <g...@data-artisans.com>
> *Envoyé :* mardi 7 août 2018 14:15
>
> *À :* Florian Simond
> *Cc :* vino yang; user@flink.apache.org
> *Objet :* Re: Could not build the program from JAR file.
>
> Hi Florian,
>
> 5 minutes sounds too slow. Are you starting multiple "per-job clusters" at
> the
> same time? How many slots do you configure per TM? After you submit the
> job,
> how many resources do you have left in your YARN cluster?
>
> It might be that you are affected by FLINK-9455 [1]: Flink requests
> unnecessary resources from YARN and blocks the execution of other jobs
> temporarily. The workaround is to configure only one slot per TM.
>
> If the above does not help, can you attach the full ClusterEntrypoint
> (JobManager) logs?
>
> Best,
> Gary
>
> [1] https://issues.apache.org/jira/browse/FLINK-9455
>
>
> On Tue, Aug 7, 2018 at 12:34 PM, Florian Simond <florian.sim...@hotmail.fr
> > wrote:
>
> Thank you!
>
>
> So it is now normal that it takes around 5 minutes to start processing?
> The job is reading from Kafka and writing back into another Kafka topic.
> When I start the job, it takes roughly 5 minutes before I get something in
> the output topic.
>
>
> I see a lot of
>
> 2018-08-07 12:20:34,672 INFO  org.apache.flink.yarn.YarnResourceManager                     - Received new container: XXX - Remaining pending container requests: 0
> 2018-08-07 12:20:34,672 INFO  org.apache.flink.yarn.YarnResourceManager                     - Returning excess container XXX.
>
>
> I see a lot of those lines during the first five minutes.
>
>
> I'm not sure I need to have a static set of TMs, but as we have a limited
> set of nodes and several jobs, it could be harder to make sure they do not
> interfere with each other...
>
>
> ------------------------------
> *De :* Gary Yao <g...@data-artisans.com>
> *Envoyé :* mardi 7 août 2018 12:27
>
> *À :* Florian Simond
> *Cc :* vino yang; user@flink.apache.org
> *Objet :* Re: Could not build the program from JAR file.
>
> You can find more information about the re-worked deployment model here:
>
>     https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
>
> TaskManagers are started and shut down according to the slot requirements
> of
> the jobs. It is possible to return to the old behavior by setting
>
>     mode: old
>
> in flink-conf.yaml. However, this mode is deprecated and will be removed
> soon.
> Can you explain why you need to have a static set of TMs?
>
> On Tue, Aug 7, 2018 at 12:07 PM, Florian Simond <florian.sim...@hotmail.fr
> > wrote:
>
> Indeed, that's the solution.
>
>
> It was done automatically before with 1.4.2, that's why I missed that
> part...
>
>
> Do you have any pointer about the dynamic number of TaskManagers? I'm
> curious to know how it works. Is it still possible to fix it?
>
>
> Thank you,
>
> Florian
>
>
> ------------------------------
> *De :* Gary Yao <g...@data-artisans.com>
> *Envoyé :* mardi 7 août 2018 11:55
> *À :* Florian Simond
> *Cc :* vino yang; user@flink.apache.org
> *Objet :* Re: Could not build the program from JAR file.
>
> Hi Florian,
>
> Can you run
>
>     export HADOOP_CLASSPATH=`hadoop classpath`
>
> before submitting the job [1]?
>
> Moreover, you should not use the -yn parameter. Beginning with Flink 1.5,
> the
> number of TaskManagers is not fixed anymore.
>
> Best,
> Gary
>
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/hadoop.html#configuring-flink-with-hadoop-classpaths
>
>
>
> On Tue, Aug 7, 2018 at 9:22 AM, Florian Simond <florian.sim...@hotmail.fr>
> wrote:
>
> In the log, I can see that:
>
>
> The first exception is a warning; not sure if it is important.
>
>
> The second one seems to be the relevant one. It tries to find the file "-yn"?!
>
>
> 2018-08-07 09:16:04,776 WARN  org.apache.flink.client.cli.CliFrontend                    - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
> java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:264)
>         at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1208)
>         at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1164)
>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1090)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 5 more
> 2018-08-07 09:16:04,789 INFO  org.apache.flink.core.fs.FileSystem                        - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
> 2018-08-07 09:16:04,967 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
> 2018-08-07 09:16:04,991 INFO  org.apache.flink.runtime.security.SecurityUtils            - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
> 2018-08-07 09:16:05,041 INFO  org.apache.flink.client.cli.CliFrontend                    - Running 'run' command.
> 2018-08-07 09:16:05,046 INFO  org.apache.flink.client.cli.CliFrontend                    - Building program from JAR file
> 2018-08-07 09:16:05,046 ERROR org.apache.flink.client.cli.CliFrontend                    - Invalid command line arguments.
> org.apache.flink.client.cli.CliArgsException: Could not build the program from JAR file.
>         at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:208)
>         at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
>         at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
>         at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
> Caused by: java.io.FileNotFoundException: JAR file does not exist: -yn
>         at org.apache.flink.client.cli.CliFrontend.buildProgram(CliFrontend.java:828)
>         at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
>         ... 4 more
>
>
>
> ------------------------------
> *De :* vino yang <yanghua1...@gmail.com>
> *Envoyé :* mardi 7 août 2018 09:01
> *À :* Gary Yao
> *Cc :* Florian Simond; user@flink.apache.org
> *Objet :* Re: Could not build the program from JAR file.
>
> Hi Florian,
>
> The error message is caused by a FileNotFoundException, see here [1]. Is
> there any more information about the exception? Can you make sure the jar
> exists?
>
> [1]: https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L209
>
>
>
> Thanks, vino.
>
> 2018-08-07 14:28 GMT+08:00 Gary Yao <g...@data-artisans.com>:
>
> Hi Florian,
>
> You write that Flink 1.4.2 works, but which version is not working for you?
>
> Best,
> Gary
>
>
>
> On Tue, Aug 7, 2018 at 8:25 AM, Florian Simond <florian.sim...@hotmail.fr>
> wrote:
>
> Hi all,
>
>
> I'm trying to run the WordCount example on my YARN cluster and this is not
> working. I get the error message specified in the title: Could not build the
> program from JAR file.
>
>
>
> > $ ./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096
> ./examples/batch/WordCount.jar
> > Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was
> set.
> > Could not build the program from JAR file.
>
> > Use the help option (-h or --help) to get help on the command.
>
>
> I also have the same problem with a custom JAR...
>
>
>
> With Flink 1.4.2, I have no problem at all. Both the WordCount example and
> my custom JAR are working...
>
>
>
> What am I doing wrong?
>
