I should say I'm running the current Flink master branch.

On Wed, Sep 30, 2015 at 5:02 PM, Robert Schmidtke <ro.schmid...@gmail.com> wrote:
> It's me again. This is a strange issue, I hope I managed to find the right
> keywords. I got 8 machines, 1 for the JM, the other 7 are TMs with 64G of
> memory each.
>
> When running my job like so:
>
> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16384 -ytm 40960 -yn 7 .....
>
> The job completes without any problems. When running it like so:
>
> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16385 -ytm 40960 -yn 7 .....
>
> (note the one more M of memory for the JM), the execution stalls,
> continuously reporting:
>
> .....
> TaskManager status (6/7)
> TaskManager status (6/7)
> TaskManager status (6/7)
> .....
>
> I did some poking around, but I couldn't find any direct correlation with
> the code.
>
> The JM log says:
>
> .....
> 16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - JVM Options:
> 16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - -Xmx12289M
> .....
>
> but then continues to report
>
> .....
> 16:52:59,311 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
> 16:52:59,831 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
> 16:53:00,351 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
> .....
>
> forever until I cancel the job.
>
> If you have any ideas I'm happy to try them out. Thanks in advance for any
> hints! Cheers.
>
> Robert
> --
> My GPG Key ID: 336E2680
>

--
My GPG Key ID: 336E2680