Re: All but one TMs connect when JM has more than 16G of memory

Robert Schmidtke Wed, 30 Sep 2015 08:09:59 -0700

I should say I'm running the current Flink master branch.

On Wed, Sep 30, 2015 at 5:02 PM, Robert Schmidtke <ro.schmid...@gmail.com>
wrote:


> It's me again. This is a strange issue, and I hope I managed to find the
> right keywords. I have 8 machines: 1 for the JM and 7 TMs with 64G of
> memory each.
>
> When running my job like so:
>
> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16384 -ytm 40960 -yn 7 .....
>
> The job completes without any problems. When running it like so:
>
> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16385 -ytm 40960 -yn 7 .....
>
> (note the one extra MB of memory for the JM), the execution stalls,
> continuously reporting:
>
> .....
> TaskManager status (6/7)
> TaskManager status (6/7)
> TaskManager status (6/7)
> .....
>
> I did some poking around, but I couldn't find anything in the code that
> directly explains this.
>
> The JM log says:
>
> .....
> 16:49:01,893 INFO  org.apache.flink.yarn.ApplicationMaster$  -  JVM Options:
> 16:49:01,893 INFO  org.apache.flink.yarn.ApplicationMaster$  -     -Xmx12289M
> .....
>
> but then continues to report
>
> .....
> 16:52:59,311 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1
>     - The user requested 7 containers, 6 running. 1 containers missing
> 16:52:59,831 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1
>     - The user requested 7 containers, 6 running. 1 containers missing
> 16:53:00,351 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1
>     - The user requested 7 containers, 6 running. 1 containers missing
> .....
>
> forever until I cancel the job.
>
> If you have any ideas I'm happy to try them out. Thanks in advance for any
> hints! Cheers.
>
> Robert
> --
> My GPG Key ID: 336E2680
>
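
One way to read the numbers above, as a rough sketch only: the -Xmx12289M
value is what you get if a 25% cutoff is subtracted from the 16385 MB
requested for the JM, and a scheduler that rounds requests up to a fixed
increment would treat 16385 MB as a larger container than 16384 MB. The
function names, the 0.25 ratio, the 384 MB minimum cutoff, and the 1024 MB
increment are all assumptions for illustration (modelled on Flink's
yarn.heap-cutoff-ratio / yarn.heap-cutoff-min settings and YARN's
yarn.scheduler.minimum-allocation-mb default). Nothing in this thread
confirms that this rounding is what keeps the seventh container from
starting.

```python
import math


def jvm_heap_mb(container_mb, cutoff_ratio=0.25, cutoff_min_mb=384):
    """Heap left after subtracting a cutoff from the container size.

    cutoff_ratio and cutoff_min_mb are assumed defaults, not values
    taken from the cluster described in this thread.
    """
    cutoff = max(int(container_mb * cutoff_ratio), cutoff_min_mb)
    return container_mb - cutoff


def yarn_normalized_mb(requested_mb, increment_mb=1024):
    """Round a memory request up to the next multiple of increment_mb.

    increment_mb=1024 is an assumed scheduler setting.
    """
    return math.ceil(requested_mb / increment_mb) * increment_mb


# Compare the two -yjm values from the commands above.
for jm_mb in (16384, 16385):
    print(jm_mb, jvm_heap_mb(jm_mb), yarn_normalized_mb(jm_mb))
```

Under these assumptions, 16384 MB yields a 12288 MB heap and stays a
16384 MB container, while 16385 MB yields a 12289 MB heap (matching the
log) and is rounded up to 17408 MB. Comparing the rounded JM size plus one
40960 MB TM against the memory each NodeManager offers might be a
reasonable first check.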



-- 
My GPG Key ID: 336E2680
