Hello Maximilian,

I'm using 1.0.0 compiled with Scala 2.11 and Hadoop 2.7. I'm running this on EMR. I didn't see any exceptions in other logs. Which logs are you interested in?
Thanks,
Timur

On Mon, Apr 25, 2016 at 3:44 AM, Maximilian Michels <m...@apache.org> wrote:
> Hi Timur,
>
> Which version of Flink are you using? Could you share the entire logs?
>
> Thanks,
> Max
>
> On Mon, Apr 25, 2016 at 12:05 PM, Robert Metzger <rmetz...@apache.org> wrote:
> > Hi Timur,
> >
> > The reason why we only allocate 570mb for the heap is that you are
> > allocating most of the memory off-heap (direct byte buffers).
> >
> > In theory, the memory footprint of the JVM is limited to 570 (heap) +
> > 1900 (direct mem) = 2470 MB (which is below 2500). But in practice the
> > JVM is allocating more memory, causing these killings by YARN.
> >
> > I have to check the code of Flink again, because I would expect the
> > safety boundary to be much larger than 30 MB.
> >
> > Regards,
> > Robert
> >
> > On Fri, Apr 22, 2016 at 9:47 PM, Timur Fayruzov <timur.fairu...@gmail.com> wrote:
> >> Hello,
> >>
> >> The next issue in the string of things I'm solving is that my application
> >> fails with the message 'Connection unexpectedly closed by remote task manager'.
> >>
> >> The YARN log shows the following:
> >>
> >> Container [pid=4102,containerID=container_1461341357870_0004_01_000015] is
> >> running beyond physical memory limits. Current usage: 2.5 GB of 2.5 GB
> >> physical memory used; 9.0 GB of 12.3 GB virtual memory used. Killing container.
> >> Dump of the process-tree for container_1461341357870_0004_01_000015 :
> >> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> >> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> >> |- 4102 4100 4102 4102 (bash) 1 7 115806208 715 /bin/bash -c
> >> /usr/lib/jvm/java-1.8.0/bin/java -Xms570m -Xmx570m
> >> -XX:MaxDirectMemorySize=1900m
> >> -Dlog.file=/var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.log
> >> -Dlogback.configurationFile=file:logback.xml
> >> -Dlog4j.configuration=file:log4j.properties
> >> org.apache.flink.yarn.YarnTaskManagerRunner --configDir . 1>
> >> /var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.out
> >> 2>
> >> /var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.err
> >> |- 4306 4102 4102 4102 (java) 172258 40265 9495257088 646460
> >> /usr/lib/jvm/java-1.8.0/bin/java -Xms570m -Xmx570m
> >> -XX:MaxDirectMemorySize=1900m
> >> -Dlog.file=/var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.log
> >> -Dlogback.configurationFile=file:logback.xml
> >> -Dlog4j.configuration=file:log4j.properties
> >> org.apache.flink.yarn.YarnTaskManagerRunner --configDir .
> >>
> >> One thing that drew my attention is `-Xmx570m`. I expected it to be
> >> TaskManagerMemory*0.75 (due to yarn.heap-cutoff-ratio). I run the
> >> application as follows:
> >>
> >> HADOOP_CONF_DIR=/etc/hadoop/conf flink run -m yarn-cluster -yn 18 -yjm 4096 -ytm 2500 eval-assembly-1.0.jar
> >>
> >> In the Flink logs I do see 'Task Manager memory: 2500'. When I look at the
> >> YARN container logs on the cluster node, I see that the JVM starts with 570mb,
> >> which puzzles me. When I look at the memory actually allocated for a YARN
> >> container using 'top', I see 2.2GB used. Am I interpreting these
> >> parameters correctly?
> >>
> >> I have also set the following (it failed in the same way without it as well):
> >>
> >> taskmanager.memory.off-heap: true
> >>
> >> Also, I don't understand why this happens at all. I assumed that Flink
> >> won't overcommit allocated resources and will spill to disk when running
> >> out of heap memory. I'd appreciate it if someone could shed light on this too.
> >>
> >> Thanks,
> >> Timur
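For reference, here is a back-of-the-envelope sketch of the arithmetic that would produce an -Xmx of roughly 570m. The config keys (yarn.heap-cutoff-ratio, taskmanager.memory.fraction, taskmanager.memory.off-heap) are real Flink settings, but the default values plugged in and the formula itself are assumptions about how Flink 1.0 carves up the container, not the actual Flink code:

// Rough sketch only: approximates how the TaskManager heap may be derived
// from the YARN container size when off-heap memory is enabled.
object TaskManagerHeapSketch {
  def main(args: Array[String]): Unit = {
    val containerMB     = 2500   // -ytm 2500
    val heapCutoffRatio = 0.25   // yarn.heap-cutoff-ratio (assumed default)
    val managedFraction = 0.7    // taskmanager.memory.fraction (assumed default)
    val offHeapEnabled  = true   // taskmanager.memory.off-heap: true

    // YARN safety cutoff -- the only deduction Timur expected (2500 * 0.75 = 1875).
    val cutoffMB    = (containerMB * heapCutoffRatio).toInt   // 625
    val afterCutoff = containerMB - cutoffMB                  // 1875

    // With off-heap enabled, the managed-memory fraction is apparently taken
    // out of the heap as well and allocated as direct memory instead.
    val heapMB =
      if (offHeapEnabled) (afterCutoff * (1 - managedFraction)).toInt  // ~562
      else afterCutoff

    println(s"cutoff=${cutoffMB}MB afterCutoff=${afterCutoff}MB heap=${heapMB}MB")
  }
}

With those assumed defaults the sketch yields roughly 562m of heap, which is close to the -Xmx570m in the container dump, and the rest of the budget shows up as -XX:MaxDirectMemorySize=1900m. Whatever the JVM needs on top of heap plus direct memory (metaspace, thread stacks, GC structures) is presumably what pushes the process over the 2500 MB YARN limit, which matches Robert's point that a ~30 MB safety margin looks too small.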