Hi all! We are currently investigating the memory usage of the Flink JVMs. While we have the memory consumed inside the JVM heap well under control, the memory that the JVM itself allocates outside the heap turns out to be substantial.
The non-heap memory goes up to a gigabyte for a 3 GB heap in some cases. The problem surfaces as YARN killing Flink JVMs because the processes grow too large, unless we deduct a really large share of memory (at least 25%) from the heap size. Robert is running a big series of experiments, but we have not yet developed a good understanding of what eats up the memory (Flink itself is not using any off-heap memory in this setup). It seems that a combination of stack space, PermGen space, code cache, JIT and GC space eats up large amounts of memory. Does anyone have experience with debugging Java non-heap memory usage? It seems the tools available to debug this are quite limited...

Greetings,
Stephan
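
P.S. In case it helps anyone look at their own setup: below is a minimal sketch of how one could dump the non-heap pools that the JVM reports about itself through the standard java.lang.management API. The class name NonHeapReport and its helper method are made up for illustration and are not part of Flink. Note that this only covers the pools the JVM exposes (for example PermGen and the code cache); thread stacks, GC data structures and other native allocations are not included, so the total here will likely be lower than what YARN sees. The thread count is printed only as a rough proxy for stack space (thread count times the configured stack size).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: prints what the JVM reports
// about its own non-heap memory pools.
public class NonHeapReport {

    /** Returns the committed bytes of every non-heap pool, keyed by pool name. */
    static Map<String, Long> nonHeapCommittedBytes() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                MemoryUsage usage = pool.getUsage();
                // getUsage() may return null if the pool is no longer valid
                if (usage != null) {
                    result.put(pool.getName(), usage.getCommitted());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Per-pool breakdown (pool names depend on JVM version and GC settings)
        for (Map.Entry<String, Long> e : nonHeapCommittedBytes().entrySet()) {
            System.out.printf("%-30s %10d KB%n", e.getKey(), e.getValue() / 1024);
        }

        // Total non-heap memory as tracked by the JVM itself
        long total = ManagementFactory.getMemoryMXBean()
                .getNonHeapMemoryUsage().getCommitted();
        System.out.printf("%-30s %10d KB%n", "Total non-heap (committed)", total / 1024);

        // Live thread count: a rough proxy for stack space, which the pools above omit
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.println("Live threads: " + threads);
    }
}
```

Comparing this output with the resident size of the process as reported by the OS should at least show how much of the gap is not accounted for by the JVM's own pools.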