The new BlockingSubpartition implementation in 1.9 uses mmap for data reading by default, which means it steals memory from the OS. The mmapped region is managed by the OS rather than by the JVM's memory model, so the JVM should not report an OutOfMemoryError; and since the OS can evict those file-backed pages under memory pressure, OS memory is not exhausted and the kernel OOM killer should not fire either. I think Piotr's suspicion is right: YARN tracks the physical memory used by the whole process (the mmapped region is also part of the process memory) and killed the TM when the container exceeded its limit.
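To make the accounting point concrete, here is a minimal, Linux-only sketch (not Flink code; the file size and class name are made up for illustration) showing that pages read through a memory-mapped file are faulted into the page cache and counted in the process's RSS, which is what YARN's pmem check measures:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class MmapRssDemo {

        // Read VmRSS (resident set size) from /proc/self/status; Linux only.
        static long rssKb() throws IOException {
            for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
                if (line.startsWith("VmRSS:")) {
                    return Long.parseLong(line.replaceAll("[^0-9]", ""));
                }
            }
            return -1;
        }

        public static void main(String[] args) throws IOException {
            Path file = Files.createTempFile("mmap-rss-demo", ".bin");
            long size = 256L * 1024 * 1024; // 256 MiB stand-in for a partition file
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                raf.setLength(size);
                MappedByteBuffer buf =
                    raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, size);
                System.out.println("VmRSS before reading: " + rssKb() + " kB");
                // Reading through the mapping faults each page into the page
                // cache; those resident pages show up in this process's RSS
                // even though the JVM heap has not grown at all.
                for (long pos = 0; pos < size; pos += 4096) {
                    buf.get((int) pos);
                }
                System.out.println("VmRSS after reading:  " + rssKb() + " kB");
            }
            Files.delete(file);
        }
    }

Running it, the RSS jump roughly matches the mapped size, while -Xmx is untouched; this is why a container monitor that only watches process RSS can kill a TM that is otherwise well within its JVM limits.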
Giving the container a stricter resource budget (reserving container memory beyond what the Flink processes themselves need, so the mmapped region has headroom under the YARN limit) can avoid the memory steal causing a kill, or using file instead of mmap, as pointed out by Piotr, can solve the problem. I think Flink may need to restrict the amount of memory that can be stolen.
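For reference, a sketch of the two knobs I have in mind, in flink-conf.yaml form. The exact key names are from memory (the subpartition-type switch was added around FLINK-13478, and the cutoff ratio exists in pre-1.10 releases), so please verify them against the 1.9 docs before relying on them:

    # Switch the bounded blocking subpartition from mmap to plain file reads
    # (assumed key name; check NettyShuffleEnvironmentOptions for your 1.9.x).
    taskmanager.network.bounded-blocking-subpartition-type: file

    # Or leave more headroom in the YARN container for memory the JVM does
    # not track (default ratio is 0.25 in 1.9; 0.4 here is just an example).
    containerized.heap-cutoff-ratio: 0.4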