I've had similar problems when running Flink in Yarn. A Flink task manager fails, the restarted jobs can't be scheduled because there aren't enough slots, and eventually Yarn decides to terminate Flink, and you lose all your jobs and state because Flink regards it as a graceful shutdown. My latest attempt to work around the issue was to disable the vmem and pmem checks in Yarn with the "yarn.nodemanager.pmem-check-enabled" and "yarn.nodemanager.vmem-check-enabled" settings. It's been OK so far, but I'm not totally sure whether it was a good idea.
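For reference, this is roughly the snippet I put into yarn-site.xml on the NodeManagers. Treat it as a sketch rather than a recommendation: both checks default to true, and the NodeManagers need a restart to pick the change up.

<!-- yarn-site.xml (sketch): disable the NodeManager memory enforcement
     checks so containers aren't killed for exceeding pmem/vmem limits.
     Both properties default to true; restart the NodeManagers after
     changing them. -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>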
Of course, I'm not sure if that's the exact same problem you're having, because I'm not sure if you're running Flink in Yarn or not.

-Shannon

On 4/14/17, 2:55 AM, "sohimankotia" <sohimanko...@gmail.com> wrote:

>I am running a flink streaming job with parallelism 1.
>
>Suddenly after 4 hours the job failed. It showed
>
>Container container_e39_1492083788459_0676_01_000002 is completed with
>diagnostics: Container
>[pid=79546,containerID=container_e39_1492083788459_0676_01_000002] is
>running beyond physical memory limits. Current usage: 2.0 GB of 2 GB
>physical memory used; 2.9 GB of 4.2 GB virtual memory used. Killing
>container.
>
>
>I tried to monitor with jmap on the task manager and did not get anything that
>could cause out of memory. No out of memory error in the logs either.
>
>
>
>--
>View this message in context:
>http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Container-is-is-running-beyond-physical-memory-limits-Current-usage-2-0-GB-of-2-GB-physical-memory-u-tp12615.html
>Sent from the Apache Flink User Mailing List archive. mailing list archive at
>Nabble.com.