Hi community,

When running a Flink streaming job with a large state size, one TaskManager process was killed by the YARN NodeManager. The following log is from the NodeManager:
2021-04-16 11:51:23,013 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=521232,containerID=container_e157_1618223445363_16943_01_000010] is running 19562496B beyond the 'PHYSICAL' memory limit. Current usage: 12.0 GB of 12 GB physical memory used; 15.2 GB of 25.2 GB virtual memory used. Killing container.

While searching for a solution to this problem, I found that there is an option for this that works for bounded shuffle. Is there a way to get rid of this in streaming mode?

PS: the memory-related options are:
taskmanager.memory.process.size: 12288m
taskmanager.memory.managed.fraction: 0.7
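For reference, a minimal sketch of the relevant flink-conf.yaml entries, assuming these are the only memory options set (everything else is left at its default):

```
# Total process memory the container requests from YARN
taskmanager.memory.process.size: 12288m

# Fraction of Flink memory given to managed memory
# (used by the RocksDB state backend, among others)
taskmanager.memory.managed.fraction: 0.7
```

With process.size at 12 GB and a 0.7 managed fraction, most of the container budget goes to managed memory, so any native allocation beyond what Flink accounts for (e.g. by the state backend or JVM overhead) can push the container past the 12 GB physical limit YARN enforces.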