[ https://issues.apache.org/jira/browse/FLINK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693129#comment-16693129 ]
wgcn commented on FLINK-10884: ------------------------------ I found a unit test that per default the off heap memory is set to what the network buffers require. [https://github.com/apache/flink/blob/f629b05a2cdc2c07f4a19456cf5b3e5fdd6ff607/flink-runtime/src/test/java/org/apache/flink/runtime/clusterframework/ContaineredTaskManagerParametersTest.java#L37] I guess that the author of the code do that willfully .I don't know why the author do that > Flink on yarn TM container will be killed by nodemanager because of the > exceeded physical memory. > ---------------------------------------------------------------------------------------------------- > > Key: FLINK-10884 > URL: https://issues.apache.org/jira/browse/FLINK-10884 > Project: Flink > Issue Type: Bug > Components: Cluster Management, Core > Affects Versions: 1.5.5, 1.6.2, 1.7.0 > Environment: version : 1.6.2 > module : flink on yarn > centos jdk1.8 > hadoop 2.7 > Reporter: wgcn > Assignee: wgcn > Priority: Major > Labels: yarn > > TM container will be killed by nodemanager because of the exceeded > [physical|http://www.baidu.com/link?url=Y4LyfMDH59n9-Ey16Fo6EFAYltN1e9anB3y2ynhVmdvuIBCkJGdH0hTExKDZRvXNr6hqhwIXs8JjYqesYbx0BOpQDD0o1VjbVQlOC-9MgXi] > memory. I found the lanuch context lanuching TM container that > "container memory = heap memory+ offHeapSizeMB" at the class > org.apache.flink.runtime.clusterframework.ContaineredTaskManagerParameters > from line 160 to 166 I set a safety margin for the whole memory container > using. For example if the container limit 3g memory, the sum memory that > "heap memory+ offHeapSizeMB" is equal to 2.4g to prevent the container > being killed.Do we have the > [ready-made|http://www.baidu.com/link?url=ylC8cEafGU6DWAdU9ADcJPNugkjbx6IjtqIIxJ9foX4_Yfgc7ctWmpEpQRettVmBiOy7Wfph7S1UvN5LiJj-G1Rsb--oDw4Z2OEbA5Fj0bC] > solution or I can commit my solution -- This message was sent by Atlassian JIRA (v7.6.3#76005)