[jira] [Comment Edited] (FLINK-10884) Flink on yarn TM container will be killed by nodemanager because of the exceeded physical memory.

zhijiang (JIRA) Tue, 20 Nov 2018 19:48:50 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694177#comment-16694177
 ]


zhijiang edited comment on FLINK-10884 at 11/21/18 3:47 AM:
------------------------------------------------------------

I quickly reviewed the related code. My analysis:

In the {{ContaineredTaskManagerParameters#create}} method:

{{offHeapSizeMB = containerMemoryMB - heapSizeMB}}

{{containerMemoryMB}} is the container's total physical memory, including the 
{{cutoff}}, whereas {{heapSizeMB}} is calculated with the {{cutoff}} already 
excluded. As a result, {{offHeapSizeMB}} includes the {{cutoff}}.

In {{testOffHeapMemoryWithDefaultConfiguration}}, {{networkBufMB}} is 
calculated without the {{cutoff}}, so the {{cutoff}} has to be added to it 
before comparing it with the {{offHeapSizeMB}} above.
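
The relationship described above can be checked with a small standalone sketch. It does not call any Flink code. The class and method names, the cutoff formula (the larger of a ratio of the container memory and a minimum), the sample values, and the simplification that the heap is whatever remains after the cutoff and the network buffers are all assumptions made for illustration.

```java
// Illustrative sketch only; not the Flink implementation.
final class CutoffSketch {

    // Assumed cutoff rule: the larger of a fixed minimum and a ratio of the container memory.
    static long cutoffMB(long containerMemoryMB, double cutoffRatio, long minCutoffMB) {
        return Math.max(minCutoffMB, (long) (containerMemoryMB * cutoffRatio));
    }

    // Simplifying assumption: the heap is what remains after the cutoff and the network buffers.
    static long heapSizeMB(long containerMemoryMB, long cutoffMB, long networkBufMB) {
        return containerMemoryMB - cutoffMB - networkBufMB;
    }

    // The formula quoted in the comment: offHeapSizeMB = containerMemoryMB - heapSizeMB.
    static long offHeapSizeMB(long containerMemoryMB, long heapSizeMB) {
        return containerMemoryMB - heapSizeMB;
    }

    public static void main(String[] args) {
        long containerMemoryMB = 4096; // hypothetical container size
        long networkBufMB = 300;       // hypothetical network buffer size
        long cutoff = cutoffMB(containerMemoryMB, 0.25, 600);
        long heap = heapSizeMB(containerMemoryMB, cutoff, networkBufMB);
        long offHeap = offHeapSizeMB(containerMemoryMB, heap);

        // offHeap contains the cutoff, so it only matches networkBufMB once the cutoff is added.
        System.out.println("offHeapSizeMB = " + offHeap);
        System.out.println("networkBufMB + cutoff = " + (networkBufMB + cutoff));
    }
}
```

Under these assumptions {{offHeapSizeMB}} and {{networkBufMB + cutoff}} come out equal, while {{networkBufMB}} alone is smaller by exactly the {{cutoff}}.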


was (Author: zjwang):
I just quickly reviewed the related code. In my analysis:

In the {{ContaineredTaskManagerParameters#create}} method:

{{offHeapSizeMB = containerMemoryMB - heapSizeMB}}

{{containerMemoryMB}} is the container's total physical memory including the 
{{cutoff}}, and {{heapSizeMB}} does not include the {{cutoff}} in its 
calculation, so {{offHeapSizeMB}} would include the {{cutoff}} as a result.

In the test {{testOffHeapMemoryWithDefaultConfiguration}}, {{networkBufMB}} 
does not include the {{cutoff}} in its calculation, so the {{cutoff}} should 
be added to it to compare with the above {{offHeapSizeMB}}.

> Flink on yarn TM container will be killed by nodemanager because of the 
> exceeded physical memory.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-10884
>                 URL: https://issues.apache.org/jira/browse/FLINK-10884
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management, Core
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>         Environment: version  : 1.6.2 
> module : flink on yarn
> centos  jdk1.8
> hadoop 2.7
>            Reporter: wgcn
>            Assignee: wgcn
>            Priority: Major
>              Labels: yarn
>
> The TM container is killed by the NodeManager because it exceeds its 
> physical memory limit. I found that the launch context used to launch the 
> TM container sets "container memory = heap memory + offHeapSizeMB" in the 
> class 
> org.apache.flink.runtime.clusterframework.ContaineredTaskManagerParameters, 
> lines 160 to 166. I set a safety margin on the total memory the container 
> uses. For example, if the container limit is 3g of memory, the sum 
> "heap memory + offHeapSizeMB" is set to 2.4g to prevent the container from 
> being killed. Is there a ready-made solution, or can I commit my solution?
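
The arithmetic behind the reporter's example (a 3g limit reduced to about 2.4g) can be sketched as follows. The 20 percent margin is inferred from those two numbers, and the class and method names are hypothetical. This is an illustration of the idea, not the reporter's actual patch.

```java
// Illustrative sketch only; the margin value and all names are assumptions.
final class SafetyMarginSketch {

    // Memory left for "heap memory + offHeapSizeMB" after reserving a percentage margin.
    static long usableMemoryMB(long containerLimitMB, int marginPercent) {
        return containerLimitMB * (100 - marginPercent) / 100;
    }

    public static void main(String[] args) {
        long containerLimitMB = 3 * 1024; // the 3g container limit from the example
        long usable = usableMemoryMB(containerLimitMB, 20);
        // 2457 MB, which is roughly the 2.4g mentioned in the report.
        System.out.println("heap + off-heap budget: " + usable + " MB");
    }
}
```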



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
