Yangze Guo created FLINK-20863:
----------------------------------

             Summary: Exclude network memory from ResourceProfile
                 Key: FLINK-20863
                 URL: https://issues.apache.org/jira/browse/FLINK-20863
             Project: Flink
          Issue Type: Task
            Reporter: Yangze Guo
             Fix For: 1.13.0


The current ResourceProfile implementation includes network memory, with the 
expectation that fine-grained resource management will not deploy more tasks 
onto a TM than its network memory can accommodate.
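
A minimal sketch of the behavior described above. It assumes a simplified, 
hypothetical profile type: `Profile`, `TaskManagerSlots` and `try_allocate` are 
illustrative names, not Flink's actual ResourceProfile API. When network memory 
is one of the profile's dimensions, the allocator refuses a slot request once 
the TM's remaining network memory cannot cover it, even if CPU and heap are 
still available.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a resource profile (not Flink's ResourceProfile).
@dataclass(frozen=True)
class Profile:
    cpu_cores: float
    task_heap_mb: int
    network_mb: int  # the dimension this issue proposes to drop

    def fits_in(self, available: "Profile") -> bool:
        # Every dimension of the request must be covered by what is available.
        return (self.cpu_cores <= available.cpu_cores
                and self.task_heap_mb <= available.task_heap_mb
                and self.network_mb <= available.network_mb)

    def minus(self, other: "Profile") -> "Profile":
        return Profile(self.cpu_cores - other.cpu_cores,
                       self.task_heap_mb - other.task_heap_mb,
                       self.network_mb - other.network_mb)


class TaskManagerSlots:
    """Hypothetical TM bookkeeping: tracks resources still free for new slots."""

    def __init__(self, total: Profile):
        self.free = total

    def try_allocate(self, request: Profile) -> bool:
        # Reject the slot if any dimension, network memory included,
        # would be over-committed; otherwise reserve the requested amount.
        if not request.fits_in(self.free):
            return False
        self.free = self.free.minus(request)
        return True


tm = TaskManagerSlots(Profile(cpu_cores=4.0, task_heap_mb=4096, network_mb=256))
req = Profile(cpu_cores=1.0, task_heap_mb=512, network_mb=128)
results = [tm.try_allocate(req) for _ in range(3)]
# The third request is rejected only because the 256 MB of network memory
# has already been handed out to the first two slots.
```

The check is only as good as each request's `network_mb`, which has to be known 
before the slot is requested.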

However, how much network memory each task needs depends heavily on the shuffle 
service implementation, and may change when switching to another shuffle 
service. Therefore, at the moment neither users nor the Flink runtime can easily 
specify network memory requirements for a task/slot.

The concrete solution for controlling network memory is beyond the scope of 
this FLIP. However, we are aware of a few potential directions for solving this 
problem:
- Make shuffle services adaptively control the amount of memory assigned to 
each task/slot, with respect to the given memory pool size. In this way, there 
should be no need to rely on fine-grained resource management to control the 
network memory consumption.
- Make shuffle services expose interfaces for calculating network memory 
requirements for given slot sharing groups (SSGs). In this way, the Flink 
runtime can specify the calculated network memory requirements for slots, 
without having to understand the internal details of different shuffle service 
implementations.

For now, we propose to exclude network memory from ResourceProfile, so that the 
fine-grained resource management feature is not blocked by the network memory 
control issue. If needed, it can be added back in the future, provided there is 
a good way to specify the requirement.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
