[ 
https://issues.apache.org/jira/browse/FLINK-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896993#comment-15896993
 ] 

ASF GitHub Bot commented on FLINK-4545:
---------------------------------------

Github user zhijiangW commented on the issue:

    https://github.com/apache/flink/pull/3467
  
    Hi @NicoK, I am interested in this issue, and I like the way this PR 
    asserts that the lock is held.
    
    The framework really does need to manage network buffers, because it is 
    difficult for users to set the exact number of buffers. Our current simple 
    solution is to extend `ResourceProfile` with the total number of input and 
    output edges of each `Execution`. The `ResourceManager` then calculates the 
    number of buffers from that and overwrites the corresponding value in the 
    `TaskManager` configuration.
    
    From what @StephanEwen mentioned before, I know a little about this issue. 
    If you have a detailed design or plan for it, would you share it so that I 
    can learn from it and track the progress? Thank you!


> Flink automatically manages TM network buffer
> ---------------------------------------------
>
>                 Key: FLINK-4545
>                 URL: https://issues.apache.org/jira/browse/FLINK-4545
>             Project: Flink
>          Issue Type: Wish
>          Components: Network
>            Reporter: Zhenzhong Xu
>
> Currently, the number of network buffers per task manager is preconfigured, 
> and the memory is pre-allocated, through the 
> taskmanager.network.numberOfBuffers config. In a job DAG with a shuffle 
> phase, this number can grow very large depending on the TM cluster size. The 
> formula for calculating the buffer count is documented here 
> (https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers):
>   
> #slots-per-TM^2 * #TMs * 4
> In a standalone deployment, we may need to control the task manager cluster 
> size dynamically and then leverage the upcoming Flink feature that supports 
> scaling job parallelism/rescaling at runtime. 
> If the buffer count config is static at runtime and cannot be changed without 
> restarting the task manager process, this may add latency and complexity to 
> the scaling process. I am wondering whether there has already been any 
> discussion about having Flink manage the network buffers automatically, or 
> at least expose an API that allows them to be reconfigured. Let me know if 
> there is any existing JIRA that I should follow.
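
To make the formula quoted above concrete, here is a minimal sketch that evaluates #slots-per-TM^2 * #TMs * 4 for a few cluster sizes. The function name and the 32 KiB buffer size are assumptions made for illustration only; they do not come from the issue or from Flink's code.

```python
def recommended_network_buffers(slots_per_tm: int, num_tms: int) -> int:
    """Buffer count from the quoted formula: #slots-per-TM^2 * #TMs * 4."""
    if slots_per_tm < 1 or num_tms < 1:
        raise ValueError("slots_per_tm and num_tms must be positive")
    return slots_per_tm ** 2 * num_tms * 4


# Assumed buffer size, for illustration only; check your own configuration.
ASSUMED_BUFFER_BYTES = 32 * 1024

for num_tms in (10, 20, 40):
    buffers = recommended_network_buffers(slots_per_tm=8, num_tms=num_tms)
    mib = buffers * ASSUMED_BUFFER_BYTES / (1024 * 1024)
    print(f"{num_tms} TMs -> {buffers} buffers (~{mib:.0f} MiB per TM)")
```

With 8 slots per TM, the count grows linearly from 2560 buffers at 10 TMs to 10240 at 40 TMs, which is why a value fixed at TM startup has to be revisited whenever the cluster is resized.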



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)