[ https://issues.apache.org/jira/browse/FLINK-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905101#comment-15905101 ]
Stephan Ewen commented on FLINK-4545:
-------------------------------------

[~greghogan] Sorry, I just saw that I never answered your comment. The line of thought for the fix is as follows:

Managing buffers involves (1) making sure you have enough of them and (2) making sure you don't have too many in the cases where too many hurts.

(1) is not too hard: we can either allocate buffers on demand as needed, or simply allocate more by default (with a clever heuristic depending on the JVM size; a sketch of such a heuristic follows at the end of this message). The latter may still require a manual increase in network buffers in a few cases.

(2) is different for batch and streaming workloads:
- Batch always benefits from more network memory, because it smooths out short-term back pressure better.
- The exception in batch is when Flink eats so much memory that the disk caches suffer, as you mentioned. We are not taking this into account currently, as we assume there is a total "Flink budget" of memory (frequently simply the JVM heap) and we only distribute memory within that budget.
- Streaming must not have too much data in flight, because in-flight data needs to be aligned during checkpoints.
- More memory may still help to keep data around after it has been sent, for replays (relevant later).

The fix will introduce a new type of data exchange, {{PIPELINED_BOUNDED}}, which streaming will use to make sure that not too much data can be in flight in a single input gate or output gate (see the bounded-pool sketch at the end of this message).

> Flink automatically manages TM network buffer
> ---------------------------------------------
>
>                 Key: FLINK-4545
>                 URL: https://issues.apache.org/jira/browse/FLINK-4545
>             Project: Flink
>          Issue Type: Wish
>          Components: Network
>            Reporter: Zhenzhong Xu
>
> Currently, the number of network buffers per task manager is preconfigured and the memory is pre-allocated through the taskmanager.network.numberOfBuffers config. In a job DAG with a shuffle phase, this number can grow very high depending on the TM cluster size. The formula for calculating the buffer count is documented here: https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers
>
> #slots-per-TM^2 * #TMs * 4
>
> In a standalone deployment, we may need to control the task manager cluster size dynamically and then leverage the upcoming Flink feature for rescaling job parallelism at runtime. If the buffer count config is static and cannot be changed without restarting the task manager process, this may add latency and complexity to the scaling process. I am wondering whether there is already any discussion around having Flink manage the network buffers automatically, or at least exposing an API that allows them to be reconfigured. Let me know if there is an existing JIRA that I should follow.
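To put the quoted formula in perspective, here is a small worked example. The slot count, cluster size, and the 32 KiB buffer size are illustrative assumptions, not figures from this issue:

{code:java}
/** Worked example of the documented sizing formula; all numbers are illustrative. */
public class NetworkBufferCount {
    public static void main(String[] args) {
        int slotsPerTm = 8;               // assumed slots per task manager
        int numTms = 50;                  // assumed number of task managers
        int segmentSizeBytes = 32 * 1024; // assumed network buffer (segment) size

        // #slots-per-TM^2 * #TMs * 4, per the linked documentation
        int buffersPerTm = slotsPerTm * slotsPerTm * numTms * 4;
        long bytesPerTm = (long) buffersPerTm * segmentSizeBytes;

        // Prints: 12800 buffers, ~400 MiB pre-allocated per task manager
        System.out.printf("%d buffers, ~%d MiB pre-allocated per task manager%n",
                buffersPerTm, bytesPerTm / (1024 * 1024));
    }
}
{code}

The quadratic term in slots-per-TM is what makes the pre-allocated count grow so quickly with cluster size, which is the motivation for managing it automatically.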
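On point (1) of the comment above, a minimal sketch of what a heuristic "depending on JVM size" could look like, assuming a fraction-of-JVM rule with a floor and a cap (the fraction and bounds are assumptions for the sketch, not values from this issue):

{code:java}
/** Illustrative heuristic for sizing network memory from the JVM size.
 *  The fraction, floor, and cap are assumed values, not Flink defaults. */
public class NetworkMemoryHeuristic {

    static long networkMemoryBytes(long jvmMemoryBytes) {
        double fraction = 0.1;   // assumed share of JVM memory for network buffers
        long floor = 64L << 20;  // never less than 64 MiB
        long cap = 1L << 30;     // never more than 1 GiB
        long proposed = (long) (jvmMemoryBytes * fraction);
        return Math.max(floor, Math.min(cap, proposed));
    }

    public static void main(String[] args) {
        long jvm = Runtime.getRuntime().maxMemory();
        System.out.printf("network memory: %d MiB of a %d MiB JVM%n",
                networkMemoryBytes(jvm) >> 20, jvm >> 20);
    }
}
{code}

A rule of this shape keeps small JVMs from starving the network stack while preventing large JVMs from spending an outsized share on buffers; as the comment notes, a few jobs may still need to raise the resulting number manually.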
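On the {{PIPELINED_BOUNDED}} exchange, a minimal sketch of the bounding idea: a per-gate buffer pool that blocks requesters once the gate has reached its maximum number of buffers, so back pressure kicks in instead of unbounded in-flight data. Class and method names are hypothetical, not Flink's actual internals:

{code:java}
import java.util.ArrayDeque;

/** Hypothetical per-gate buffer pool with a hard upper bound, illustrating
 *  how a PIPELINED_BOUNDED exchange could cap in-flight data. */
public class BoundedGateBufferPool {

    private final ArrayDeque<byte[]> available = new ArrayDeque<>();
    private final int maxBuffers;   // hard bound on buffers held by this gate
    private final int bufferSize;
    private int allocated;          // buffers handed out so far (lazy allocation)

    public BoundedGateBufferPool(int maxBuffers, int bufferSize) {
        this.maxBuffers = maxBuffers;
        this.bufferSize = bufferSize;
    }

    /** Blocks when the gate is at its bound, producing back pressure. */
    public synchronized byte[] requestBuffer() throws InterruptedException {
        while (available.isEmpty() && allocated >= maxBuffers) {
            wait(); // the gate may not put more data in flight
        }
        if (!available.isEmpty()) {
            return available.poll();
        }
        allocated++;
        return new byte[bufferSize]; // allocate lazily, up to the bound
    }

    /** Returns a buffer to the pool and wakes one blocked requester. */
    public synchronized void recycle(byte[] buffer) {
        available.add(buffer);
        notify();
    }
}
{code}

Because the bound is per input/output gate, a checkpoint barrier only ever has to align over a fixed amount of buffered data, which is exactly the property streaming needs; batch exchanges can keep using more network memory for smoothing.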