Hi Yuxin,

Thanks for creating this FLIP.

It would be good if Flink did not require users to set a very large network memory, or to tune the advanced (and hard-to-understand) per-channel/per-gate buffer configs, just to avoid the "Insufficient number of network buffers" exceptions that can easily happen in large-scale jobs.

Regarding the new config "taskmanager.memory.network.read-required-buffer.max", I think it is still an advanced config that users may find hard to tune. However, given that in most cases users will not need to set it, I think it's acceptable. So +1 for this FLIP.

In the future, I think Flink should adaptively choose whether to use exclusive buffers according to whether there are sufficient network buffers at runtime. Users would then no longer need to understand the above configuration. This may require supporting transitions between exclusive buffers and floating buffers. A problem with making all buffers floating is that too few network buffers can result in task slowness that is hard for users to identify. So improvements to metrics and the web UI are also needed to expose such issues.

Thanks,
Zhu

Yanfei Lei <fredia...@gmail.com> wrote on Mon, Dec 26, 2022 at 11:13:
>
> Hi Yuxin,
>
> Thanks for the proposal!
>
> After reading the FLIP, I have some questions about the default value.
> This FLIP seems to introduce a *new* config
> option (taskmanager.memory.network.required-buffer-per-gate.max) to control
> the network memory usage.
> 1. Is this configuration at the job level or cluster level? As the FLIP
> describes, the default values for batch jobs and streaming jobs are
> different. If an explicit value is set at the cluster level, will it affect
> all batch jobs and streaming jobs on the cluster?
>
> 2. The default value for batch jobs depends on the value of
> ExclusiveBuffersPerChannel (taskmanager.network.memory.buffers-per-channel).
> If the value of ExclusiveBuffersPerChannel changes, does
> "taskmanager.memory.network.required-buffer-per-gate.max" need to change
> with it?
>
>
> Best,
> Yanfei
>
> Dong Lin <lindon...@gmail.com> wrote on Sun, Dec 25, 2022 at 08:58:
>
> > Hi Yuxin,
> >
> > Thanks for proposing the FLIP!
> >
> > The motivation section makes sense. But it seems that the proposed change
> > section mixes the proposed config with the evaluation results. It is a bit
> > hard to understand what configs are proposed and how to describe these
> > configs to users. Given that the configuration setting is part of public
> > interfaces, it might be helpful to add a dedicated public interface section
> > to describe the config key and config semantics, as suggested in the FLIP
> > template here
> > <https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals>.
> >
> > This FLIP seems to add more configs without removing any config from Flink.
> > Intuitively this can make the Flink configuration harder rather than
> > simpler. Maybe we can get a better idea after we add a public interface
> > section to clarify those configs.
> >
> > Thanks,
> > Dong
> >
> >
> > On Mon, Dec 19, 2022 at 3:36 PM Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> >
> > > Hi, devs,
> > >
> > > I'd like to start a discussion about FLIP-266: Simplify network memory
> > > configurations for TaskManager[1].
> > >
> > > When using Flink, users may encounter the following issues that affect
> > > usability.
> > > 1. The job may fail with an "Insufficient number of network buffers"
> > > exception.
> > > 2. Flink network memory size adjustment is complex.
> > > When encountering these issues, users can solve some problems by adding or
> > > adjusting parameters. However, multiple memory config options should be
> > > changed. The config option adjustment requires understanding the detailed
> > > internal implementation, which is impractical for most users.
> > >
> > > To simplify network memory configurations for TaskManager and improve Flink
> > > usability, this FLIP proposes some optimization solutions for the issues.
> > >
> > > Looking forward to your feedback.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-266%3A+Simplify+network+memory+configurations+for+TaskManager
> > >
> > > Best regards,
> > > Yuxin
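
As a footnote to the adaptive-selection idea raised at the top of this thread (choosing exclusive or floating buffers depending on whether enough network buffers are available), here is a minimal sketch of what such a decision could look like. It is only an illustration under stated assumptions: the class `BufferModeSketch`, the `Mode` enum, and the `chooseMode` method are hypothetical names that are not part of Flink or of FLIP-266, and the number of available buffers is assumed to be known when the gate is set up.

```java
/** Hypothetical sketch only; this is not Flink's implementation or API. */
final class BufferModeSketch {

    /** How an input gate would obtain its buffers in this sketch. */
    enum Mode { EXCLUSIVE, FLOATING }

    /**
     * Picks a buffer mode for one input gate.
     *
     * @param numChannels number of input channels of the gate
     * @param exclusivePerChannel exclusive buffers each channel would receive
     * @param availableBuffers buffers assumed to be free for this gate
     */
    static Mode chooseMode(int numChannels, int exclusivePerChannel, int availableBuffers) {
        // Use long arithmetic so very large gates cannot overflow int.
        long required = (long) numChannels * exclusivePerChannel;
        // Fall back to floating buffers when exclusive ones would not fit.
        return required <= availableBuffers ? Mode.EXCLUSIVE : Mode.FLOATING;
    }

    public static void main(String[] args) {
        // 100 channels * 2 buffers = 200 required, which fits into 1000.
        System.out.println(chooseMode(100, 2, 1000));
        // 10000 channels * 2 buffers = 20000 required, which does not fit.
        System.out.println(chooseMode(10000, 2, 1000));
    }
}
```

A one-shot decision like this does not cover the transitions between exclusive and floating buffers mentioned above, nor the metrics needed to spot slowness when too few buffers are available; those would need separate design work.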