xintongsong commented on code in PR #21843: URL: https://github.com/apache/flink/pull/21843#discussion_r1101041753
##########
docs/content/docs/deployment/memory/network_mem_tuning.md:
##########
@@ -97,20 +97,17 @@ The actual value of parallelism from which the problem occurs is various from jo
 ## Network buffer lifecycle

 Flink has several local buffer pools - one for the output stream and one for each input gate.
-Each of those pools is limited to at most
+The upper limit of the size of each buffer pool is called the buffer pool **Target**, which is calculated by the following formula.

 `#channels * taskmanager.network.memory.buffers-per-channel + taskmanager.network.memory.floating-buffers-per-gate`

 The size of the buffer can be configured by setting `taskmanager.memory.segment-size`.

 ### Input network buffers

-Buffers in the input channel are divided into exclusive and floating buffers. Exclusive buffers can be used by only one particular channel. A channel can request additional floating buffers from a buffer pool shared across all channels belonging to the given input gate. The remaining floating buffers are optional and are acquired only if there are enough resources available.
+Not all buffers in the buffer pool Target can be obtained eventually. A **Threshold** is introduced to divide the buffer pool Target into two parts. The part below the threshold is called required. The excess part buffers, if any, is optional. A task will fail if the required buffers cannot be obtained in runtime. A task will not fail due to not obtaining optional buffers, but may suffer a performance reduction. If not explicitly configured, the default value of the threshold is Integer.MAX_VALUE for streaming workloads, and 1000 for batch workloads.
-In the initialization phase:
-- Flink will try to acquire the configured amount of exclusive buffers for each channel
-- all exclusive buffers must be fulfilled or the job will fail with an exception
-- a single floating buffer has to be allocated for Flink to be able to make progress
+It is not recommended to adjust the above threshold during normal use. Unless you are a Flink network expert and can clearly understand the impact of this threshold, you can adjust the above threshold through the option `taskmanager.network.memory.read-buffer.required-per-gate.max`. If this option is configured to a smaller value, it can avoid the "insufficient number of network buffers" exception as much as possible, but may suffer a performance reduction silently. If this option is configured as Integer.MAX_VALUE, the required buffer limit is disabled. When the feature is disabled, more read buffers may be required in runtime, which is good for performance but this may lead to more easily throwing insufficient network buffers exceptions.

Review Comment:
```suggestion
The default value for this threshold is `Integer.MAX_VALUE` for streaming workloads, and `1000` for batch workloads. We do not recommend changing this threshold unless you have good reasons and know well what you are doing. The relevant configuration option is `taskmanager.network.memory.read-buffer.required-per-gate.max`. In general, a smaller threshold makes the "insufficient number of network buffers" exception less likely, but workloads may silently suffer reduced performance, and vice versa.
```

##########
docs/content/docs/deployment/memory/network_mem_tuning.md:
##########
@@ -97,20 +97,17 @@ The actual value of parallelism from which the problem occurs is various from jo
 ## Network buffer lifecycle

 Flink has several local buffer pools - one for the output stream and one for each input gate.
-Each of those pools is limited to at most
+The upper limit of the size of each buffer pool is called the buffer pool **Target**, which is calculated by the following formula.

Review Comment:
```suggestion
The target size of each buffer pool is calculated by the following formula.
```

##########
docs/content/docs/deployment/memory/network_mem_tuning.md:
##########
@@ -97,20 +97,17 @@ The actual value of parallelism from which the problem occurs is various from jo
 ## Network buffer lifecycle

 Flink has several local buffer pools - one for the output stream and one for each input gate.
-Each of those pools is limited to at most
+The upper limit of the size of each buffer pool is called the buffer pool **Target**, which is calculated by the following formula.

 `#channels * taskmanager.network.memory.buffers-per-channel + taskmanager.network.memory.floating-buffers-per-gate`

 The size of the buffer can be configured by setting `taskmanager.memory.segment-size`.

 ### Input network buffers

-Buffers in the input channel are divided into exclusive and floating buffers. Exclusive buffers can be used by only one particular channel. A channel can request additional floating buffers from a buffer pool shared across all channels belonging to the given input gate. The remaining floating buffers are optional and are acquired only if there are enough resources available.
+Not all buffers in the buffer pool Target can be obtained eventually. A **Threshold** is introduced to divide the buffer pool Target into two parts. The part below the threshold is called required. The excess part buffers, if any, is optional. A task will fail if the required buffers cannot be obtained in runtime. A task will not fail due to not obtaining optional buffers, but may suffer a performance reduction. If not explicitly configured, the default value of the threshold is Integer.MAX_VALUE for streaming workloads, and 1000 for batch workloads.
Review Comment:
```suggestion
The target buffer pool size is not always reached. There's a threshold controlling whether Flink should fail upon not obtaining buffers. The part of the target number of buffers that is below this threshold is considered required. The remaining part, if any, is optional. Not obtaining required buffers will lead to task failures. A task will not fail if it cannot obtain optional buffers, but may suffer a performance reduction.
```
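
To make the arithmetic in this thread concrete, here is a minimal Python sketch of how the target size might split into required and optional buffers. It assumes, following the wording of the suggestions above, that the required part is simply the target capped at the threshold. The function name `split_buffer_pool` and the example numbers (600 channels, 2 buffers per channel, 8 floating buffers per gate) are illustrative assumptions and are not taken from Flink's code.

```python
# Stand-in for Java's Integer.MAX_VALUE, the streaming default quoted in the review.
INT_MAX = 2**31 - 1


def split_buffer_pool(num_channels, buffers_per_channel,
                      floating_buffers_per_gate, required_max):
    """Return (target, required, optional) buffer counts for one input gate.

    Hypothetical helper: it mirrors the formula and threshold described in
    the review, not Flink's actual implementation.
    """
    # Target size, per the formula quoted in the diff:
    # #channels * buffers-per-channel + floating-buffers-per-gate
    target = num_channels * buffers_per_channel + floating_buffers_per_gate
    # Assumption: the part of the target below the threshold is required.
    required = min(target, required_max)
    # Whatever exceeds the threshold is optional.
    optional = target - required
    return target, required, optional


# Batch-style threshold of 1000: part of the target becomes optional.
batch = split_buffer_pool(600, 2, 8, required_max=1000)

# Streaming-style threshold of Integer.MAX_VALUE: everything is required.
streaming = split_buffer_pool(600, 2, 8, required_max=INT_MAX)

print("batch:", batch)
print("streaming:", streaming)
```

With these example numbers the target is 1208 buffers. Under the threshold of 1000, 208 of them are optional, so the task would only fail if fewer than 1000 could be obtained. Under `Integer.MAX_VALUE`, all 1208 are required.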