[ https://issues.apache.org/jira/browse/FLINK-33668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jiang Xin updated FLINK-33668:
------------------------------
    Description:
With FLINK-30469 and FLINK-31643, we have decoupled the shuffle network memory and the parallelism of tasks by limiting the number of buffers for each InputGate and ResultPartition.
However, when too many shuffle tasks are running simultaneously on the same TaskManager, "Insufficient number of network buffers" errors would still occur. This usually happens when Slot Sharing Group is enabled or a TaskManager contains multiple slots.

We want to make sure that the TaskManager does not encounter "Insufficient number of network buffers" even if there are dozens of InputGates and ResultPartitions running on the same TaskManager simultaneously.

  was:
With [FLINK-30469|https://issues.apache.org/jira/browse/FLINK-30469] and [FLINK-31643|https://issues.apache.org/jira/browse/FLINK-31643], we have decoupled the shuffle network memory and the parallelism of tasks by limiting the number of buffers for each InputGate and ResultPartition.
However, when too many shuffle tasks are running simultaneously on the same TaskManager, "Insufficient number of network buffers" errors would still occur. This usually happens when Slot Sharing Group is enabled or a TaskManager contains multiple slots.

So we need to make sure that the TaskManager does not encounter "Insufficient number of network buffers" even if there are dozens of InputGates and ResultPartitions running on the same TaskManager simultaneously.

> Decoupling Shuffle network memory and job topology
> --------------------------------------------------
>
>                 Key: FLINK-33668
>                 URL: https://issues.apache.org/jira/browse/FLINK-33668
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Jiang Xin
>            Priority: Major
>             Fix For: 1.19.0
>
>
> With FLINK-30469 and FLINK-31643, we have decoupled the shuffle network
> memory and the parallelism of tasks by limiting the number of buffers for
> each InputGate and ResultPartition.
> However, when too many shuffle tasks are
> running simultaneously on the same TaskManager, "Insufficient number of
> network buffers" errors would still occur. This usually happens when Slot
> Sharing Group is enabled or a TaskManager contains multiple slots.
> We want to make sure that the TaskManager does not encounter "Insufficient
> number of network buffers" even if there are dozens of InputGates and
> ResultPartitions running on the same TaskManager simultaneously.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)