Hi Rahul,

Try increasing taskmanager.network.memory.max to 1GB, roughly double what 
you have now. However, you only have 4GB of RAM for the entire TM, and a 1GB 
network buffer pool seems out of proportion to 4GB of total RAM. Reducing the 
number of shuffles will reduce the number of network buffers required. But if 
your job needs the shuffles, you may want to consider adding more memory to the TM.
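
For reference, here is a rough sketch of the arithmetic behind these numbers. 
It is not Flink code. The buffer count and buffer size come from the exception 
below, and the "slots per TM squared * TMs * 4" rule is the one used in the 
original mail. The clamp in network_memory_bytes and the default values 
(fraction 0.1, min 64MB, max 1GB) are my assumptions about how Flink 1.8 sizes 
the pool, so please check them against your own flink-conf.yaml.

```python
# Sketch only: helper names are hypothetical, not part of any Flink API.

BUFFER_SIZE = 32768      # bytes per buffer, from the exception message
TOTAL_BUFFERS = 13112    # total buffers, from the exception message

MIB = 1024 ** 2
GIB = 1024 ** 3


def pool_bytes(num_buffers, buffer_size=BUFFER_SIZE):
    """Memory occupied by a pool of num_buffers network buffers."""
    return num_buffers * buffer_size


def recommended_buffers(slots_per_tm, num_tms):
    """Rule of thumb from the original mail: slots_per_tm^2 * num_tms * 4."""
    return slots_per_tm ** 2 * num_tms * 4


def network_memory_bytes(total_bytes, fraction, min_bytes, max_bytes):
    """Assumed sizing rule: a fraction of total memory, clamped to [min, max]."""
    return min(max_bytes, max(min_bytes, int(fraction * total_bytes)))


if __name__ == "__main__":
    print("current pool (MiB):", pool_bytes(TOTAL_BUFFERS) / MIB)
    print("recommended buffers:", recommended_buffers(8, 8))

    # Assumed defaults: fraction 0.1, min 64MB, max 1GB, with a 4GB TM.
    assumed = network_memory_bytes(4 * GIB, 0.1, 64 * MIB, 1 * GIB)
    print("buffers under assumed defaults:", assumed // BUFFER_SIZE)
```

If those assumed defaults apply, the pool comes out at roughly 410MB, which 
is close to the 13112 buffers in the exception. In that case the fraction, 
not the max, would be the binding limit, so taskmanager.network.memory.fraction 
may need raising as well before a larger max has any effect.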

Thanks,
Ivan

> On Jul 31, 2020, at 2:02 PM, Rahul Patwari <rahulpatwari8...@gmail.com> wrote:
> 
> Hi,
> 
> We have been seeing an "Insufficient number of network buffers" error 
> sporadically since upgrading Flink from 1.4.2 to 1.8.2.
> The affected tasks transition from DEPLOYING to FAILED. 
> Whenever this issue occurs, the job manager restarts. Sometimes the issue 
> goes away after the restart.
> Because the issue does not occur consistently, we are unsure whether 
> to change the memory configuration.
> 
> Minimum recommended number of network buffers: 
> (slots per TM)^2 * (number of TMs) * 4 = (8 * 8) * 8 * 4 = 2048
> The exception says that 13112 network buffers are configured, which is 
> about 6x the recommendation.
> 
> Is reducing the number of shuffles the only way to reduce the number of 
> network buffers required?
> 
> Thanks,
> Rahul 
> 
> configs:
> env: Kubernetes 
> Flink: 1.8.2
> using default configs for memory.fraction, memory.min, memory.max.
> using 8 TM, 8 slots/TM
> Each TM is running with 1 core, 4 GB Memory.
> 
> Exception:
> java.io.IOException: Insufficient number of network buffers: required 2, but 
> only 0 available. The total number of network buffers is currently set to 
> 13112 of 32768 bytes each. You can increase this number by setting the 
> configuration keys 'taskmanager.network.memory.fraction', 
> 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
> at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.requestMemorySegments(NetworkBufferPool.java:138)
> at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.assignExclusiveSegments(SingleInputGate.java:311)
> at org.apache.flink.runtime.io.network.NetworkEnvironment.setupInputGate(NetworkEnvironment.java:271)
> at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:224)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:614)
> at java.lang.Thread.run(Thread.java:748)
