Thanks for your reply, Ivan. I think taskmanager.network.memory.max is already 1GB by default. In my case, the network buffer memory is 13112 * 32768 bytes, which is roughly 410MB. That is 10% of the TM memory, since taskmanager.network.memory.fraction defaults to 0.1. Do you mean I should increase taskmanager.network.memory.fraction instead?
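To make the arithmetic concrete, here is a rough sketch of how I am reading these settings. The buffer count and segment size come from the exception message. The clamping rule (fraction of TM memory, bounded by memory.min and memory.max) and the 64MB default for memory.min are my assumptions about how the three keys interact, not something I have confirmed in the Flink source. The function names are made up for illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

SEGMENT_SIZE_BYTES = 32768   # from the exception message
NUM_BUFFERS = 13112          # from the exception message


def network_memory_bytes(num_buffers, segment_size):
    """Total memory occupied by the network buffer pool."""
    return num_buffers * segment_size


def assumed_network_memory(total_bytes, fraction, min_bytes, max_bytes):
    """Assumed rule: take a fraction of TM memory, clamped to [min, max]."""
    return min(max_bytes, max(min_bytes, int(total_bytes * fraction)))


# What the TM currently has: 13112 buffers of 32768 bytes each.
current = network_memory_bytes(NUM_BUFFERS, SEGMENT_SIZE_BYTES)
print(f"current pool: {current / MIB:.2f} MiB")

# What the assumed rule gives for a 4 GiB TM with fraction 0.1 and 0.2,
# using min = 64 MiB (assumed default) and max = 1 GiB.
for fraction in (0.1, 0.2):
    size = assumed_network_memory(4 * GIB, fraction, 64 * MIB, 1 * GIB)
    buffers = size // SEGMENT_SIZE_BYTES
    print(f"fraction {fraction}: {size / MIB:.1f} MiB, {buffers} buffers")
```

If this reading is right, memory.max only matters once fraction * TM memory exceeds 1GB, so on a 4GB TM raising memory.max alone would not change the number of buffers.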
1. If Flink is upgraded from 1.4.2 to 1.8.2, does the application need more network buffers?
2. Can this issue happen sporadically? Sometimes the issue is not seen after the job manager is restarted.

I am wondering whether having too few network buffers is the root cause, or whether something else is triggering this issue.

On Sat, Aug 1, 2020 at 9:36 AM Ivan Yang <ivanygy...@gmail.com> wrote:

> Hi Rahul,
>
> Try to increase taskmanager.network.memory.max to 1GB, basically double
> what you have now. However, you only have 4GB RAM for the entire TM, seems
> out of proportion to have 1GB network buffer with 4GB total RAM. Reducing
> number of shuffling will require less network buffer. But if your job need
> the shuffling, then you may consider to add more memory to TM.
>
> Thanks,
> Ivan
>
> On Jul 31, 2020, at 2:02 PM, Rahul Patwari <rahulpatwari8...@gmail.com>
> wrote:
>
> Hi,
>
> We are observing "Insufficient number of Network Buffers" issue
> Sporadically when Flink is upgraded from 1.4.2 to 1.8.2.
> The state of the tasks with this issue translated from DEPLOYING to
> FAILED.
> Whenever this issue occurs, the job manager restarts. Sometimes, the issue
> goes away after the restart.
> As we are not getting the issue consistently, we are in a dilemma of
> whether to change the memory configurations or not.
>
> Min recommended No. of Network Buffers: (8 * 8) * 8 * 4 = 2048
> The exception says that 13112 no. of network buffers are present, which is
> 6x the recommendation.
>
> Is reducing the no. of shuffles the only way to reduce the no. of network
> buffers required?
>
> Thanks,
> Rahul
>
> configs:
> env: Kubernetes
> Flink: 1.8.2
> using default configs for memory.fraction, memory.min, memory.max.
> using 8 TM, 8 slots/TM
> Each TM is running with 1 core, 4 GB Memory.
>
> Exception:
> java.io.IOException: Insufficient number of network buffers: required 2,
> but only 0 available. The total number of network buffers is currently set
> to 13112 of 32768 bytes each. You can increase this number by setting the
> configuration keys 'taskmanager.network.memory.fraction',
> 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
>     at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.requestMemorySegments(NetworkBufferPool.java:138)
>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.assignExclusiveSegments(SingleInputGate.java:311)
>     at org.apache.flink.runtime.io.network.NetworkEnvironment.setupInputGate(NetworkEnvironment.java:271)
>     at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:224)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:614)
>     at java.lang.Thread.run(Thread.java:748)