Yes, increase taskmanager.network.memory.fraction in your case. Reducing the parallelism will also reduce the number of network buffers your job requires. I have never used 1.4.x, so I don't know how it behaves.
Ivan

> On Jul 31, 2020, at 11:37 PM, Rahul Patwari <rahulpatwari8...@gmail.com> wrote:
>
> Thanks for your reply, Ivan.
>
> I think taskmanager.network.memory.max is by default 1GB.
> In my case, the network buffers memory is 13112 * 32768 = around 400MB, which is 10% of the TM memory, as by default taskmanager.network.memory.fraction is 0.1.
> Do you mean to increase taskmanager.network.memory.fraction?
> If Flink is upgraded from 1.4.2 to 1.8.2, does the application need more network buffers?
> Can this issue happen sporadically? Sometimes this issue is not seen when the job manager is restarted.
> I am wondering whether having too few network buffers is the root cause, or whether something else triggers this issue.
>
> On Sat, Aug 1, 2020 at 9:36 AM Ivan Yang <ivanygy...@gmail.com> wrote:
> Hi Rahul,
>
> Try to increase taskmanager.network.memory.max to 1GB, basically double what you have now. However, you only have 4GB RAM for the entire TM, and it seems out of proportion to have a 1GB network buffer with 4GB total RAM. Reducing the number of shuffles will require fewer network buffers. But if your job needs the shuffling, then you may consider adding more memory to the TM.
>
> Thanks,
> Ivan
>
>> On Jul 31, 2020, at 2:02 PM, Rahul Patwari <rahulpatwari8...@gmail.com> wrote:
>>
>> Hi,
>>
>> We are observing the "Insufficient number of Network Buffers" issue sporadically after upgrading Flink from 1.4.2 to 1.8.2.
>> The state of the tasks with this issue transitioned from DEPLOYING to FAILED.
>> Whenever this issue occurs, the job manager restarts. Sometimes, the issue goes away after the restart.
>> As we are not getting the issue consistently, we are in a dilemma about whether to change the memory configurations or not.
>>
>> Min recommended No. of Network Buffers: (8 * 8) * 8 * 4 = 2048
>> The exception says that 13112 network buffers are present, which is roughly 6x the recommendation.
>>
>> Is reducing the no. of shuffles the only way to reduce the no. of network buffers required?
>>
>> Thanks,
>> Rahul
>>
>> configs:
>> env: Kubernetes
>> Flink: 1.8.2
>> using default configs for memory.fraction, memory.min, memory.max.
>> using 8 TM, 8 slots/TM
>> Each TM is running with 1 core, 4 GB Memory.
>>
>> Exception:
>> java.io.IOException: Insufficient number of network buffers: required 2, but only 0 available. The total number of network buffers is currently set to 13112 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
>>     at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.requestMemorySegments(NetworkBufferPool.java:138)
>>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.assignExclusiveSegments(SingleInputGate.java:311)
>>     at org.apache.flink.runtime.io.network.NetworkEnvironment.setupInputGate(NetworkEnvironment.java:271)
>>     at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:224)
>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:614)
>>     at java.lang.Thread.run(Thread.java:748)
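
For anyone checking the numbers in this thread, here is a rough Python sketch of the arithmetic. It is not Flink code. It assumes the buffer pool grows linearly with taskmanager.network.memory.fraction and is capped by taskmanager.network.memory.max (taken as the 1GB default mentioned above), and it ignores taskmanager.network.memory.min. The helper names are made up for illustration.

```python
# Back-of-the-envelope check of the network buffer numbers in this thread.
# Assumptions (not confirmed by the thread): the pool grows linearly with
# taskmanager.network.memory.fraction and is capped by
# taskmanager.network.memory.max; the min setting is ignored here.

SEGMENT_SIZE = 32768   # bytes per buffer, from the exception message
MAX_BYTES = 1024 ** 3  # 1GB default for memory.max, as stated in the thread


def pool_bytes(num_buffers, segment_size=SEGMENT_SIZE):
    """Total memory held by the network buffer pool, in bytes."""
    return num_buffers * segment_size


def min_recommended_buffers(slots_per_tm, num_tms):
    """Rule of thumb quoted in the thread: slots-per-TM^2 * #TMs * 4."""
    return slots_per_tm ** 2 * num_tms * 4


def buffers_after_fraction_change(current_buffers, old_fraction, new_fraction,
                                  max_bytes=MAX_BYTES,
                                  segment_size=SEGMENT_SIZE):
    """Estimate the buffer count after changing the fraction (hypothetical helper)."""
    scaled = round(current_buffers * new_fraction / old_fraction)
    # The pool cannot exceed memory.max, expressed here in whole buffers.
    return min(scaled, max_bytes // segment_size)


if __name__ == "__main__":
    current = 13112  # buffer count reported in the exception
    print(pool_bytes(current) / 1024 ** 2)   # current pool size in MiB
    print(min_recommended_buffers(8, 8))     # 8 slots/TM, 8 TMs
    print(buffers_after_fraction_change(current, 0.1, 0.2))
```

Under these assumptions, raising the fraction from 0.1 to 0.2 would roughly double the pool while staying below the 1GB cap. Whether that is enough depends on how many shuffles and how much parallelism the job actually has.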