Re: sporadic "Insufficient no of network buffers" issue

Ivan Yang Fri, 31 Jul 2020 23:54:46 -0700

Yes, increase taskmanager.network.memory.fraction in your case. Reducing the 
parallelism will also reduce the number of network buffers your job requires. 
I have never used 1.4.x, so I don’t know how it behaves.
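
As a rough sanity check on the numbers in this thread, here is a minimal Python 
sketch of how the fraction, min, and max settings could translate into a buffer 
count. It is a simplified model, not Flink's actual calculation: the function 
name is made up for illustration, it assumes the fraction applies to the full 
4 GB of TM memory, and the 0.1 / 64 MB / 1 GB defaults are the 1.8 defaults as 
I understand them.

```python
SEGMENT_SIZE = 32768  # bytes per network buffer, as reported in the exception


def network_buffers(tm_memory_bytes, fraction=0.1,
                    min_bytes=64 * 1024**2, max_bytes=1024**3,
                    segment_size=SEGMENT_SIZE):
    """Hypothetical helper: estimate the number of network buffers.

    Takes `fraction` of the TM memory, clamps the result to
    [min_bytes, max_bytes], and divides by the segment size.
    """
    network_bytes = int(tm_memory_bytes * fraction)
    network_bytes = max(min_bytes, min(network_bytes, max_bytes))
    return network_bytes // segment_size


four_gb = 4 * 1024**3
print(network_buffers(four_gb))                # default fraction of 0.1
print(network_buffers(four_gb, fraction=0.2))  # doubled fraction
```

Under this model a 4 GB TM at the default fraction gets roughly 13,100 buffers, 
which is close to the 13112 in the exception, and raising the fraction to 0.2 
roughly doubles that. The max setting only matters once the fraction of TM 
memory exceeds it.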


Ivan

> On Jul 31, 2020, at 11:37 PM, Rahul Patwari <rahulpatwari8...@gmail.com> 
> wrote:
> 
> Thanks for your reply, Ivan.
> 
> I think taskmanager.network.memory.max is 1GB by default. 
> In my case, the network buffer memory is 13112 * 32768 bytes = around 400MB, 
> which is 10% of the TM memory because taskmanager.network.memory.fraction 
> defaults to 0.1.
> Do you mean I should increase taskmanager.network.memory.fraction?
> Does the application need more network buffers after Flink is upgraded from 
> 1.4.2 to 1.8.2?
> Can this issue happen sporadically? Sometimes it is not seen after the job 
> manager restarts.
> I am wondering whether having too few network buffers is the root cause, or 
> whether something else triggers this issue.
> 
> On Sat, Aug 1, 2020 at 9:36 AM Ivan Yang <ivanygy...@gmail.com> wrote:
> Hi Rahul,
> 
> Try increasing taskmanager.network.memory.max to 1GB, basically double what 
> you have now. However, you only have 4GB of RAM for the entire TM, and 1GB of 
> network buffers seems out of proportion to 4GB of total RAM. Reducing the 
> number of shuffles will reduce the number of network buffers required. But if 
> your job needs the shuffles, consider adding more memory to the TM.
> 
> Thanks,
> Ivan
> 
>> On Jul 31, 2020, at 2:02 PM, Rahul Patwari <rahulpatwari8...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> We have been observing the "Insufficient number of network buffers" issue 
>> sporadically since upgrading Flink from 1.4.2 to 1.8.2.
>> The tasks affected by this issue transition from DEPLOYING to FAILED. 
>> Whenever this issue occurs, the job manager restarts. Sometimes, the issue 
>> goes away after the restart.
>> As the issue does not occur consistently, we are unsure whether to change 
>> the memory configuration.
>> 
>> Minimum recommended number of network buffers: (8 * 8) * 8 * 4 = 2048
>> The exception says that 13112 network buffers are configured, which is 
>> about 6x the recommendation.
>> 
>> Is reducing the number of shuffles the only way to reduce the number of 
>> network buffers required?
>> 
>> Thanks,
>> Rahul 
>> 
>> configs:
>> env: Kubernetes 
>> Flink: 1.8.2
>> using default configs for memory.fraction, memory.min, memory.max.
>> using 8 TM, 8 slots/TM
>> Each TM is running with 1 core, 4 GB Memory.
>> 
>> Exception:
>> java.io.IOException: Insufficient number of network buffers: required 2, but 
>> only 0 available. The total number of network buffers is currently set to 
>> 13112 of 32768 bytes each. You can increase this number by setting the 
>> configuration keys 'taskmanager.network.memory.fraction', 
>> 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
>> at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.requestMemorySegments(NetworkBufferPool.java:138)
>> at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.assignExclusiveSegments(SingleInputGate.java:311)
>> at org.apache.flink.runtime.io.network.NetworkEnvironment.setupInputGate(NetworkEnvironment.java:271)
>> at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:224)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:614)
>> at java.lang.Thread.run(Thread.java:748)
>
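
The sizing figures from the original report can be reproduced with a short 
Python sketch. The first function is the rule of thumb quoted in the report. 
The second is only an assumption about how releases after 1.4 allocate buffers 
per input gate: a fixed number of exclusive buffers per input channel (2 by 
default, as far as I know, which would be consistent with "required 2" in the 
exception) plus a floating pool per gate (8 by default). The function names and 
the 64-channel example are hypothetical.

```python
def recommended_min_buffers(slots_per_tm, num_tms):
    """Rule of thumb from the report: slots-per-TM^2 * number of TMs * 4."""
    return slots_per_tm ** 2 * num_tms * 4


def input_gate_buffers(num_channels, buffers_per_channel=2,
                       floating_buffers_per_gate=8):
    """Assumed per-gate demand: exclusive buffers per channel plus a floating pool."""
    return num_channels * buffers_per_channel + floating_buffers_per_gate


recommended = recommended_min_buffers(slots_per_tm=8, num_tms=8)
configured = 13112  # total reported in the exception

print(recommended)                          # 2048
print(round(configured / recommended, 1))   # 6.4
# Hypothetical case: one task reading a full shuffle from 64 upstream subtasks.
print(input_gate_buffers(num_channels=64))  # 136
```

If the per-gate assumption holds, every shuffle in the job multiplies this 
demand by the number of receiving tasks on a TM, so the pool can run dry during 
deployment even when the total is well above the rule-of-thumb minimum.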
