Would it be helpful to add additional message in the error message in
NetworkBufferPool#createBufferPool to check the
taskmanager.network.numberOfBuffers property?


- Henry

On Wed, Feb 18, 2015 at 4:32 PM, Yiannis Gkoufas <johngou...@gmail.com> wrote:
> Perfect! It worked! Thanks a lot for the help!
>
> On 18 February 2015 at 22:13, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>> 2048 is the default. So you didn't actually increase the number of buffers
>> ;-)
>>
>> Try 4096 or so.
>>
>> 2015-02-18 22:59 GMT+01:00 Yiannis Gkoufas <johngou...@gmail.com>:
>>>
>>> Hi!
>>>
>>> thank you for your replies!
>>> I increased the number of network buffers:
>>>
>>> taskmanager.network.numberOfBuffers: 2048
>>>
>>> but I am still getting the same error:
>>>
>>> Insufficient number of network buffers: required 120, but only 2 of 2048
>>> available.
>>>
>>> Thanks a lot!
>>>
>>>
>>> On 18 February 2015 at 20:27, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>
>>>> Hi Yiannis,
>>>>
>>>> if you scale Flink to larger setups you need to adapt the number of
>>>> network buffers.
>>>> The background section of the configuration reference explains the
>>>> details on that [1].
>>>>
>>>> Let us know, if that helped to solve the problem.
>>>>
>>>> Best, Fabian
>>>>
>>>> [1] http://flink.apache.org/docs/0.8/config.html#background
>>>>
>>>> 2015-02-18 21:18 GMT+01:00 Yiannis Gkoufas <johngou...@gmail.com>:
>>>>>
>>>>> Hi there,
>>>>>
>>>>> I have a cluster of 10 nodes with 12 CPUs each.
>>>>> This is my configuration:
>>>>>
>>>>> jobmanager.rpc.port: 6123
>>>>>
>>>>> jobmanager.heap.mb: 4024
>>>>>
>>>>> taskmanager.heap.mb: 8096
>>>>>
>>>>> taskmanager.numberOfTaskSlots: 12
>>>>>
>>>>> parallelization.degree.default: 120
>>>>>
>>>>> I have been getting the following error:
>>>>>
>>>>> java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120)
>>>>> - execution #0 to slot SimpleSlot (1)(0) - 
>>>>> efc370a0b2a9a63f2e7b960cfe4e4c27
>>>>> - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network
>>>>> buffers: required 120, but only 2 of 2048 available.
>>>>> at
>>>>> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
>>>>> at
>>>>> org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
>>>>> at
>>>>> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
>>>>> at
>>>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>>>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>> at
>>>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>> at
>>>>> org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
>>>>> at akka.dispatch.OnComplete.internal(Future.scala:247)
>>>>> at akka.dispatch.OnComplete.internal(Future.scala:244)
>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
>>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>>>>> at
>>>>> scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>>
>>>>> I failed to get any info online on how to solve it.
>>>>> Any help would be welcome.
>>>>>
>>>>> Thank you!
>>>>
>>>>
>>>
>>
>

Reply via email to