I agree with Henry. We should include the name of the required configuration parameter into the exception. Users often run into this issue.
I've filed a JIRA to track the fix: https://issues.apache.org/jira/browse/FLINK-1646 On Thu, Feb 19, 2015 at 6:18 PM, Henry Saputra <henry.sapu...@gmail.com> wrote: > Would it be helpful to add additional message in the error message in > NetworkBufferPool#createBufferPool to check the > taskmanager.network.numberOfBuffers property? > > > - Henry > > On Wed, Feb 18, 2015 at 4:32 PM, Yiannis Gkoufas <johngou...@gmail.com> > wrote: > > Perfect! It worked! Thanks a lot for the help! > > > > On 18 February 2015 at 22:13, Fabian Hueske <fhue...@gmail.com> wrote: > >> > >> 2048 is the default. So you didn't actually increase the number of > buffers > >> ;-) > >> > >> Try 4096 or so. > >> > >> 2015-02-18 22:59 GMT+01:00 Yiannis Gkoufas <johngou...@gmail.com>: > >>> > >>> Hi! > >>> > >>> thank you for your replies! > >>> I increased the number of network buffers: > >>> > >>> taskmanager.network.numberOfBuffers: 2048 > >>> > >>> but I am still getting the same error: > >>> > >>> Insufficient number of network buffers: required 120, but only 2 of > 2048 > >>> available. > >>> > >>> Thanks a lot! > >>> > >>> > >>> On 18 February 2015 at 20:27, Fabian Hueske <fhue...@gmail.com> wrote: > >>>> > >>>> Hi Yiannis, > >>>> > >>>> if you scale Flink to larger setups you need to adapt the number of > >>>> network buffers. > >>>> The background section of the configuration reference explains the > >>>> details on that [1]. > >>>> > >>>> Let us know, if that helped to solve the problem. > >>>> > >>>> Best, Fabian > >>>> > >>>> [1] http://flink.apache.org/docs/0.8/config.html#background > >>>> > >>>> 2015-02-18 21:18 GMT+01:00 Yiannis Gkoufas <johngou...@gmail.com>: > >>>>> > >>>>> Hi there, > >>>>> > >>>>> I have a cluster of 10 nodes with 12 CPUs each. > >>>>> This is my configuration: > >>>>> > >>>>> jobmanager.rpc.port: 6123 > >>>>> > >>>>> jobmanager.heap.mb: 4024 > >>>>> > >>>>> taskmanager.heap.mb: 8096 > >>>>> > >>>>> taskmanager.numberOfTaskSlots: 12 > >>>>> > >>>>> parallelization.degree.default: 120 > >>>>> > >>>>> I have been getting the following error: > >>>>> > >>>>> java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) > (65/120) > >>>>> - execution #0 to slot SimpleSlot (1)(0) - > efc370a0b2a9a63f2e7b960cfe4e4c27 > >>>>> - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of > network > >>>>> buffers: required 120, but only 2 of 2048 available. > >>>>> at > >>>>> > org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155) > >>>>> at > >>>>> > org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163) > >>>>> at > >>>>> org.apache.flink.runtime.taskmanager.TaskManager.org > $apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426) > >>>>> at > >>>>> > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261) > >>>>> at > >>>>> > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > >>>>> at > >>>>> > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > >>>>> at > >>>>> > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > >>>>> at > >>>>> > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37) > >>>>> at > >>>>> > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30) > >>>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > >>>>> at > >>>>> > org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30) > >>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > >>>>> at > >>>>> > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89) > >>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > >>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487) > >>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > >>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221) > >>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > >>>>> at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > >>>>> at > >>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > >>>>> at > >>>>> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > >>>>> at > >>>>> > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > >>>>> > >>>>> at > >>>>> > org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344) > >>>>> at akka.dispatch.OnComplete.internal(Future.scala:247) > >>>>> at akka.dispatch.OnComplete.internal(Future.scala:244) > >>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174) > >>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171) > >>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > >>>>> at > >>>>> > scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) > >>>>> at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > >>>>> at > >>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > >>>>> at > >>>>> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > >>>>> at > >>>>> > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > >>>>> > >>>>> > >>>>> I failed to get any info online on how to solve it. > >>>>> Any help would be welcome. > >>>>> > >>>>> Thank you! > >>>> > >>>> > >>> > >> > > >