Cause of the failures: The tests in DegreesWithExceptionITCase use the context execution environment without extending a test base. That context environment instantiates a local execution environment with a parallelism equal to the number of cores. Since the Travis builds run in containers on big machines, the number of cores can be very high (32 or 64), which causes the tests to run out of network buffers with the default configuration.
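One way to guard against that in a test is to pin the parallelism explicitly instead of inheriting the number of cores. A minimal sketch (the class name and the value 4 are my own choices, and I am assuming a Flink version where ExecutionEnvironment#setParallelism is available; older releases call it setDegreeOfParallelism):

import org.apache.flink.api.java.ExecutionEnvironment;

public class FixedParallelismSketch {

    public static void main(String[] args) {
        // Take the context environment, as DegreesWithExceptionITCase does,
        // but cap the parallelism so it no longer depends on the number of
        // cores of the build machine.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        // Alternatively, create a local environment with a fixed parallelism
        // right away (only sensible for purely local tests).
        ExecutionEnvironment localEnv = ExecutionEnvironment.createLocalEnvironment(4);

        // ... build the test program on 'env' (or 'localEnv') and call
        // execute() as usual ...
    }
}

With a small fixed value, the buffer demand of each data exchange stays far below the default pool of 2048 network buffers, no matter how many cores the container reports.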
IMPORTANT: Please make sure that all tests in the future either use one of the test base classes (that define a reasonable parallelism), or define the parallelism manually to be safe!

On Sun, Mar 15, 2015 at 3:43 PM, Stephan Ewen <se...@apache.org> wrote:

> It seems that the current master is broken, with respect to the tests.
>
> I see all build on Travis consistently failing, in the gelly project.
> Since Travis is a bit behind in the "apache" account, I triggered a build
> in my own account. The hash is the same, it should contain the master from
> yesterday.
>
> https://travis-ci.org/StephanEwen/incubator-flink/builds/54386416
>
> In all executions it results in the stack trace below. I cannot reproduce
> the problem locally, unfortunately.
>
> This is a serious issue, it totally kills the testability.
>
> Results :
>
> Failed tests:
>   DegreesWithExceptionITCase.testGetDegreesInvalidEdgeSrcId:113
>   expected:<[The edge src/trg id could not be found within the vertexIds]>
>   but was:<[Failed to deploy the task Reduce(SUM(1), at getDegrees(Graph.java:664)
>   (30/32) - execution #0 to slot SimpleSlot (2)(2) - 31624115d75feb2c387ae9043021d8e6 -
>   ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers:
>   required 32, but only 2 available. The total number of network buffers is
>   currently set to 2048. You can increase this number by setting the
>   configuration key 'taskmanager.network.numberOfBuffers'.
>     at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:158)
>     at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
>     at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:454)
>     at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:237)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>     at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:91)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>   ]>
>   DegreesWithExceptionITCase.testGetDegreesInvalidEdgeTrgId:92
>   expected:<[The edge src/trg id could not be found within the vertexIds]>
>   but was:<[Failed to deploy the task CoGroup (CoGroup at inDegrees(Graph.java:655))
>   (29/32) - execution #0 to slot SimpleSlot (1)(3) - 1735ca6f2fb76f9f0a0ab03ffd9c9f93 -
>   ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers:
>   required 32, but only 8 available. The total number of network buffers is
>   currently set to 2048. You can increase this number by setting the
>   configuration key 'taskmanager.network.numberOfBuffers'.
>     at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:158)
>     at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:135)
>     at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:454)
>     at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:237)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>     at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:91)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>   ]>
>   DegreesWithExceptionITCase.testGetDegreesInvalidEdgeSrcTrgId:134
>   expected:<[The edge src/trg id could not be found within the vertexIds]>
>   but was:<[Failed to deploy the task CoGroup (CoGroup at inDegrees(Graph.java:655))
>   (31/32) - execution #0 to slot SimpleSlot (1)(3) - 3a465bdbeca9625e5d90572ed0959b1d -
>   ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers:
>   required 32, but only 8 available. The total number of network buffers is
>   currently set to 2048. You can increase this number by setting the
>   configuration key 'taskmanager.network.numberOfBuffers'.
>     at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:158)
>     at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:135)
>     at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:454)
>     at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:237)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>     at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:91)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>   ]>
>
> Tests run: 180, Failures: 3, Errors: 0, Skipped: 0
>
> [INFO]
> [INFO] --- maven-failsafe-plugin:2.17:verify (default) @ flink-gelly ---
> [INFO] Failsafe report directory: /home/travis/build/StephanEwen/incubator-flink/flink-staging/flink-gelly/target/failsafe-reports
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] flink .............................................. SUCCESS [  6.075 s]
> [INFO] flink-shaded-hadoop ................................ SUCCESS [  1.827 s]
> [INFO] flink-shaded-hadoop1 ............................... SUCCESS [  7.384 s]
> [INFO] flink-core ......................................... SUCCESS [ 37.973 s]
> [INFO] flink-java ......................................... SUCCESS [ 17.373 s]
> [INFO] flink-runtime ...................................... SUCCESS [11:13 min]
> [INFO] flink-compiler ..................................... SUCCESS [  7.149 s]
> [INFO] flink-clients ...................................... SUCCESS [  9.130 s]
> [INFO] flink-test-utils ................................... SUCCESS [  8.519 s]
> [INFO] flink-scala ........................................ SUCCESS [ 36.171 s]
> [INFO] flink-examples ..................................... SUCCESS [  0.370 s]
> [INFO] flink-java-examples ................................ SUCCESS [  2.335 s]
> [INFO] flink-scala-examples ............................... SUCCESS [ 25.139 s]
> [INFO] flink-staging ...................................... SUCCESS [  0.093 s]
> [INFO] flink-streaming .................................... SUCCESS [  0.315 s]
> [INFO] flink-streaming-core ............................... SUCCESS [  9.560 s]
> [INFO] flink-tests ........................................ SUCCESS [09:11 min]
> [INFO] flink-avro ......................................... SUCCESS [ 17.307 s]
> [INFO] flink-jdbc ......................................... SUCCESS [  3.715 s]
> [INFO] flink-spargel ...................................... SUCCESS [  7.141 s]
> [INFO] flink-hadoop-compatibility ......................... SUCCESS [ 19.508 s]
> [INFO] flink-streaming-scala .............................. SUCCESS [ 14.936 s]
> [INFO] flink-streaming-connectors ......................... SUCCESS [  2.784 s]
> [INFO] flink-streaming-examples ........................... SUCCESS [ 18.787 s]
> [INFO] flink-hbase ........................................ SUCCESS [  2.870 s]
> [INFO] flink-gelly ........................................ FAILURE [ 58.548 s]
> [INFO] flink-hcatalog ..................................... SKIPPED
> [INFO] flink-expressions .................................. SKIPPED
> [INFO] flink-quickstart ................................... SKIPPED
> [INFO] flink-quickstart-java .............................. SKIPPED
> [INFO] flink-quickstart-scala ............................. SKIPPED
> [INFO] flink-contrib ...................................... SKIPPED
> [INFO] flink-dist ......................................... SKIPPED
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
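For completeness: the other knob named in the error message above is the size of the network buffer pool itself. If a test really needs a high parallelism, it could hand a custom configuration to a local environment instead. Again only a sketch (the class name and the value 8192 are my own, and I am assuming the createLocalEnvironment(Configuration) variant exists in the version at hand):

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class MoreNetworkBuffersSketch {

    public static void main(String[] args) {
        // Raise the number of network buffers for the embedded local cluster,
        // using the configuration key quoted in the error message above.
        Configuration config = new Configuration();
        config.setInteger("taskmanager.network.numberOfBuffers", 8192);

        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(config);

        // ... build and execute the test program on 'env' ...
    }
}

For the Gelly tests, though, a fixed low parallelism (or one of the test base classes) remains the simpler and safer fix.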