Hi, thanks a lot for fixing this so quickly!
Just to make sure I fully understand what happened, how come the Travis build was successful when we merged this, but failed later? Is there a way to avoid such issues in the future? Cheers, Vasia. On 15 March 2015 at 17:07, Stephan Ewen <se...@apache.org> wrote: > Waiting for travis to give me the green light, then I'll push the fix... > > On Sun, Mar 15, 2015 at 5:04 PM, Robert Metzger <rmetz...@apache.org> > wrote: > > > I think the issue is that our tests are executed on travis machines with > > different physical CPU core counts. > > > > I've pushed a 5-day-old commit ( > > > > > https://github.com/rmetzger/flink/commit/b4e8350f52c81704ffc726a1689bb0dc7180776d > > ) > > to travis, and it also failed with that issue: > > https://travis-ci.org/rmetzger/flink/builds/54443951 > > > > Thanks for resolving the issue so quickly, Stephan! > > > > On Sun, Mar 15, 2015 at 4:06 PM, Andra Lungu <lungu.an...@gmail.com> > > wrote: > > > > > Hi Stephan, > > > > > > The degree of parallelism was manually set there. > > MultipleProgramsTestBase > > > cannot be extended; Ufuk explained why. > > > > > > But I see that for the latest travis check, that test passed. > > > https://github.com/apache/flink/pull/475 > > > > > > On Sun, Mar 15, 2015 at 3:54 PM, Stephan Ewen <se...@apache.org> > wrote: > > > > > > > Cause of the Failures: > > > > > > > > The tests in DegreesWithExceptionITCase use the context execution > > > > environment without extending a test base. This context environment > > > > instantiates a local execution environment with a parallelism equal to > > the > > > > number of cores. Since on Travis, builds run in containers on big > > > machines, > > > > the number of cores may be very high (32/64) - this causes the tests to > > run > > > > out of network buffers with the default configuration.
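Stephan's explanation above can be checked with a back-of-envelope calculation. The sketch below is plain Java; the one-buffer-per-channel rule of thumb, the single local TaskManager, and the class name are assumptions for illustration, not something stated in the thread:

```java
// Rough estimate of network buffer demand in a local execution
// environment, where all subtasks of a job run in one TaskManager.
// Assumption (editor's, not from the thread): an all-to-all shuffle
// at parallelism p opens p x p channels, each holding ~1 buffer.
public class NetworkBufferEstimate {

    // Default pool size, as reported in the exception message.
    public static final int TOTAL_BUFFERS = 2048;

    // Buffers one all-to-all shuffle needs: p senders x p receivers.
    public static int buffersPerShuffle(int parallelism) {
        return parallelism * parallelism;
    }

    public static void main(String[] args) {
        for (int p : new int[] {4, 16, 32}) {
            int perShuffle = buffersPerShuffle(p);
            System.out.println("parallelism " + p + ": ~" + perShuffle
                    + " buffers per shuffle, " + (TOTAL_BUFFERS / perShuffle)
                    + " concurrent shuffle(s) fit in " + TOTAL_BUFFERS);
        }
        // At parallelism 32 a single shuffle already needs ~1024 buffers,
        // so a job with a handful of Reduce/CoGroup shuffles exhausts the
        // default pool of 2048.
    }
}
```

At the parallelism a typical developer machine picks (say 4 cores), the same job uses a tiny fraction of the pool, which would explain why the failure never reproduced locally.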
> > > > > > > > > > > > IMPORTANT: Please make sure that all tests in the future either use > one > > > of > > > > the test base classes (that define a reasonable parallelism), or > define > > > the > > > > parallelism manually to be safe! > > > > > > > > On Sun, Mar 15, 2015 at 3:43 PM, Stephan Ewen <se...@apache.org> > > wrote: > > > > > > > > > It seems that the current master is broken, with respect to the > > tests. > > > > > > > > > > I see all builds on Travis consistently failing, in the gelly > project. > > > > > Since Travis is a bit behind in the "apache" account, I triggered a > > > build > > > > > in my own account. The hash is the same; it should contain the master > > > > from > > > > > yesterday. > > > > > > > > > > https://travis-ci.org/StephanEwen/incubator-flink/builds/54386416 > > > > > > > > > > In all executions it results in the stack trace below. I cannot > > > reproduce > > > > > the problem locally, unfortunately. > > > > > > > > > > This is a serious issue; it totally kills testability. > > > > > > > > > > Results : > > > > > > > > > > Failed tests: > > > > > DegreesWithExceptionITCase.testGetDegreesInvalidEdgeSrcId:113 > > > > expected:<[The edge src/trg id could not be found within the > > vertexIds]> > > > > but was:<[Failed to deploy the task Reduce(SUM(1), at > > > > getDegrees(Graph.java:664) (30/32) - execution #0 to slot SimpleSlot > > > (2)(2) > > > > - 31624115d75feb2c387ae9043021d8e6 - ALLOCATED/ALIVE: > > > java.io.IOException: > > > > Insufficient number of network buffers: required 32, but only 2 > > > available. > > > > The total number of network buffers is currently set to 2048. You can > > > > increase this number by setting the configuration key > > > > 'taskmanager.network.numberOfBuffers'.
> > > > > at > > > > > > > > > > org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:158) > > > > > at > > > > > > > > > > org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163) > > > > > at org.apache.flink.runtime.taskmanager.TaskManager.org > > > > > > > > > > $apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:454) > > > > > at > > > > > > > > > > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:237) > > > > > at > > > > > > > > > > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > > > > > at > > > > > > > > > > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > > > > > at > > > > > > > > > > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > > > > > at > > > > > > > > > > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37) > > > > > at > > > > > > > > > > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30) > > > > > at > > > > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > > > > > at > > > > > > > > > > org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30) > > > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > > > > > at > > > > > > > > > > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:91) > > > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > > > > > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > > > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > > > > > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > > > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > > > > > at > > > > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > > > > at 
> > > > > > > > > > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > > > > > at > > > > > > > > > > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > > > > > at > > > > > > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > > > > at > > > > > > > > > > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > ]> > > > > > DegreesWithExceptionITCase.testGetDegreesInvalidEdgeTrgId:92 > > > > expected:<[The edge src/trg id could not be found within the > > vertexIds]> > > > > but was:<[Failed to deploy the task CoGroup (CoGroup at > > > > inDegrees(Graph.java:655)) (29/32) - execution #0 to slot SimpleSlot > > > (1)(3) > > > > - 1735ca6f2fb76f9f0a0ab03ffd9c9f93 - ALLOCATED/ALIVE: > > > java.io.IOException: > > > > Insufficient number of network buffers: required 32, but only 8 > > > available. > > > > The total number of network buffers is currently set to 2048. You can > > > > increase this number by setting the configuration key > > > > 'taskmanager.network.numberOfBuffers'. 
> > > > > at > > > > > > > > > > org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:158) > > > > > at > > > > > > > > > > org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:135) > > > > > at org.apache.flink.runtime.taskmanager.TaskManager.org > > > > > > > > > > $apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:454) > > > > > at > > > > > > > > > > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:237) > > > > > at > > > > > > > > > > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > > > > > at > > > > > > > > > > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > > > > > at > > > > > > > > > > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > > > > > at > > > > > > > > > > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37) > > > > > at > > > > > > > > > > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30) > > > > > at > > > > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > > > > > at > > > > > > > > > > org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30) > > > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > > > > > at > > > > > > > > > > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:91) > > > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > > > > > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > > > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > > > > > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > > > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > > > > > at > > > > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > > > > at 
> > > > > > > > > > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > > > > > at > > > > > > > > > > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > > > > > at > > > > > > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > > > > at > > > > > > > > > > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > ]> > > > > > DegreesWithExceptionITCase.testGetDegreesInvalidEdgeSrcTrgId:134 > > > > expected:<[The edge src/trg id could not be found within the > > vertexIds]> > > > > but was:<[Failed to deploy the task CoGroup (CoGroup at > > > > inDegrees(Graph.java:655)) (31/32) - execution #0 to slot SimpleSlot > > > (1)(3) > > > > - 3a465bdbeca9625e5d90572ed0959b1d - ALLOCATED/ALIVE: > > > java.io.IOException: > > > > Insufficient number of network buffers: required 32, but only 8 > > > available. > > > > The total number of network buffers is currently set to 2048. You can > > > > increase this number by setting the configuration key > > > > 'taskmanager.network.numberOfBuffers'. 
> > > > > at > > > > > > > > > > org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:158) > > > > > at > > > > > > > > > > org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:135) > > > > > at org.apache.flink.runtime.taskmanager.TaskManager.org > > > > > > > > > > $apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:454) > > > > > at > > > > > > > > > > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:237) > > > > > at > > > > > > > > > > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > > > > > at > > > > > > > > > > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > > > > > at > > > > > > > > > > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > > > > > at > > > > > > > > > > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37) > > > > > at > > > > > > > > > > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30) > > > > > at > > > > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > > > > > at > > > > > > > > > > org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30) > > > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > > > > > at > > > > > > > > > > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:91) > > > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > > > > > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > > > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > > > > > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > > > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > > > > > at > > > > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > > > > at 
> > > > > > > > > > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > > > > > at > > > > > > > > > > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > > > > > at > > > > > > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > > > > at > > > > > > > > > > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > ]> > > > > > > > > > > Tests run: 180, Failures: 3, Errors: 0, Skipped: 0 > > > > > > > > > > [INFO] > > > > > [INFO] --- maven-failsafe-plugin:2.17:verify (default) @ > flink-gelly > > > --- > > > > > [INFO] Failsafe report directory: > > > > > > > > > > /home/travis/build/StephanEwen/incubator-flink/flink-staging/flink-gelly/target/failsafe-reports > > > > > [INFO] > > > > > > ------------------------------------------------------------------------ > > > > > [INFO] Reactor Summary: > > > > > [INFO] > > > > > [INFO] flink .............................................. > SUCCESS [ > > > > 6.075 s] > > > > > [INFO] flink-shaded-hadoop ................................ > SUCCESS [ > > > > 1.827 s] > > > > > [INFO] flink-shaded-hadoop1 ............................... > SUCCESS [ > > > > 7.384 s] > > > > > [INFO] flink-core ......................................... > SUCCESS [ > > > > 37.973 s] > > > > > [INFO] flink-java ......................................... > SUCCESS [ > > > > 17.373 s] > > > > > [INFO] flink-runtime ...................................... SUCCESS > > > > [11:13 min] > > > > > [INFO] flink-compiler ..................................... > SUCCESS [ > > > > 7.149 s] > > > > > [INFO] flink-clients ...................................... > SUCCESS [ > > > > 9.130 s] > > > > > [INFO] flink-test-utils ................................... > SUCCESS [ > > > > 8.519 s] > > > > > [INFO] flink-scala ........................................ 
> SUCCESS [ > > > > 36.171 s] > > > > > [INFO] flink-examples ..................................... > SUCCESS [ > > > > 0.370 s] > > > > > [INFO] flink-java-examples ................................ > SUCCESS [ > > > > 2.335 s] > > > > > [INFO] flink-scala-examples ............................... > SUCCESS [ > > > > 25.139 s] > > > > > [INFO] flink-staging ...................................... > SUCCESS [ > > > > 0.093 s] > > > > > [INFO] flink-streaming .................................... > SUCCESS [ > > > > 0.315 s] > > > > > [INFO] flink-streaming-core ............................... > SUCCESS [ > > > > 9.560 s] > > > > > [INFO] flink-tests ........................................ SUCCESS > > > > [09:11 min] > > > > > [INFO] flink-avro ......................................... > SUCCESS [ > > > > 17.307 s] > > > > > [INFO] flink-jdbc ......................................... > SUCCESS [ > > > > 3.715 s] > > > > > [INFO] flink-spargel ...................................... > SUCCESS [ > > > > 7.141 s] > > > > > [INFO] flink-hadoop-compatibility ......................... > SUCCESS [ > > > > 19.508 s] > > > > > [INFO] flink-streaming-scala .............................. > SUCCESS [ > > > > 14.936 s] > > > > > [INFO] flink-streaming-connectors ......................... > SUCCESS [ > > > > 2.784 s] > > > > > [INFO] flink-streaming-examples ........................... > SUCCESS [ > > > > 18.787 s] > > > > > [INFO] flink-hbase ........................................ > SUCCESS [ > > > > 2.870 s] > > > > > [INFO] flink-gelly ........................................ > FAILURE [ > > > > 58.548 s] > > > > > [INFO] flink-hcatalog ..................................... SKIPPED > > > > > [INFO] flink-expressions .................................. SKIPPED > > > > > [INFO] flink-quickstart ................................... SKIPPED > > > > > [INFO] flink-quickstart-java .............................. 
SKIPPED > > > > > [INFO] flink-quickstart-scala ............................. SKIPPED > > > > > [INFO] flink-contrib ...................................... SKIPPED > > > > > [INFO] flink-dist ......................................... SKIPPED > > > > > [INFO] > > > > > > ------------------------------------------------------------------------ > > > > > [INFO] BUILD FAILURE > > > > > [INFO] > > > > > > ------------------------------------------------------------------------ > > > > > > > > > > > > > > > > > > > >
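For completeness, the knob named in the exception message lives in the TaskManager configuration (flink-conf.yaml). A sketch; the 4096 value is an illustrative choice, not a recommendation from the thread:

```yaml
# flink-conf.yaml
# Raise the network buffer pool for high-parallelism runs; the default
# of 2048 is what the failing builds ran out of. 4096 is illustrative.
taskmanager.network.numberOfBuffers: 4096
```

The remedy Stephan asks for in the thread -- extending one of the test base classes or setting the test parallelism explicitly -- avoids touching this setting at all, and keeps the tests independent of the machine's core count.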