Hey! The Akka communication is not fully stable in the current snapshot master. We are working on this. The "Buffer has already been recycled" exception is probably an artifact of the cancellation.
Stephan

On Wed, Feb 4, 2015 at 2:49 PM, Chesnay Schepler <chesnay.schep...@fu-berlin.de> wrote:

> Hello,
>
> I'm trying to run python jobs with the latest master on a cluster and get
> the following exception:
>
> Error: The program execution failed: JobManager not reachable anymore. Terminate waiting for job answer.
> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: JobManager not reachable anymore. Terminate waiting for job answer.
>     at org.apache.flink.client.program.Client.run(Client.java:345)
>     at org.apache.flink.client.program.Client.run(Client.java:304)
>     at org.apache.flink.client.program.Client.run(Client.java:298)
>     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>     at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:677)
>     at org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.runPlan(PythonPlanBinder.java:106)
>     at org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.main(PythonPlanBinder.java:79)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>     at org.apache.flink.client.program.Client.run(Client.java:250)
>     at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:387)
>     at org.apache.flink.client.CliFrontend.run(CliFrontend.java:356)
>     at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1066)
>     at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1090)
>
> In the jobmanager log file i find this exception:
>
> java.lang.IllegalStateException: Buffer has already been recycled.
>     at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>     at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:131)
>     at org.apache.flink.runtime.io.network.buffer.Buffer.setSize(Buffer.java:95)
>     at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.getCurrentBuffer(SpanningRecordSerializer.java:151)
>     at org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(RecordWriter.java:158)
>     at org.apache.flink.runtime.operators.RegularPactTask.clearWriters(RegularPactTask.java:1533)
>     at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:367)
>     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:204)
>     at java.lang.Thread.run(Thread.java:745)
>
> the same exception is in the task manager logs, along with the following
> one:
>
> java.util.concurrent.TimeoutException: Futures timed out after [100 seconds]
>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>     at scala.concurrent.Await$.result(package.scala:107)
>     at org.apache.flink.runtime.akka.AkkaUtils$.ask(AkkaUtils.scala:265)
>     at org.apache.flink.runtime.akka.AkkaUtils.ask(AkkaUtils.scala)
>     at org.apache.flink.runtime.io.network.partition.IntermediateResultPartition.scheduleOrUpdateConsumers(IntermediateResultPartition.java:247)
>     at org.apache.flink.runtime.io.network.partition.IntermediateResultPartition.maybeNotifyConsumers(IntermediateResultPartition.java:240)
>     at org.apache.flink.runtime.io.network.partition.IntermediateResultPartition.add(IntermediateResultPartition.java:144)
>     at org.apache.flink.runtime.io.network.api.writer.BufferWriter.writeBuffer(BufferWriter.java:74)
>     at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:91)
>     at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:88)
>     at org.apache.flink.languagebinding.api.java.common.streaming.Receiver.collectBuffer(Receiver.java:253)
>     at org.apache.flink.languagebinding.api.java.common.streaming.Streamer.streamBufferWithoutGroups(Streamer.java:193)
>     at org.apache.flink.languagebinding.api.java.python.functions.PythonMapPartition.mapPartition(PythonMapPartition.java:54)
>     at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:98)
>     at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
>     at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360)
>     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:204)
>     at java.lang.Thread.run(Thread.java:745)