Hi Tarandeep,

it would be great if you could put together a small example data set (and job) with which you can reproduce the problem. We could then try to debug it. It would also be interesting to know whether the patch from Flavio's JIRA issue solves your problem.
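A stripped-down job roughly along the following lines would already be enough for us to look into it. This is only a sketch: the Avro-generated class "MyRecord" and the paths are placeholders, not taken from your actual job.

// Minimal reproducer sketch (assumptions: an Avro-generated class called
// MyRecord and HDFS paths; replace with whatever matches your job).
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.AvroInputFormat;
import org.apache.flink.core.fs.Path;

public class SerializationRepro {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the small sample data set in Avro format.
        DataSet<MyRecord> records = env.createInput(
                new AvroInputFormat<MyRecord>(new Path("hdfs:///tmp/sample.avro"), MyRecord.class));

        // The smallest pipeline that still fails; rebalance() ships records over
        // the network, which is where the deserialization error appears in your stack trace.
        records.rebalance()
               .filter(r -> r != null)
               .writeAsText("hdfs:///tmp/repro-output");

        env.execute("Serialization reproducer");
    }
}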
Cheers,
Till

On Mon, Oct 3, 2016 at 10:26 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> I think you're running into the same exception I face sometimes. I've
> opened a JIRA for it [1]. Could you please try to apply that patch and see
> if things get better?
>
> [1] https://issues.apache.org/jira/plugins/servlet/mobile#issue/FLINK-4719
>
> Best,
> Flavio
>
> On 3 Oct 2016 22:09, "Tarandeep Singh" <tarand...@gmail.com> wrote:
>
>> Now, when I ran it again (with fewer task slots per machine) I got a
>> different error:
>>
>> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed.
>>         at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
>>         at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
>>         at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
>>         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:60)
>>         at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:855)
>>         ....
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505)
>>         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
>>         at org.apache.flink.client.program.Client.runBlocking(Client.java:248)
>>         at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:866)
>>         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
>>         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1189)
>>         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1239)
>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:714)
>>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:660)
>>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:660)
>>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>         at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> Caused by: com.esotericsoftware.kryo.KryoException: Unable to find class: javaec40-d994-yteBuffer
>>         at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>>         at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>>         at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
>>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
>>         at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:228)
>>         at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:431)
>>         at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>>         at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:124)
>>         at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:65)
>>         at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
>>         at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
>>         at org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:101)
>>         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:480)
>>         at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
>>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>>         at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.ClassNotFoundException: javaec40-d994-yteBuffer
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:348)
>>         at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
>>         ... 15 more
>>
>> -Tarandeep
>>
>> On Mon, Oct 3, 2016 at 12:49 PM, Tarandeep Singh <tarand...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am using flink-1.0.0 and have been running ETL (batch) jobs on it for quite
>>> some time (a few months) without any problems. Starting this morning, I have
>>> been getting errors like this:
>>>
>>> "Received an event in channel 3 while still having data from a record.
>>> This indicates broken serialization logic. If you are using custom
>>> serialization code (Writable or Value types), check their serialization
>>> routines. In the case of Kryo, check the respective Kryo serializer."
>>>
>>> My data sets are in Avro format. The only thing that changed today is that
>>> I moved to a smaller cluster. When I first ran the ETL jobs there, they
>>> failed with this error:
>>>
>>> "Insufficient number of network buffers: required 20, but only 10
>>> available. The total number of network buffers is currently set to 20000.
>>> You can increase this number by setting the configuration key
>>> 'taskmanager.network.numberOfBuffers'"
>>>
>>> I increased the number of network buffers to 30k and also decreased the
>>> number of slots per machine, since the load per machine was too high.
>>> After that, when I restarted the jobs, I started getting the error above.
>>>
>>> Can someone please help me debug it?
>>>
>>> Thank you,
>>> Tarandeep
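
For reference, both of the settings mentioned above are configured in conf/flink-conf.yaml on the TaskManager machines and require a TaskManager restart to take effect. A sketch with the values from this thread (the slot count is only an example; the thread does not say which value was actually used):

# conf/flink-conf.yaml
# Total number of network buffers per TaskManager (raised from 20000 to 30000 here).
taskmanager.network.numberOfBuffers: 30000
# Task slots per TaskManager; example value only, reduce according to your machine load.
taskmanager.numberOfTaskSlots: 4

The two values are related: the rule of thumb from the Flink documentation is roughly slots-per-TaskManager^2 * number-of-TaskManagers * 4 buffers, so lowering the slot count also lowers the number of buffers the jobs need.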