Hi,

We are running a number of batch jobs on a Flink 1.5 cluster, and a few of
them get stuck at random. These jobs hit the following failure, after which
the operator status changes to CANCELED and stays stuck there.

Please find the complete TaskManager log at
https://gist.github.com/imamitjain/066d0e99990ee24f2da1ddc83eba2012


2018-04-29 14:57:24,437 INFO  org.apache.flink.runtime.taskmanager.Task - Producer 98f5976716234236dc69fb0e82a0cc34 of partition 5b8dd5ac2df14266080a8e049defbe38 disposed. Cancelling execution.
org.apache.flink.runtime.jobmanager.PartitionProducerDisposedException: Execution 98f5976716234236dc69fb0e82a0cc34 producing partition 5b8dd5ac2df14266080a8e049defbe38 has already been disposed.
    at org.apache.flink.runtime.jobmaster.JobMaster.requestPartitionState(JobMaster.java:610)
    at sun.reflect.GeneratedMethodAccessor107.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:210)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleMessage(FencedAkkaRpcActor.java:69)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:132)
    at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
    at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Thanks
Amit
