Do you have the JobManager and TaskManager logs of the corresponding TM, by any chance?
On Mon, Jun 29, 2015 at 8:12 PM, Andra Lungu <lungu.an...@gmail.com> wrote:

> Something similar in flink-0.10-SNAPSHOT:
>
> 06/29/2015 10:33:46 CHAIN Join(Join at main(TriangleCount.java:79)) ->
> Combine (Reduce at main(TriangleCount.java:79))(222/224) switched to FAILED
> java.lang.Exception: The slot in which the task was executed has been
> released. Probably loss of TaskManager 57c67d938c9144bec5ba798bb8ebe636 @
> wally025 - 8 slots - URL: akka.tcp://flink@130.149.249.35:56135/user/taskmanager
>     at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:151)
>     at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>     at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>     at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:154)
>     at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:182)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:421)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>     at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>     at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>     at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>     at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> 06/29/2015 10:33:46 Job execution switched to status FAILING.
>
> On Mon, Jun 29, 2015 at 1:08 PM, Alexander Alexandrov <alexander.s.alexand...@gmail.com> wrote:
>
>> I witnessed a similar issue yesterday on a simple job (single task chain,
>> no shuffles) with a release-0.9 based fork.
>>
>> 2015-04-15 14:59 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>>> Yes, sorry for that. I found it somewhere in the logs. The problem was
>>> that the program didn't die immediately but was somehow hanging, and I
>>> discovered the source of the problem only by running the program on a
>>> subset of the data.
>>>
>>> Thanks for the support,
>>> Flavio
>>>
>>> On Wed, Apr 15, 2015 at 2:56 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> This means that the TaskManager was lost. The JobManager can no longer
>>>> reach the TaskManager and considers all tasks executing on the
>>>> TaskManager as failed.
>>>>
>>>> Have a look at the TaskManager log, it should describe why the
>>>> TaskManager failed.
>>>>
>>>> On 15.04.2015 at 14:45, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have this strange error in my job and I don't know what's going on.
>>>>> What can I do?
>>>>>
>>>>> The full exception is:
>>>>>
>>>>> The slot in which the task was scheduled has been killed (probably
>>>>> loss of TaskManager).
>>>>>     at org.apache.flink.runtime.instance.SimpleSlot.cancel(SimpleSlot.java:98)
>>>>>     at org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroupAssignment.releaseSimpleSlot(SlotSharingGroupAssignment.java:335)
>>>>>     at org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:319)
>>>>>     at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:106)
>>>>>     at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:151)
>>>>>     at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:182)
>>>>>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:435)
>>>>>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>>>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>>>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>>>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>>>>>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>>>>>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>>>     at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>>>>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>>     at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
>>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)