Log shows stack traces that seem to match the assert in JIRA so it seems I
am hitting the issue. Thanks for the heads up ...
15/03/23 20:29:50 ERROR actor.OneForOneStrategy: assertion failed:
Allocator killed more executors than are allocated!
java.lang.AssertionError: assertion failed: Allocator killed more executors
than are allocated!
at scala.Predef$.assert(Predef.scala:179)
at
org.apache.spark.deploy.yarn.YarnAllocator.killExecutor(YarnAllocator.scala:152)
at
org.apache.spark.deploy.yarn.ApplicationMaster$AMActor$$anonfun$receive$1$$anonfun$applyOrElse$6.apply(ApplicationMaster.scala:547)
at
org.apache.spark.deploy.yarn.ApplicationMaster$AMActor$$anonfun$receive$1$$anonfun$applyOrElse$6.apply(ApplicationMaster.scala:547)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.deploy.yarn.ApplicationMaster$AMActor$$anonfun$receive$1.applyOrElse(ApplicationMaster.scala:547)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at
org.apache.spark.deploy.yarn.ApplicationMaster$AMActor.aroundReceive(ApplicationMaster.scala:506)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
On Mon, Mar 23, 2015 at 2:25 PM, Marcelo Vanzin <[email protected]> wrote:
> On Mon, Mar 23, 2015 at 2:15 PM, Manoj Samel <[email protected]>
> wrote:
> > Found the issue above error - the setting for spark_shuffle was
> incomplete.
> >
> > Now it is able to ask and get additional executors. The issue is once
> they
> > are released, it is not able to proceed with next query.
>
> That looks like SPARK-6325, which unfortunately was not fixed in time
> for 1.3.0...
>
> --
> Marcelo
>