hi

An update on this, which I hope will get me some answers.

When I open the log of one of the workers (for one of its tasks), I see the
following exception:

Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://[email protected]:38560/), Path(/user/CoarseGrainedScheduler)]
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
        at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
        at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
        at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
        at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
        at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
        at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
        at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
        at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
        at akka.remote.EndpointWriter.postStop(Endpoint.scala:561)
        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
        at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:415)
        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
        at akka.actor.ActorCell.terminate(ActorCell.scala:369)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Could this exception be related to the workers being marked as KILLED?

thanks, nizan

On Sun, Jun 14, 2015 at 9:39 AM, nizang <[email protected]> wrote:

> hi,
>
> I have a working cluster running Spark 1.3.1, and I tried to install a new
> cluster running Spark 1.4.0.
>
> I ran a job on the new 1.4.0 cluster, and the same job on the old 1.3.1
> cluster.
>
> After the job finished (on both clusters), I opened the job in the UI. On
> the new 1.4.0 cluster, the workers are marked as KILLED (I didn't kill
> them, and everywhere I checked, the logs and output seem fine):
>
> 2       worker-20150613111158-172.31.0.104-37240        4       10240   KILLED  stdout stderr
> 1       worker-20150613111158-172.31.15.149-58710       4       10240   KILLED  stdout stderr
> 3       worker-20150613111158-172.31.0.196-52939        4       10240   KILLED  stdout stderr
> 0       worker-20150613111158-172.31.1.233-53467        4       10240   KILLED  stdout stderr
>
> In the old 1.3.1 cluster, the workers are marked as EXITED:
>
> 1       worker-20150608115639-ip-172-31-6-134.us-west-2.compute.internal-47572  2       10240   EXITED  stdout stderr
> 0       worker-20150608115639-ip-172-31-4-169.us-west-2.compute.internal-41828  2       10240   EXITED  stdout stderr
> 2       worker-20150608115640-ip-172-31-0-37.us-west-2.compute.internal-32847   1       10240   EXITED  stdout stderr
>
> Another thing (which I think is related) is that the history server is not
> working, even though I can see the logs on S3.
> I didn't kill the jobs on the 1.4.0 cluster. The output seems OK, and the
> logs on S3 seem fine.
>
> Does anybody have any idea what is wrong here, both with the jobs marked as
> KILLED and with the history server?
>
> thanks, nizan
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Job-marked-as-killed-in-spark-1-4-tp23305.html
