And one last update on this -

The job itself seems to be working and generates output on S3; it just
reports itself as KILLED, and the history server can't find the logs.
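
In case it helps, here is roughly how event logging is set up on our side (a
minimal sketch; the app name and bucket path below are placeholders, not our
real values):

    // Minimal sketch with placeholder values: enable event logging so the
    // history server can read the application logs from S3 afterwards.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("example-job")                                    // placeholder name
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "s3n://some-bucket/spark-events")  // placeholder path
    val sc = new SparkContext(conf)

    // The history server reads the same directory, configured via
    // spark.history.fs.logDirectory in conf/spark-defaults.conf.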

On Sun, Jun 14, 2015 at 3:55 PM, Nizan Grauer <ni...@windward.eu> wrote:

> hi
>
> An update regarding this, in the hope it will get me some answers...
>
> When I enter one of the workers' logs (for one of its tasks), I can see the
> following exception:
>
> Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@172.31.0.186:38560/), Path(/user/CoarseGrainedScheduler)]
>       at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>       at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>       at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>       at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>       at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>       at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>       at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
>       at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>       at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>       at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
>       at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
>       at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
>       at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
>       at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
>       at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
>       at akka.remote.EndpointWriter.postStop(Endpoint.scala:561)
>       at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
>       at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:415)
>       at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>       at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
>       at akka.actor.ActorCell.terminate(ActorCell.scala:369)
>       at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
>       at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>       at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
> Is it related?
>
> thanks, nizan
>
> On Sun, Jun 14, 2015 at 9:39 AM, nizang <ni...@windward.eu> wrote:
>
>> hi,
>>
>> I have a running and working cluster with Spark 1.3.1, and I set up a new
>> cluster running Spark 1.4.0.
>>
>> I ran a job on the new 1.4.0 cluster, and the same job on the old 1.3.1
>> cluster.
>>
>> After the job finished (on both clusters), I opened the job in the UI. On
>> the new 1.4.0 cluster, the workers are marked as KILLED (I didn't kill
>> them, and everywhere I checked, the logs and output seem fine):
>>
>> ExecutorID  Worker                                      Cores  Memory (MB)  State   Logs
>> 2           worker-20150613111158-172.31.0.104-37240    4      10240        KILLED  stdout stderr
>> 1           worker-20150613111158-172.31.15.149-58710   4      10240        KILLED  stdout stderr
>> 3           worker-20150613111158-172.31.0.196-52939    4      10240        KILLED  stdout stderr
>> 0           worker-20150613111158-172.31.1.233-53467    4      10240        KILLED  stdout stderr
>>
>> In the old 1.3.1 cluster, the workers are marked as EXITED:
>>
>> ExecutorID  Worker                                                                   Cores  Memory (MB)  State   Logs
>> 1           worker-20150608115639-ip-172-31-6-134.us-west-2.compute.internal-47572  2      10240        EXITED  stdout stderr
>> 0           worker-20150608115639-ip-172-31-4-169.us-west-2.compute.internal-41828  2      10240        EXITED  stdout stderr
>> 2           worker-20150608115640-ip-172-31-0-37.us-west-2.compute.internal-32847   1      10240        EXITED  stdout stderr
>>
>> Another thing (which I think is related) is that the history server is not
>> working, even though I can see the logs on S3.
>> I didn't kill the jobs on the 1.4.0 cluster. The output seems OK, and the
>> logs on S3 seem fine.
>>
>> Does anybody have any idea what is wrong here, with the jobs marked as
>> KILLED and with the history server not working?
>>
>> thanks, nizan
>>
>>
>>
>
