One last update on this: the job itself seems to be working and generates output on S3. It just reports itself as KILLED, and the history server can't find the logs.

On Sun, Jun 14, 2015 at 3:55 PM, Nizan Grauer <[email protected]> wrote:

> hi
>
> An update regarding that, hope it will get me some answers...
>
> When I open one of the workers' logs (for one of its tasks), I can see the
> following exception:
>
> Exception in thread "main" akka.actor.ActorNotFound: Actor not found for:
> ActorSelection[Anchor(akka.tcp://[email protected]:38560/),
> Path(/user/CoarseGrainedScheduler)]
>     at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>     at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>     at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>     at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>     at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>     at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>     at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>     at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>     at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>     at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
>     at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>     at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>     at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
>     at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
>     at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
>     at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
>     at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
>     at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
>     at akka.remote.EndpointWriter.postStop(Endpoint.scala:561)
>     at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
>     at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:415)
>     at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>     at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
>     at akka.actor.ActorCell.terminate(ActorCell.scala:369)
>     at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
>     at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>     at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Is it related?
>
> thanks, nizan
>
> On Sun, Jun 14, 2015 at 9:39 AM, nizang <[email protected]> wrote:
>
>> hi,
>>
>> I have a running and working cluster with spark 1.3.1, and I tried to
>> install a new cluster that is working with spark 1.4.0
>>
>> I ran a job on the new 1.4.0 cluster, and the same job on the old 1.3.1
>> cluster
>>
>> After the job finished (in both clusters), I opened the job in the UI, and
>> in the new 1.4.0 cluster, the workers are marked as KILLED (I didn't kill
>> them, and every place I checked, the logs and output seem fine):
>>
>> 2  worker-20150613111158-172.31.0.104-37240   4  10240  KILLED  stdout stderr
>> 1  worker-20150613111158-172.31.15.149-58710  4  10240  KILLED  stdout stderr
>> 3  worker-20150613111158-172.31.0.196-52939   4  10240  KILLED  stdout stderr
>> 0  worker-20150613111158-172.31.1.233-53467   4  10240  KILLED  stdout stderr
>>
>> In the old 1.3.1 cluster, the workers are marked as EXITED:
>>
>> 1  worker-20150608115639-ip-172-31-6-134.us-west-2.compute.internal-47572  2  10240  EXITED  stdout stderr
>> 0  worker-20150608115639-ip-172-31-4-169.us-west-2.compute.internal-41828  2  10240  EXITED  stdout stderr
>> 2  worker-20150608115640-ip-172-31-0-37.us-west-2.compute.internal-32847   1  10240  EXITED  stdout stderr
>>
>> Another thing (which I think is related) is that the history server is not
>> working (even though I can see the logs on s3).
>> I didn't kill the jobs on the 1.4.0 cluster. The output seems ok, and the
>> logs on s3 seem fine.
>>
>> Does anybody have any idea what is wrong here, with the jobs marked as
>> KILLED and with the history server?
>>
>> thanks, nizan
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Job-marked-as-killed-in-spark-1-4-tp23305.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
