I am a newbie! I am running Spark 0.9.0 in standalone mode on my Mac. The
master and worker run on the same machine. Both of them start up fine (at
least that is what I see in the logs).

*Upon start-up, the master log is:*

14/02/26 15:38:08 INFO Slf4jLogger: Slf4jLogger started
14/02/26 15:38:08 INFO Remoting: Starting remoting
14/02/26 15:38:08 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077]
14/02/26 15:38:08 INFO Master: Starting Spark master at
spark://Shirishs-MacBook-Pro.local:7077
14/02/26 15:38:08 INFO MasterWebUI: Started Master web UI at
http://192.168.1.106:8080
14/02/26 15:38:08 INFO Master: I have been elected leader! New state: ALIVE
14/02/26 15:38:22 INFO Master: Registering worker
Shirishs-MacBook-Pro.local:56830 with 4 cores, 15.0 GB RAM

*and the worker log is:*

14/02/26 15:38:21 INFO Slf4jLogger: Slf4jLogger started
14/02/26 15:38:21 INFO Remoting: Starting remoting
14/02/26 15:38:21 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkWorker@192.168.1.106:56830]
14/02/26 15:38:21 INFO Worker: Starting Spark worker 192.168.1.106:56830
with 4 cores, 15.0 GB RAM
14/02/26 15:38:21 INFO Worker: Spark home:
/Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/26 15:38:22 INFO WorkerWebUI: Started Worker web UI at
http://192.168.1.106:8081
14/02/26 15:38:22 INFO Worker: Connecting to master
spark://Shirishs-MacBook-Pro.local:7077...
14/02/26 15:38:22 INFO Worker: Successfully registered with master
spark://Shirishs-MacBook-Pro.local:7077

When I launch my job using:

./bin/spark-class org.apache.spark.deploy.Client launch
spark://Shirishs-MacBook-Pro.local:7077
file:///Users/shirish_kumar/Developer/spark_app/SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
SimpleApp

*Here is what I see in the master log:*

14/02/26 15:38:36 INFO Master: Driver submitted
org.apache.spark.deploy.worker.DriverWrapper
14/02/26 15:38:36 INFO Master: Launching driver driver-20140226153836-0000
on worker worker-20140226153821-192.168.1.106-56830
14/02/26 15:38:39 INFO Master: Registering worker
Shirishs-MacBook-Pro.local:56830 with 4 cores, 15.0 GB RAM
14/02/26 15:38:39 INFO Master: Attempted to re-register worker at same
address: akka.tcp://sparkWorker@192.168.1.106:56830
14/02/26 15:38:39 WARN Master: Got heartbeat from unregistered worker
worker-20140226153839-192.168.1.106-56830
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834
got disassociated, removing it.
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834
got disassociated, removing it.
14/02/26 15:38:42 INFO LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
Actor[akka://sparkMaster/deadLetters] to
Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.1.106%3A56835-2#330912359]
was not delivered. [1] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
14/02/26 15:38:42 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077] ->
[akka.tcp://driverClient@192.168.1.106:56834]: Error [Association failed
with [akka.tcp://driverClient@192.168.1.106:56834]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://driverClient@192.168.1.106:56834]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: /192.168.1.106:56834
]
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834
got disassociated, removing it.
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834
got disassociated, removing it.
14/02/26 15:38:42 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077] ->
[akka.tcp://driverClient@192.168.1.106:56834]: Error [Association failed
with [akka.tcp://driverClient@192.168.1.106:56834]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://driverClient@192.168.1.106:56834]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: /192.168.1.106:56834
]
14/02/26 15:38:42 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077] ->
[akka.tcp://driverClient@192.168.1.106:56834]: Error [Association failed
with [akka.tcp://driverClient@192.168.1.106:56834]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://driverClient@192.168.1.106:56834]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: /192.168.1.106:56834
]
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834
got disassociated, removing it.
14/02/26 15:40:52 WARN Master: Got heartbeat from unregistered worker
worker-20140226153839-192.168.1.106-56830
14/02/26 15:41:09 WARN Master: Got heartbeat from unregistered worker
worker-20140226153839-192.168.1.106-56830

*The worker log is:*

14/02/26 15:38:36 INFO Worker: Asked to launch driver
driver-20140226153836-0000
2014-02-26 15:38:36.790 java[14619:3c0b] Unable to load realm info from
SCDynamicStore
14/02/26 15:38:36 INFO DriverRunner: Copying user jar
file:/Users/shirish_kumar/Developer/spark_app/SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
to
/Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140226153836-0000/simple-project_2.10-1.0.jar
14/02/26 15:38:37 INFO DriverRunner: Launch Command:
"/Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/bin/java"
"-cp"
":/Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140226153836-0000/simple-project_2.10-1.0.jar:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/conf:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.0.4.jar"
"-Dspark.worker.timeout=600" "-Dspark.akka.timeout=200"
"-Dspark.worker.timeout=600" "-Dspark.akka.timeout=200" "-Xms512M"
"-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper"
"akka.tcp://sparkWorker@192.168.1.106:56830/user/Worker" "SimpleApp"
14/02/26 15:38:39 ERROR OneForOneStrategy: FAILED (of class
scala.Enumeration$Val)
scala.MatchError: FAILED (of class scala.Enumeration$Val)
        at
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/02/26 15:38:39 INFO LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40192.168.1.106%3A56838-2#531095069]
was not delivered. [1] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
14/02/26 15:38:39 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@192.168.1.106:56830] ->
[akka.tcp://Driver@192.168.1.106:56836]: Error [Association failed with
[akka.tcp://Driver@192.168.1.106:56836]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://Driver@192.168.1.106:56836]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: /192.168.1.106:56836
]
14/02/26 15:38:39 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@192.168.1.106:56830] ->
[akka.tcp://Driver@192.168.1.106:56836]: Error [Association failed with
[akka.tcp://Driver@192.168.1.106:56836]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://Driver@192.168.1.106:56836]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: /192.168.1.106:56836
]
14/02/26 15:38:39 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@192.168.1.106:56830] ->
[akka.tcp://Driver@192.168.1.106:56836]: Error [Association failed with
[akka.tcp://Driver@192.168.1.106:56836]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://Driver@192.168.1.106:56836]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: /192.168.1.106:56836
]
14/02/26 15:38:39 INFO Worker: Starting Spark worker 192.168.1.106:56830
with 4 cores, 15.0 GB RAM
14/02/26 15:38:39 INFO Worker: Spark home:
/Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/26 15:38:39 INFO WorkerWebUI: Started Worker web UI at
http://192.168.1.106:8081
14/02/26 15:38:39 INFO Worker: Connecting to master
spark://Shirishs-MacBook-Pro.local:7077...
14/02/26 15:38:39 INFO Worker: Successfully registered with master
spark://Shirishs-MacBook-Pro.local:7077


The master web UI (port 8080) shows the original worker as dead, the "new"
worker never gets registered, and I can no longer submit any jobs.
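
To double-check my reading of the master log, here is a small Python sketch
(not part of Spark). It pulls the worker IDs out of a few master log lines
copied from above, with the wrapped lines rejoined. The helper name
`worker_ids` and the ID pattern `worker-<timestamp>-<ip>-<port>` are my own
assumptions, inferred only from how the IDs appear in these logs.

```python
import re

# Master log lines copied from this post (wrapped lines rejoined by hand).
MASTER_LOG = """\
14/02/26 15:38:36 INFO Master: Launching driver driver-20140226153836-0000 on worker worker-20140226153821-192.168.1.106-56830
14/02/26 15:38:39 INFO Master: Attempted to re-register worker at same address: akka.tcp://sparkWorker@192.168.1.106:56830
14/02/26 15:38:39 WARN Master: Got heartbeat from unregistered worker worker-20140226153839-192.168.1.106-56830
14/02/26 15:40:52 WARN Master: Got heartbeat from unregistered worker worker-20140226153839-192.168.1.106-56830
14/02/26 15:41:09 WARN Master: Got heartbeat from unregistered worker worker-20140226153839-192.168.1.106-56830
"""

# Assumed ID shape, inferred from the logs: worker-<14 digits>-<ip>-<port>.
WORKER_ID = re.compile(r"worker-(\d{14})-([\d.]+)-(\d+)")


def worker_ids(log_text):
    """Split worker IDs into those the master used and those it rejected."""
    known, unregistered = set(), set()
    for line in log_text.splitlines():
        match = WORKER_ID.search(line)
        if match is None:
            continue  # line does not mention a worker ID
        target = unregistered if "unregistered worker" in line else known
        target.add(match.group(0))
    return known, unregistered


known, unregistered = worker_ids(MASTER_LOG)
print("known to master:", sorted(known))
print("unregistered:   ", sorted(unregistered))
```

If I read this correctly, the master only knows the ID created at 15:38:21,
while the heartbeats carry the ID created at 15:38:39, after the worker
restarted on the same address and port. That would match the "Attempted to
re-register worker at same address" line, but I am not sure this is the
actual cause.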



