I am a newbie. I am running Spark 0.9.0 in standalone mode on my Mac. The master and worker run on the same machine. Both of them start up fine (at least that is what I see in the logs).

*Upon start-up master log is:*

14/02/26 15:38:08 INFO Slf4jLogger: Slf4jLogger started
14/02/26 15:38:08 INFO Remoting: Starting remoting
14/02/26 15:38:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077]
14/02/26 15:38:08 INFO Master: Starting Spark master at spark://Shirishs-MacBook-Pro.local:7077
14/02/26 15:38:08 INFO MasterWebUI: Started Master web UI at http://192.168.1.106:8080
14/02/26 15:38:08 INFO Master: I have been elected leader! New state: ALIVE
14/02/26 15:38:22 INFO Master: Registering worker Shirishs-MacBook-Pro.local:56830 with 4 cores, 15.0 GB RAM

*and the worker log is:*

14/02/26 15:38:21 INFO Slf4jLogger: Slf4jLogger started
14/02/26 15:38:21 INFO Remoting: Starting remoting
14/02/26 15:38:21 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@192.168.1.106:56830]
14/02/26 15:38:21 INFO Worker: Starting Spark worker 192.168.1.106:56830 with 4 cores, 15.0 GB RAM
14/02/26 15:38:21 INFO Worker: Spark home: /Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/26 15:38:22 INFO WorkerWebUI: Started Worker web UI at http://192.168.1.106:8081
14/02/26 15:38:22 INFO Worker: Connecting to master spark://Shirishs-MacBook-Pro.local:7077...
14/02/26 15:38:22 INFO Worker: Successfully registered with master spark://Shirishs-MacBook-Pro.local:7077

When I launch my job using:

./bin/spark-class org.apache.spark.deploy.Client launch spark://Shirishs-MacBook-Pro.local:7077 file:///Users/shirish_kumar/Developer/spark_app/SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar SimpleApp

*Here is what I see in the master log:*

14/02/26 15:38:36 INFO Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
14/02/26 15:38:36 INFO Master: Launching driver driver-20140226153836-0000 on worker worker-20140226153821-192.168.1.106-56830
14/02/26 15:38:39 INFO Master: Registering worker Shirishs-MacBook-Pro.local:56830 with 4 cores, 15.0 GB RAM
14/02/26 15:38:39 INFO Master: Attempted to re-register worker at same address: akka.tcp://sparkWorker@192.168.1.106:56830
14/02/26 15:38:39 WARN Master: Got heartbeat from unregistered worker worker-20140226153839-192.168.1.106-56830
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834 got disassociated, removing it.
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834 got disassociated, removing it.
14/02/26 15:38:42 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.1.106%3A56835-2#330912359] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/02/26 15:38:42 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077] -> [akka.tcp://driverClient@192.168.1.106:56834]: Error [Association failed with [akka.tcp://driverClient@192.168.1.106:56834]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://driverClient@192.168.1.106:56834]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.1.106:56834
]
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834 got disassociated, removing it.
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834 got disassociated, removing it.
14/02/26 15:38:42 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077] -> [akka.tcp://driverClient@192.168.1.106:56834]: Error [Association failed with [akka.tcp://driverClient@192.168.1.106:56834]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://driverClient@192.168.1.106:56834]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.1.106:56834
]
14/02/26 15:38:42 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077] -> [akka.tcp://driverClient@192.168.1.106:56834]: Error [Association failed with [akka.tcp://driverClient@192.168.1.106:56834]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://driverClient@192.168.1.106:56834]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.1.106:56834
]
14/02/26 15:38:42 INFO Master: akka.tcp://driverClient@192.168.1.106:56834 got disassociated, removing it.
14/02/26 15:40:52 WARN Master: Got heartbeat from unregistered worker worker-20140226153839-192.168.1.106-56830
14/02/26 15:41:09 WARN Master: Got heartbeat from unregistered worker worker-20140226153839-192.168.1.106-56830

*The worker log is:*

14/02/26 15:38:36 INFO Worker: Asked to launch driver driver-20140226153836-0000
2014-02-26 15:38:36.790 java[14619:3c0b] Unable to load realm info from SCDynamicStore
14/02/26 15:38:36 INFO DriverRunner: Copying user jar file:/Users/shirish_kumar/Developer/spark_app/SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar to /Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140226153836-0000/simple-project_2.10-1.0.jar
14/02/26 15:38:37 INFO DriverRunner: Launch Command: "/Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/bin/java" "-cp" ":/Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140226153836-0000/simple-project_2.10-1.0.jar:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/conf:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.0.4.jar" "-Dspark.worker.timeout=600" "-Dspark.akka.timeout=200" "-Dspark.worker.timeout=600" "-Dspark.akka.timeout=200" "-Xms512M" "-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkWorker@192.168.1.106:56830/user/Worker" "SimpleApp"
14/02/26 15:38:39 ERROR OneForOneStrategy: FAILED (of class scala.Enumeration$Val)
scala.MatchError: FAILED (of class scala.Enumeration$Val)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/02/26 15:38:39 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40192.168.1.106%3A56838-2#531095069] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/02/26 15:38:39 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@192.168.1.106:56830] -> [akka.tcp://Driver@192.168.1.106:56836]: Error [Association failed with [akka.tcp://Driver@192.168.1.106:56836]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://Driver@192.168.1.106:56836]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.1.106:56836
]
14/02/26 15:38:39 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@192.168.1.106:56830] -> [akka.tcp://Driver@192.168.1.106:56836]: Error [Association failed with [akka.tcp://Driver@192.168.1.106:56836]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://Driver@192.168.1.106:56836]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.1.106:56836
]
14/02/26 15:38:39 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@192.168.1.106:56830] -> [akka.tcp://Driver@192.168.1.106:56836]: Error [Association failed with [akka.tcp://Driver@192.168.1.106:56836]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://Driver@192.168.1.106:56836]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.1.106:56836
]
14/02/26 15:38:39 INFO Worker: Starting Spark worker 192.168.1.106:56830 with 4 cores, 15.0 GB RAM
14/02/26 15:38:39 INFO Worker: Spark home: /Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/26 15:38:39 INFO WorkerWebUI: Started Worker web UI at http://192.168.1.106:8081
14/02/26 15:38:39 INFO Worker: Connecting to master spark://Shirishs-MacBook-Pro.local:7077...
14/02/26 15:38:39 INFO Worker: Successfully registered with master spark://Shirishs-MacBook-Pro.local:7077

The WebUI (8080) shows the worker as dead and the "new" worker never gets registered and I can no longer submit any jobs.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/worker-keeps-getting-disassociated-upon-a-failed-job-spark-version-0-90-tp2099.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.