Hi, I've been trying to run my newly created Spark job on my local master instead of just running it with Maven, and I haven't been able to make it work. My main issue seems to be related to this error:
14/05/14 09:34:26 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@devsrv:7077] -> [akka.tcp://driverClient@devsrv.mydomain.priv:50237]: Error [Association failed with [akka.tcp://driverClient@devsrv.mydomain.priv:50237]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://driverClient@devsrv.mydomain.priv:50237]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: devsrv.mydomain.priv/172.16.202.246:50237
]

FYI, the port (50237 here) changes every time, so I'm not sure what it is supposed to be.

I get this kind of error from many commands, including:

./bin/spark-class org.apache.spark.deploy.Client kill spark://devsrv:7077 driver-20140513165819-0001

./bin/spark-class org.apache.spark.deploy.Client launch spark://devsrv:7077 file:///path/to/my/rspark-jobs-1.0.0.0-jar-with-dependencies.jar my.jobs.spark.Indexer

I get this error from the kill command even if the driver has already finished or does not exist. The launch actually works (or seems to), as I can see my driver appear on the web UI as SUBMITTED. When I then deploy a worker, my job starts running and there is no error in its log. The job never finishes, though; it hangs after it starts spilling to disk:

2014-05-14 09:46:31 INFO BlockFetcherIterator$BasicBlockFetcherIterator:50 - Started 0 remote gets in 45 ms
2014-05-14 09:46:33 WARN ExternalAppendOnlyMap:62 - Spilling in-memory map of 147 MB to disk (1 time so far)
2014-05-14 09:46:33 WARN ExternalAppendOnlyMap:62 - Spilling in-memory map of 130 MB to disk (1 time so far)
2014-05-14 09:46:33 WARN ExternalAppendOnlyMap:62 - Spilling in-memory map of 118 MB to disk (1 time so far)

and the worker ends up crashing with these errors:

14/05/14 09:51:23 ERROR OneForOneStrategy: FAILED (of class scala.Enumeration$Val)
scala.MatchError: FAILED (of class scala.Enumeration$Val)
        at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

14/05/14 09:51:24 ERROR EndpointWriter: AssociationError [akka.tcp://sparkwor...@devsrv.mydomain.priv:35607] -> [akka.tcp://dri...@devsrv.mydomain.priv:45792]: Error [Association failed with [akka.tcp://dri...@devsrv.mydomain.priv:45792]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://dri...@devsrv.mydomain.priv:45792]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: devsrv.mydomain.priv/172.XXX.XXX.XXX:45792
]

(the same AssociationError is then logged several more times, always pointing at port 45792.)
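In case it helps, the driver is set up essentially like this (a simplified sketch only; the master URL, app name, paths and the trivial word count are illustrative, not our actual indexing code):

package my.jobs.spark

import org.apache.spark.{SparkConf, SparkContext}

object Indexer {
  def main(args: Array[String]): Unit = {
    // Point the context at the standalone master we launch against.
    val conf = new SparkConf()
      .setMaster("spark://devsrv:7077")
      .setAppName("Indexer")
    val sc = new SparkContext(conf)

    // The real job builds an index; this word count just shows the shape:
    // read the input, shuffle with reduceByKey, write the output.
    val lines = sc.textFile("file:///path/to/input")
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("file:///path/to/output")

    sc.stop()
  }
}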
I'm almost sure these errors (or at least some of them) have nothing to do with our jar, since we get them even when killing a driver that does not exist. We're using Spark 0.9.1 for Hadoop 1.

Any suggestions?

Thanks,
Regards,
Laurent