I'm using Spark 0.8.1 and trying to run a job from a new remote client (it works fine when run directly from the master).
When I try to run it, the job just fails without doing anything. Unfortunately, I also can't find anywhere where it tells me why it fails. I'll add the relevant bits of the logs below, but there really isn't much. Does anyone know how to tell why it's failing? I assume it must be getting an exception somewhere, but it isn't telling me about it.

On the client, I see:

    14/02/24 23:44:43 INFO Client$ClientActor: Executor added: app-20140224234441-0003/4 on worker-20140224140443-hadoop-s2.oculus.local-40819 (hadoop-s2.oculus.local:7077) with 32 cores
    14/02/24 23:44:43 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140224234441-0003/4 on hostPort hadoop-s2.oculus.local:7077 with 32 cores, 200.0 GB RAM
    14/02/24 23:44:43 INFO Client$ClientActor: Executor updated: app-20140224234441-0003/4 is now RUNNING
    14/02/24 23:44:43 INFO FileInputFormat: Total input paths to process : 200
    14/02/24 23:44:43 INFO Client$ClientActor: Executor updated: app-20140224234441-0003/1 is now FAILED (Command exited with code 1)
    14/02/24 23:44:43 INFO SparkDeploySchedulerBackend: Executor app-20140224234441-0003/1 removed: Command exited with code 1

The master log just has:

    14/02/24 23:44:43 INFO master.Master: Launching executor app-20140224234441-0003/4 on worker worker-20140224140443-hadoop-s2.oculus.local-40819
    14/02/24 23:44:45 INFO master.Master: Removing executor app-20140224234441-0003/4 because it is FAILED

(no other mention of 0003/4)

The worker log has:

    14/02/24 23:44:43 INFO worker.Worker: Asked to launch executor app-20140224234441-0003/4 for Pyramid Binning(ndk)
    14/02/24 23:44:43 INFO worker.ExecutorRunner: Launch command: "/usr/java/jdk1.7.0_25-cloudera/bin/java" "-cp" "math-utilities-0.2.jar:binning-utilities-0.2.jar:tile-generation-0.2.jar:hbase-client-0.95.2-cdh5.0.0-beta-1.jar:hbase-protocol-0.95.2-cdh5.0.0-beta-1.jar:hbase-common-0.95.2-cdh5.0.0-beta-1.jar:htrace-core-2.01.jar:avro-1.7.4.jar:commons-compress-1.4.1.jar:scala-library-2.9.3.jar:scala-compiler-2.9.3.jar:/opt/spark/conf:spark-assembly-0.8.1-incubating-hadoop2.2.0-mr1-cdh5.0.0-beta-1.jar" "-Dspark.executor.memory=200G" "-Xms204800M" "-Xmx204800M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka://spark@hadoop-client.oculus.local:41101/user/CoarseGrainedScheduler" "4" "hadoop-s2.oculus.local" "32" "app-20140224234441-0003"
    14/02/24 23:44:45 INFO worker.Worker: Executor app-20140224234441-0003/4 finished with state FAILED message Command exited with code 1 exitStatus 1

Again, nothing else.

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600, Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: nkronenf...@oculusinfo.com
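A minimal sketch for checking one place where the missing exception could be recorded. It assumes that Spark's standalone worker redirects each executor's stdout and stderr into files named `stdout` and `stderr` under `<work dir>/<app ID>/<executor ID>/`, and that the work directory defaults to `$SPARK_HOME/work` unless `SPARK_WORKER_DIR` overrides it. The `/opt/spark` fallback for `SPARK_HOME` is only a guess based on the `/opt/spark/conf` entry in the launch command's classpath. The helper names `executor_log_path` and `tail` are hypothetical. The script is meant to be run on the worker host (hadoop-s2.oculus.local in the logs above).

```python
import os
from collections import deque


def executor_log_path(full_executor_id: str, work_dir: str, stream: str = "stderr") -> str:
    """Hypothetical helper: map an ID such as 'app-20140224234441-0003/4'
    to <work_dir>/<app ID>/<executor ID>/<stream> (assumed standalone layout)."""
    app_id, exec_id = full_executor_id.split("/", 1)
    return os.path.join(work_dir, app_id, exec_id, stream)


def tail(path: str, n: int = 50) -> list:
    """Hypothetical helper: return the last n lines of a text file."""
    with open(path, errors="replace") as f:
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    # ASSUMPTION: SPARK_HOME falls back to /opt/spark (guessed from the classpath).
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
    # ASSUMPTION: the worker's work dir is $SPARK_HOME/work unless SPARK_WORKER_DIR is set.
    work_dir = os.environ.get("SPARK_WORKER_DIR", os.path.join(spark_home, "work"))
    path = executor_log_path("app-20140224234441-0003/4", work_dir)
    if os.path.exists(path):
        print("".join(tail(path)), end="")
    else:
        print("no executor log found at", path)
```

Passing `stream="stdout"` checks the executor's standard output instead.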