Submitting jobs to Spark EC2 cluster remotely

olegshirokikh Sun, 22 Feb 2015 23:57:36 -0800

I've set up a Spark cluster on EC2. Everything works; the master and all
slaves are up and running.


I'm trying to submit a sample job (SparkPi). When I ssh into the cluster and
submit it from there, everything works fine. However, when the driver is
created on a remote host (my laptop), it doesn't work. I've tried both values
of `--deploy-mode`:

**`--deploy-mode=client`:**

From my laptop:

    ./bin/spark-submit \
      --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
      --class SparkPi \
      ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar

This results in the following warnings/errors, repeated indefinitely:

> WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

> 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0

> 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1

...and in failed drivers: entries with "State=ERROR" appear under
"Completed Drivers" in the Spark Web UI.

I've tried passing limits for cores and memory to the submit script, but it
didn't help.
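
For reference, a minimal sketch of what passing such limits can look like, assuming the standalone-mode flags `--total-executor-cores` and `--executor-memory`. The values `2` and `512m`, the variable names, and the `submit_cmd` helper are hypothetical illustrations, not the exact settings from the attempt described above. The helper only prints the command so it can be inspected before being run by hand.

```shell
#!/usr/bin/env bash
# Sketch: assemble a client-mode spark-submit call with explicit resource limits.
# Master URL and jar path are taken from the commands in this message.
MASTER_URL="spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077"
APP_JAR="ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar"

# Hypothetical helper: prints the command line instead of executing it.
#   $1 = total cores to request across the cluster (illustrative value)
#   $2 = memory per executor, e.g. 512m (illustrative value)
submit_cmd() {
  local cores="$1" mem="$2"
  printf '%s ' ./bin/spark-submit \
    --master "$MASTER_URL" \
    --deploy-mode client \
    --total-executor-cores "$cores" \
    --executor-memory "$mem" \
    --class SparkPi \
    "$APP_JAR"
  printf '\n'
}

# Show the command that would be submitted with the assumed limits.
submit_cmd 2 512m
```

Keeping the requested cores and memory below what each worker advertises in the cluster UI rules out an oversized request as the cause of the "has not accepted any resources" warning.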

**`--deploy-mode=cluster`:**

From my laptop:

    ./bin/spark-submit \
      --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
      --deploy-mode cluster \
      --class SparkPi \
      ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar

The result is:

> .... Driver successfully submitted as driver-20150223023734-0007
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20150223023734-0007 is ERROR
> Exception from cluster was: java.io.FileNotFoundException: File file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar does not exist.
> java.io.FileNotFoundException: File file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar does not exist.
>       at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
>       at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
>       at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)
>       at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:75)

So, I'd appreciate any pointers on what is going wrong and some guidance on
how to deploy jobs from a remote client. Thanks.



