Just make sure you meet the following:

1. Set spark.driver.host to your local IP (the machine where you run your
code; it must be reachable from the cluster)

2. Make sure no firewall/router configuration is blocking or filtering the
connection between your laptop and the cluster. The best way to test is to
ping the laptop's public IP from your cluster. (And if the ping works,
make sure you are port-forwarding the required ports)

3. Also set spark.driver.port if you don't want to open up all the ports on
your Windows machine (the default is a random port, so stick to one; both
settings are shown in the sketch below)
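
For example, here's a minimal sketch of what the submit command could look
like with both properties pinned (the driver IP placeholder and port 7078
are just illustrative values I'm assuming; reuse your own master URL and
jar):

    ./bin/spark-submit \
      --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
      --conf spark.driver.host=<your-laptop-public-ip> \
      --conf spark.driver.port=7078 \
      --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar

With spark.driver.port pinned, you only need to forward that one port plus
whatever ports the executors use to reach back to the driver; in this Spark
version those can be pinned the same way (e.g. spark.blockManager.port and
spark.fileserver.port).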


A similar discussion already happened here; you can go through it:
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html



Thanks
Best Regards

On Mon, Feb 23, 2015 at 1:25 PM, olegshirokikh <o...@solver.com> wrote:

> I've set up the EC2 cluster with Spark. Everything works; all master/slave
> nodes are up and running.
>
> I'm trying to submit a sample job (SparkPi). When I ssh to the cluster and
> submit it from there, everything works fine. However, when the driver is
> created on a remote host (my laptop), it doesn't work. I've tried both
> modes for `--deploy-mode`:
>
> **`--deploy-mode=client`:**
>
> From my laptop:
>
>     ./bin/spark-submit \
>       --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
>       --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>
> This results in the following warnings/errors repeating indefinitely:
>
> > WARN TaskSchedulerImpl: Initial job has not accepted any resources;
> > check your cluster UI to ensure that workers are registered and have
> > sufficient memory
> >
> > 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove
> > non-existent executor 0
> >
> > 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove
> > non-existent executor 1
>
> ...and failed drivers appear in the Spark Web UI under "Completed Drivers"
> with "State=ERROR".
>
> I've tried passing core and memory limits to the submit script, but it
> didn't help...
>
> **`--deploy-mode=cluster`:**
>
> From my laptop:
>
>     ./bin/spark-submit \
>       --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
>       --deploy-mode cluster \
>       --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>
> The result is:
>
> > .... Driver successfully submitted as driver-20150223023734-0007
> > ... waiting before polling master for driver state
> > ... polling master for driver state
> > State of driver-20150223023734-0007 is ERROR
> > Exception from cluster was: java.io.FileNotFoundException: File
> > file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
> > does not exist.
> > java.io.FileNotFoundException: File
> > file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
> > does not exist.
> >     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
> >     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> >     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
> >     at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)
> >     at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:75)
>
>  So, I'd appreciate any pointers on what is going wrong and some guidance
> on how to deploy jobs from a remote client. Thanks.
>
>
>
