Following the documentation on spark-submit,
http://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit


   - application-jar: Path to a bundled jar including your application and
   all dependencies. The URL must be globally visible inside of your cluster,
   for instance, an hdfs:// path or a file:// path that is present on all
   nodes.


I submitted a job with the application-jar specified as
s3a://path/to/jar/file/in/s3.jar, and the driver never started: no logs
appeared and no cores or memory were allocated, even though the cluster had
plenty of both available.
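For reference, the invocation was essentially the following (the class name,
master URL, and credentials here are placeholders rather than my real values):

    spark-submit \
      --master spark://my-master:7077 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      --conf spark.hadoop.fs.s3a.access.key=<access key> \
      --conf spark.hadoop.fs.s3a.secret.key=<secret key> \
      s3a://path/to/jar/file/in/s3.jar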

I was able to add hadoop-aws and the AWS SDK to the master's and the workers'
classpaths, so those processes are running with the libraries.
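Roughly speaking, I did that by placing the jars on every node and pointing
the daemons at them in conf/spark-env.sh; the paths and versions below are
just an example of the shape of it, not the exact jars I used:

    export SPARK_CLASSPATH="/opt/spark/extra/hadoop-aws-2.7.3.jar:/opt/spark/extra/aws-java-sdk-1.7.4.jar:$SPARK_CLASSPATH"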

Can someone help me understand how a driver is run on a Spark worker? And can
someone help me understand how to get the proper Hadoop libraries onto the
driver's classpath so that it is able to download and execute a jar file
stored in S3?
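For example, is passing the Maven coordinates with --packages supposed to
cover the driver in cluster mode, or does the driver classpath need to be set
explicitly? (The coordinates and paths below are my guess for a Hadoop 2.7
build, not something I've verified:)

    spark-submit \
      --deploy-mode cluster \
      --packages org.apache.hadoop:hadoop-aws:2.7.3,com.amazonaws:aws-java-sdk:1.7.4 \
      ... \
      s3a://path/to/jar/file/in/s3.jar

or, alternatively:

    --conf spark.driver.extraClassPath=/path/on/worker/hadoop-aws-2.7.3.jar:/path/on/worker/aws-java-sdk-1.7.4.jar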
