Following the documentation on spark-submit (http://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit):
- application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.

I submitted a job with the application-jar specified as s3a://path/to/jar/file/in/s3.jar, and the driver didn't do anything: no logs, and no cores or memory taken, even though plenty of both were available. I was able to add the hadoop-aws and aws-sdk jars to the master's and workers' classpaths, so those processes are running with the libraries.

Can someone help me understand how a driver is run on a Spark worker? And how do I get the proper Hadoop libraries onto the driver's classpath so that it can download and execute a jar file from S3?
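For reference, this is roughly the kind of invocation I would expect to work; it is only a sketch, and the master URL, main class, and hadoop-aws version below are placeholders, not what my cluster actually uses:

```sh
# Submit in cluster deploy mode so the driver runs on a worker;
# --packages asks Spark to resolve hadoop-aws (and its aws-java-sdk
# dependency) from Maven and put it on the driver/executor classpaths.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.example.Main \
  --packages org.apache.hadoop:hadoop-aws:2.7.3 \
  --conf spark.hadoop.fs.s3a.access.key=... \
  --conf spark.hadoop.fs.s3a.secret.key=... \
  s3a://path/to/jar/file/in/s3.jar
```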