Hello,

I've got a Spark stand-alone cluster using EC2 instances. I can submit jobs using "--deploy-mode client", but "--deploy-mode cluster" is proving to be a challenge. I've tried this:

    spark-submit --class foo --master spark://master-ip:7077 --deploy-mode cluster s3://bucket/dir/foo.jar

When I do this, I get:
    16/07/01 16:23:16 ERROR ClientEndpoint: Exception from cluster was: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
    java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
        at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.initialize(Jets3tFileSystemStore.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)

I'm not using any S3 or Hadoop APIs in my own code (it's just an sc.parallelize(1 to 100) — sketch at the end of this mail), so I imagine it's the driver trying to fetch the jar. I haven't set the AWS Access Key ID and Secret as the message suggests, but the IAM role the machines are in allows them to copy the jar. In other words, this works:

    aws s3 cp s3://bucket/dir/foo.jar /tmp/foo.jar

I'm using Spark 1.6.2, and I can't think of what else to do so that I can submit the jar from S3 using cluster deploy mode. I've also tried simply downloading the jar onto a node and spark-submitting that; it works in client mode, but I get a "not found" error when using cluster mode.

Any help will be appreciated.
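For reference, the jar's main class does nothing more than this (a minimal sketch; the object name here stands in for the foo passed to --class):

    import org.apache.spark.{SparkConf, SparkContext}

    object Foo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("foo"))
        // The job itself does no S3 or Hadoop file system I/O.
        println(sc.parallelize(1 to 100).sum())
        sc.stop()
      }
    }

Thanks,
Ashic.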