I'm running into permission issues while accessing data in an S3 bucket via the s3a filesystem from a local Spark cluster. Has anyone had success with this?
My setup is:

- Spark 1.6.1 compiled against Hadoop 2.7.2
- aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.2.jar in the classpath
- Spark's Hadoop configuration set as follows:

  sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  sc.hadoopConfiguration.set("fs.s3a.access.key", <access>)
  sc.hadoopConfiguration.set("fs.s3a.secret.key", <secret>)

(The secret key does not contain any '/' characters, which others have reported to cause problems.)

I have configured my S3 bucket to grant the necessary permissions, following https://sparkour.urizone.net/recipes/configuring-s3/.

What works: listing, reading from, and writing to s3a with the hadoop command, e.g.

  hadoop dfs -ls s3a://<bucket name>/<file path>

What doesn't work: reading from s3a using Spark's textFile API (a minimal sketch of the failing job is in the P.S. below). Every task fails with a *403 Forbidden* exception.

Some online documents suggest using IAM roles to grant permissions on an AWS-hosted cluster, but I would like a solution for my local standalone cluster.

Any help would be appreciated.

Regards,
~Mayuresh
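
P.S. For completeness, here is a minimal sketch of the shape of the failing job, not my exact code. The master URL is a placeholder for my standalone master, I read the keys from the standard AWS environment variables here only to keep secrets out of the snippet, and <bucket name>/<file path> are the same placeholders as above.

  import org.apache.spark.{SparkConf, SparkContext}

  object S3AReadRepro {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("s3a-read-repro")
        .setMaster("spark://<master host>:7077") // placeholder: local standalone master

      val sc = new SparkContext(conf)

      // Same Hadoop configuration as above, set on the driver before any job runs.
      sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
      sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

      // This is the read whose tasks all throw the 403 Forbidden exception.
      val lines = sc.textFile("s3a://<bucket name>/<file path>")
      println(lines.count())

      sc.stop()
    }
  }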