Hello everyone, 

I'm having some difficulty reading from my company's private S3 buckets. 
I've got an S3 access key and secret key, and I can read the files fine from
a non-Spark Scala routine via AWScala. But trying to read them with
SparkContext.textFile([comma-separated s3n://bucket/key URIs]) leads to the
following stack trace (where I've changed the object key to use the terms
'subbucket' and 'datafile-' for privacy reasons):

[error] (run-main-0) org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/subbucket%2F2014%2F01%2Fdatafile-01.gz' - ResponseCode=403, ResponseMessage=Forbidden
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/subbucket%2F2014%2F01%2Fdatafile-01.gz' - ResponseCode=403, ResponseMessage=Forbidden
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:122)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryI
[... etc ...]

I'm handing off the credentials themselves via the following method: 

def cleanKey(s: String): String = s.replace("/", "%2F")

val sc = new SparkContext("local[8]", "SparkS3Test")

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", cleanKey(creds.accessKeyId))
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", cleanKey(creds.secretAccessKey))
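
(For what it's worth, my understanding is that values set through hadoopConfiguration
are taken literally, so the un-encoded equivalent would just be:

// Raw credentials, no percent-encoding. My understanding is that cleanKey-style
// encoding would only matter if the keys were embedded in the URI itself,
// e.g. s3n://ACCESS_KEY:SECRET_KEY@bucket/path.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", creds.accessKeyId)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", creds.secretAccessKey)

I haven't confirmed whether the cleanKey encoding makes a difference here.)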

The comma-separated URIs each look like:

s3n://odesk-bucket-name/subbucket/2014/01/datafile-01.gz 

The actual string that I've replaced with 'subbucket' includes underscores but
is otherwise just straight ASCII; the string that 'datafile' substitutes for is
also just straight ASCII.
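
Putting it together, the read itself is essentially the following (the second
file name below is just a placeholder to illustrate the comma-separated list):

// Build the comma-separated path list; SparkContext.textFile accepts multiple
// paths joined by commas.
val uris = Seq(
  "s3n://odesk-bucket-name/subbucket/2014/01/datafile-01.gz",
  "s3n://odesk-bucket-name/subbucket/2014/01/datafile-02.gz"
).mkString(",")

// The RDD is lazy; the 403 above surfaces once an action forces the read.
val lines = sc.textFile(uris)
println(lines.count())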

This is using Spark 1.0.0, pulled in via the sbt library dependency:
"org.apache.spark" % "spark-core_2.10" % "1.0.0"

Any tips appreciated!
Thanks much, 
-Brian 


