Huh; not scrubbing the slashes fixed it. I would have sworn I tried that, got a
403 Forbidden, and then remembered the slash prescription. I can confirm I was
never scrubbing the actual URIs.
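
Roughly, the setup is along these lines (a minimal sketch: the app name and the
env-var plumbing are placeholders, and I'm only illustrating that the secret key
is passed through verbatim, slashes unescaped, via the standard s3n config keys):

    // Sketch: s3n credentials set on the Hadoop config from the environment,
    // with no URL-encoding of "/" in the secret key. Names are placeholders.
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("s3n-read"))
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Same kind of path glob as in the log below.
    val lines = sc.textFile("s3n://odesk-bucket/subbucket/2014/01/*.gz")
    println(lines.count())
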
It looks like it'd all be working now, except it's smacking its head against this:

14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-01.gz:0+661974299
14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-03.gz:0+1207089239
14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-06.gz:0+1155725077
14/07/02 23:38:57 ERROR executor.Executor: Exception in task ID 0
java.io.IOException: stored gzip size doesn't match decompressed size
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeTrailerState(BuiltInGzipDecompressor.java:389)
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:224)
        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)

but maybe that's just something we need to deal with internally.
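
If it does turn out to be the files themselves, a quick local sanity check (just
a sketch; the path is a placeholder) is to decompress one fully and compare the
byte count against the gzip ISIZE trailer, i.e. the last four bytes, which hold
the uncompressed length mod 2^32. That's essentially the comparison the
decompressor is making when it throws:

    // Sketch: decompress a local copy of a suspect .gz and compare its
    // decompressed length with the ISIZE trailer (last 4 bytes, little-endian,
    // uncompressed size mod 2^32). The file path is a placeholder.
    import java.io.{FileInputStream, RandomAccessFile}
    import java.util.zip.GZIPInputStream

    val path = "datafile-01.gz"

    val in = new GZIPInputStream(new FileInputStream(path))
    val buf = new Array[Byte](64 * 1024)
    var total = 0L
    var n = in.read(buf)
    while (n != -1) { total += n; n = in.read(buf) }
    in.close()

    val raf = new RandomAccessFile(path, "r")
    raf.seek(raf.length() - 4)
    val isize = (0 until 4).map(i => (raf.read().toLong & 0xffL) << (8 * i)).sum
    raf.close()

    println(s"decompressed=$total, trailer ISIZE=$isize, match=${(total & 0xffffffffL) == isize}")

One thing to keep an eye on: ISIZE only stores the size mod 2^32, so a file that
legitimately decompresses to more than 4 GB can look mismatched unless the
comparison also wraps.
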

Thanks,
--Brian


