Huh; not scrubbing the slashes fixed it. I would have sworn I'd tried that, gotten a 403 Forbidden, and then fallen back on the slash-escaping prescription. I can confirm I was never scrubbing the actual URIs. It looks like it would all be working now, except it's smacking its head against:
14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-01.gz:0+661974299
14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-03.gz:0+1207089239
14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-06.gz:0+1155725077
14/07/02 23:38:57 ERROR executor.Executor: Exception in task ID 0
java.io.IOException: stored gzip size doesn't match decompressed size
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeTrailerState(BuiltInGzipDecompressor.java:389)
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:224)
        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)

But maybe that's just something we need to deal with internally.

Thanks,
--Brian
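
For reference, a minimal sketch of one way to sidestep the slash-escaping question entirely: put the AWS credentials on the SparkContext's Hadoop configuration instead of embedding them in the s3n:// URI. This is not taken from the thread; the bucket/prefix and the environment-variable names are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

object S3ReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("s3n-read"))

    // Credentials live in the Hadoop config, so nothing in the URI needs escaping.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Plain s3n:// path with no embedded credentials (bucket and prefix are placeholders).
    val lines = sc.textFile("s3n://odesk-bucket/subbucket/2014/01/*.gz")
    println(lines.count())

    sc.stop()
  }
}

With the keys on the Hadoop configuration, the URI stays a plain path, so a secret key containing "/" never has to be URL-encoded in the first place.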