Amazon also strongly discourages the use of s3:// because the block file system it maps to is deprecated.
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-file-systems.html

Note
> The configuration of Hadoop running on Amazon EMR differs from the default
> configuration provided by Apache Hadoop. On Amazon EMR, s3n:// and s3://
> both map to the Amazon S3 native file system, *while in the default
> configuration provided by Apache Hadoop s3:// is mapped to the Amazon S3
> block storage system.*

> Amazon S3 block is a deprecated file system that is not recommended because
> it can trigger a race condition that might cause your cluster to fail. It
> may be required by legacy applications.

On Tue, May 6, 2014 at 8:23 PM, Matei Zaharia <[email protected]> wrote:

> There’s a difference between s3:// and s3n:// in the Hadoop S3 access
> layer. Make sure you use the right one when reading stuff back. In general
> s3n:// ought to be better because it will create things that look like
> files in other S3 tools. s3:// was present when the file size limit in S3
> was much lower, and it uses S3 objects as blocks in a kind of overlay file
> system.
>
> If you use s3n:// for both, you should be able to pass the exact same file
> to load as you did to save. Make sure you also set your AWS keys in the
> environment or in SparkContext.hadoopConfiguration.
>
> Matei
>
> On May 6, 2014, at 5:19 PM, kamatsuoka <[email protected]> wrote:
>
> > I have a Spark app that writes out a file, s3://mybucket/mydir/myfile.txt.
> >
> > Behind the scenes, the S3 driver creates a bunch of files like
> > s3://mybucket//mydir/myfile.txt/part-0000, as well as the block files like
> > s3://mybucket/block_3574186879395643429.
> >
> > How do I construct an url to use this file as input to another Spark app? I
> > tried all the variations of s3://mybucket/mydir/myfile.txt, but none of them
> > work.
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
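
For what it's worth, here is a rough Python sketch of what Matei describes: use s3n:// on both the save and the load side, and put the AWS keys in the Hadoop configuration. The helper names (to_s3n, s3n_credentials, load_text) are made up for illustration. The property names fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are the standard Hadoop ones for s3n. Reaching the Hadoop configuration through sc._jsc is an assumption about PySpark, since it goes through a private attribute. Treat this as a sketch, not a tested recipe.

```python
from urllib.parse import urlsplit


def to_s3n(url):
    """Swap an s3:// scheme for s3n://; bucket and key are left untouched."""
    parts = urlsplit(url)
    if parts.scheme not in ("s3", "s3n"):
        raise ValueError("not an S3 URL: %r" % url)
    return "s3n://" + parts.netloc + parts.path


def s3n_credentials(access_key, secret_key):
    """Hadoop properties that carry the AWS keys for the s3n:// file system."""
    return {
        "fs.s3n.awsAccessKeyId": access_key,
        "fs.s3n.awsSecretAccessKey": secret_key,
    }


def load_text(sc, url, access_key, secret_key):
    """Hypothetical helper: sc is assumed to be a live pyspark SparkContext."""
    # Assumption: sc._jsc.hadoopConfiguration() exposes the Hadoop config.
    conf = sc._jsc.hadoopConfiguration()
    for name, value in s3n_credentials(access_key, secret_key).items():
        conf.set(name, value)
    # Pass the same path that was given to saveAsTextFile, not a part file.
    return sc.textFile(to_s3n(url))
```

Note that this only helps if the file was also written through s3n://. Output saved through the block-based s3:// file system is stored as block objects, so it would have to be written again with s3n:// before it can be read this way.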
