There’s a difference between s3:// and s3n:// in the Hadoop S3 access layer. Make sure you use the right one when reading the data back. In general s3n:// ought to be better, because it creates objects that look like ordinary files to other S3 tools. s3:// dates from when S3’s per-object size limit was much lower; it uses S3 objects as blocks in a kind of overlay file system.
If you use s3n:// for both, you should be able to pass the exact same path to load as you did to save. Make sure you also set your AWS keys in the environment or in SparkContext.hadoopConfiguration.

Matei

On May 6, 2014, at 5:19 PM, kamatsuoka <ken...@gmail.com> wrote:

> I have a Spark app that writes out a file, s3://mybucket/mydir/myfile.txt.
>
> Behind the scenes, the S3 driver creates a bunch of files like
> s3://mybucket//mydir/myfile.txt/part-0000, as well as the block files like
> s3://mybucket/block_3574186879395643429.
>
> How do I construct an url to use this file as input to another Spark app? I
> tried all the variations of s3://mybucket/mydir/myfile.txt, but none of them
> work.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
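To make the advice above concrete, here is a minimal Scala sketch of saving and re-reading the same s3n:// path. The bucket name, paths, and credential strings are placeholders, and running it assumes a working Spark installation with the Hadoop S3 libraries on the classpath plus valid AWS credentials:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: "mybucket" and the credential values are placeholders.
object S3nExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("s3n-example"))

    // Keys can come from the environment, or be set on hadoopConfiguration:
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    // Save: Spark writes a directory of part-NNNNN files under this path.
    sc.parallelize(Seq("line1", "line2", "line3"))
      .saveAsTextFile("s3n://mybucket/mydir/myfile.txt")

    // Load: pass the exact same path; Spark reads all part files beneath it.
    val lines = sc.textFile("s3n://mybucket/mydir/myfile.txt")
    println(lines.count())

    sc.stop()
  }
}
```

The point is that the "file" you saved is really a directory of part files, but textFile accepts the directory path directly, so the save path and the load path can be identical.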