There’s a difference between s3:// and s3n:// in the Hadoop S3 access layer. Make sure you use the right one when reading the data back. In general s3n:// ought to be better, because it creates objects that look like ordinary files to other S3 tools. s3:// dates from when S3’s per-object size limit was much lower; it uses S3 objects as blocks in a kind of overlay file system.
If you use s3n:// for both, you should be able to pass the exact same path to load as you did to save. Make sure you also set your AWS keys in the environment or in SparkContext.hadoopConfiguration.

Matei

On May 6, 2014, at 5:19 PM, kamatsuoka <ken...@gmail.com> wrote:

> I have a Spark app that writes out a file, s3://mybucket/mydir/myfile.txt.
>
> Behind the scenes, the S3 driver creates a bunch of files like
> s3://mybucket//mydir/myfile.txt/part-0000, as well as the block files like
> s3://mybucket/block_3574186879395643429.
>
> How do I construct an url to use this file as input to another Spark app? I
> tried all the variations of s3://mybucket/mydir/myfile.txt, but none of them
> work.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
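To make the advice above concrete, here is a minimal Scala sketch of saving and re-reading the same s3n:// path. The bucket name, paths, and credential strings are placeholders, and running it assumes a working Spark installation with the Hadoop S3 libraries on the classpath plus valid AWS credentials:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: "mybucket" and the credential values are placeholders.
object S3nExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("s3n-example"))

    // Keys can come from the environment, or be set on hadoopConfiguration:
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    // Save: Spark writes a directory of part-NNNNN files under this path.
    sc.parallelize(Seq("line1", "line2", "line3"))
      .saveAsTextFile("s3n://mybucket/mydir/myfile.txt")

    // Load: pass the exact same path; Spark reads all part files beneath it.
    val lines = sc.textFile("s3n://mybucket/mydir/myfile.txt")
    println(lines.count())

    sc.stop()
  }
}
```

The point is that the "file" you saved is really a directory of part files, but textFile accepts the directory path directly, so the save path and the load path can be identical.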