Re: How to read a multipart s3 file?

sparkuser2345 Thu, 07 Aug 2014 04:58:38 -0700

Matei Zaharia wrote
> If you use s3n:// for both, you should be able to pass the exact same file
> to load as you did to save.


I'm trying to write a file to s3n in one Spark app and read it back in
another using the same file name, but without luck. Writing data to s3n as

val data = Array(1.0, 1.0, 1.0)
sc.parallelize(data).saveAsTextFile("s3n://<access_key>:<secret_access_key>@<bucket-name>/test")

creates the following files: 

test/_SUCCESS
test/_temporary/0/task_201408071147_m_000000_$folder$
test/_temporary/0/task_201408071147_m_000000/part-00000
test/_temporary/0/task_201408071147_m_000001_$folder$
test/_temporary/0/task_201408071147_m_000001/part-00001

When trying to read the file as

val data2 = sc.textFile("s3n://<access_key>:<secret_access_key>@<bucket-name>/test")

data2 is an empty array:

scala> data2.collect
14/08/07 11:49:56 INFO mapred.FileInputFormat: Total input paths to process : 0
14/08/07 11:49:56 INFO spark.SparkContext: Starting job: collect at <console>:15
14/08/07 11:49:56 INFO spark.SparkContext: Job finished: collect at <console>:15, took 3.7227E-5 s
res5: Array[String] = Array()

I'm using Spark 1.0.0. 
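
As a sanity check on the listing above, here is a minimal sketch of why the
read might see zero input paths. It assumes the reader only considers direct
children of the given directory and skips names starting with "_" or ".",
which is how Hadoop's FileInputFormat treats hidden files. The helper
visibleInputs is hypothetical (not a Spark or Hadoop API), and the keys are
copied from the listing in this message.

```scala
// Keys copied from the listing above, relative to the bucket root.
val keys = Seq(
  "test/_SUCCESS",
  "test/_temporary/0/task_201408071147_m_000000_$folder$",
  "test/_temporary/0/task_201408071147_m_000000/part-00000",
  "test/_temporary/0/task_201408071147_m_000001_$folder$",
  "test/_temporary/0/task_201408071147_m_000001/part-00001"
)

// Hypothetical helper: models a non-recursive listing of `dir` that
// ignores hidden entries (names starting with "_" or ".").
// Simplification: nested keys are dropped outright; here the only
// subdirectory is _temporary, which would be treated as hidden anyway.
def visibleInputs(dir: String, keys: Seq[String]): Seq[String] = {
  val prefix = if (dir.endsWith("/")) dir else dir + "/"
  keys
    .filter(_.startsWith(prefix))
    .map(_.stripPrefix(prefix))
    .filter(rel => !rel.contains("/")) // direct children only
    .filterNot(name => name.startsWith("_") || name.startsWith("."))
}

val visible = visibleInputs("test", keys)
println(visible.size) // prints 0
```

Under these assumptions nothing directly under test/ qualifies as input,
since both part files are still under test/_temporary/. That would be
consistent with "Total input paths to process : 0".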



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463p11643.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
