Matei Zaharia wrote
> If you use s3n:// for both, you should be able to pass the exact same file
> to load as you did to save.
I'm trying to write a file to s3n from one Spark app and read it back in
another using the same path, but without luck. Writing the data to s3n with
val data = Array(1.0, 1.0, 1.0)
sc.parallelize(data).saveAsTextFile("s3n://<access_key>:<secret_access_key>@<bucket-name>/test")
creates the following files:
test/_SUCCESS
test/_temporary/0/task_201408071147_m_000000_$folder$
test/_temporary/0/task_201408071147_m_000000/part-00000
test/_temporary/0/task_201408071147_m_000001_$folder$
test/_temporary/0/task_201408071147_m_000001/part-00001
When I try to read the file back with
val data2 = sc.textFile("s3n://<access_key>:<secret_access_key>@<bucket-name>/test")
collecting data2 returns an empty array:
scala> data2.collect
14/08/07 11:49:56 INFO mapred.FileInputFormat: Total input paths to process : 0
14/08/07 11:49:56 INFO spark.SparkContext: Starting job: collect at <console>:15
14/08/07 11:49:56 INFO spark.SparkContext: Job finished: collect at <console>:15, took 3.7227E-5 s
res5: Array[String] = Array()
I'm using Spark 1.0.0.
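A possible reading of the "Total input paths to process : 0" line, as a minimal sketch: Hadoop's FileInputFormat skips entries whose names start with "_" or ".", and the only direct children of test/ in the listing above are _SUCCESS and _temporary. The plain-Scala snippet below replays that rule over the listed keys. The object and helper names (VisibleInputs, isHidden, children, visibleChildren) are made up for illustration and are not Spark or Hadoop APIs; the key list is copied from the listing.

```scala
// Illustrative only: replays the hidden-file rule over the keys from the listing.
// None of these names are Spark or Hadoop APIs.
object VisibleInputs {
  // Keys exactly as they appear in the listing above.
  val keys: Seq[String] = Seq(
    "test/_SUCCESS",
    "test/_temporary/0/task_201408071147_m_000000_$folder$",
    "test/_temporary/0/task_201408071147_m_000000/part-00000",
    "test/_temporary/0/task_201408071147_m_000001_$folder$",
    "test/_temporary/0/task_201408071147_m_000001/part-00001"
  )

  // Same rule as FileInputFormat's hidden-file filter:
  // names beginning with "_" or "." are not treated as input.
  def isHidden(name: String): Boolean =
    name.startsWith("_") || name.startsWith(".")

  // Names of the direct children of `dir`, derived from the flat key list.
  def children(dir: String, keys: Seq[String]): Seq[String] = {
    val prefix = dir + "/"
    keys
      .filter(_.startsWith(prefix))
      .map(_.stripPrefix(prefix).takeWhile(_ != '/'))
      .distinct
  }

  // Direct children that would survive the hidden-file rule.
  def visibleChildren(dir: String, keys: Seq[String]): Seq[String] =
    children(dir, keys).filterNot(isHidden)

  def main(args: Array[String]): Unit = {
    println("children of test/: " + children("test", keys).mkString(", "))
    println("visible children:  " + visibleChildren("test", keys).mkString(", "))
  }
}
```

With the keys above, visibleChildren("test", keys) comes back empty, which would be consistent with the part files never having been moved out of _temporary into test/ itself.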
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463p11643.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.