Hi,

I have a Spark job that needs to write its final RDD to an existing local
file (appending to that file). I have tried two approaches, but neither
works well:

1. Using the saveAsTextFile() API. Spark 1.1.0 claims this API can write to
the local filesystem, but I have never been able to make it work. Moreover,
the result is not one file but a series of part-xxxxx files, which is not
what I want (a rough sketch of what I tried is below this list).

2. Collecting the RDD to an array and writing it out on the driver node
with Java file IO (also sketched below). There are two problems here as
well: 1) my RDD is huge (1 TB) and cannot fit into the memory of a single
driver node, so I have to split the work into small pieces and collect and
write them part by part; 2) while the driver is writing with Java IO, the
Spark job has to wait, which is inefficient.
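
The closest I have come with the first approach is roughly the following
(paths are placeholders). coalesce(1) at least collapses the output into a
single part-00000, but with a file:// path each executor writes to its own
local disk, so this only behaves as expected in local mode or on a
filesystem shared by all executors:

    // Sketch only: assumes local mode or a shared filesystem, and that
    // the output directory does not already exist (saveAsTextFile will
    // neither overwrite nor append).
    rdd.coalesce(1).saveAsTextFile("file:///tmp/spark-out")
    // The data lands in /tmp/spark-out/part-00000; appending it to the
    // existing file still needs a separate step, e.g.
    //   cat /tmp/spark-out/part-00000 >> /path/to/existing/file

Even then, as far as I can tell there is no way to append directly to an
existing file through this API.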
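
For the second approach, a cleaner variant I have been experimenting with
replaces the piecewise collect() with toLocalIterator (available since
Spark 1.0), which pulls one partition at a time to the driver, so only a
single partition has to fit in driver memory rather than the whole 1 TB.
This is just a sketch; it assumes an RDD[String] and a placeholder path:

    import java.io.{BufferedWriter, FileWriter}

    // Open the existing local file on the driver in append mode.
    val writer = new BufferedWriter(
      new FileWriter("/path/to/existing/file", true))
    try {
      // toLocalIterator runs one job per partition and streams the
      // results, so memory use is bounded by the largest partition.
      rdd.toLocalIterator.foreach { line =>
        writer.write(line)
        writer.newLine()
      }
    } finally {
      writer.close()
    }

This solves the memory problem, but fetching and writing still happen
strictly in turn, which is my second complaint above.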

Could anybody suggest an efficient way to solve this? What I am hoping for
is something like: append a huge RDD to a local file without pausing the
job while the write is happening.
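
One idea I have been toying with (again just a sketch, with the same
assumptions as above; the queue size and sentinel are arbitrary choices of
mine): drain toLocalIterator into a bounded queue and let a background
thread own the file, so the driver can fetch the next partition while
earlier lines are still being written:

    import java.io.{BufferedWriter, FileWriter}
    import java.util.concurrent.ArrayBlockingQueue

    val queue = new ArrayBlockingQueue[String](10000)
    val POISON = new String("__done__")  // end-of-stream sentinel

    // The background thread owns the file, so writing overlaps with the
    // main thread fetching the next partition from Spark.
    val writerThread = new Thread(new Runnable {
      def run() {
        val w = new BufferedWriter(
          new FileWriter("/path/to/existing/file", true))
        try {
          var line = queue.take()
          while (line ne POISON) {  // reference check; data cannot collide
            w.write(line)
            w.newLine()
            line = queue.take()
          }
        } finally {
          w.close()
        }
      }
    })
    writerThread.start()

    rdd.toLocalIterator.foreach(queue.put(_))  // blocks when queue is full
    queue.put(POISON)
    writerThread.join()

Is this a reasonable direction, or is there a more idiomatic way to do it?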





