Hi, I have a Spark job which requires me to write the final RDD to an existing local file (appending to that file). I have tried two approaches, but neither works well:
1. Use the saveAsTextFile() API. Spark 1.1.0 claims that this API can write to the local file system, but I have never managed to make it work. Moreover, the result is not one file but a series of part-xxxxx files, which is not what I hoped to get (see the first sketch below).

2. collect() the RDD into an array and write it out on the driver node using Java file IO (see the second sketch below). There are two problems here as well: 1) my RDD is huge (about 1 TB), so it cannot fit into the memory of the driver node; I have to split the job into small pieces and collect and write them part by part; 2) while the driver is writing with Java IO, the rest of the Spark job has to wait, which is inefficient.

Could anybody suggest an efficient way to solve this? Ideally I would like to append a huge RDD to a local file without pausing the rest of the job during the write.
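For reference, here is roughly what my first attempt looks like. This is only a minimal sketch: the input and output paths are placeholders I made up, and I am assuming an RDD[String]. As far as I can tell, saveAsTextFile() writes one part-xxxxx file per partition and cannot append to an existing file:

import org.apache.spark.{SparkConf, SparkContext}

object SaveAttempt {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SaveAttempt"))
    val rdd = sc.textFile("hdfs:///input/data")  // placeholder input path

    // Produces a directory file:///tmp/output containing
    // part-00000, part-00001, ... -- one file per partition,
    // not the single appended file I want.
    rdd.saveAsTextFile("file:///tmp/output")

    sc.stop()
  }
}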
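And here is the shape of my second attempt, reworked to pull one partition at a time with toLocalIterator() (which, if I read the 1.1.0 API docs correctly, fetches partitions to the driver one by one) instead of a single collect(). The paths are again placeholders and I assume an RDD[String]:

import java.io.{BufferedWriter, FileWriter}
import org.apache.spark.{SparkConf, SparkContext}

object AppendAttempt {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AppendAttempt"))
    val rdd = sc.textFile("hdfs:///input/data")  // placeholder input path

    // The second FileWriter argument opens the existing file in append mode.
    val writer = new BufferedWriter(new FileWriter("/tmp/result.txt", true))
    try {
      // toLocalIterator brings one partition at a time to the driver,
      // so the full 1 TB RDD never has to sit in driver memory at once.
      rdd.toLocalIterator.foreach { line =>
        writer.write(line)
        writer.newLine()
      }
    } finally {
      writer.close()
    }
    sc.stop()
  }
}

This avoids the driver memory problem, but the write is still serialized through the driver, so the cluster sits idle while the file is written -- that is exactly the stall I would like to avoid.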