Why not just create a partition for the key you want to group by and save it
there? Appending to a file already written to HDFS isn't the best idea,
IMO.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-all-values-for-same-key-to-one-file-tp27455p2
In my opinion, appending to a file may not be a good idea.
By using `MultipleTextOutputFormat`, you can write all values for a given
key into a directory named after that key.
For example:

class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String] + "/" + name
}
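To make this suggestion concrete, here is a minimal driver sketch. The sample data, app name, and output path are illustrative, not from the thread; it assumes string keys and values and a Spark installation to run against.

```scala
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Route each record to a file path derived from its key (one directory per key).
class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String] + "/" + name
}

object OneFilePerKey {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("one-file-per-key"))
    val pairs = sc.parallelize(Seq(("a", "1"), ("a", "2"), ("b", "3")))
    // Shuffle all values for a key into the same partition, then let the
    // output format split them into per-key directories under the
    // (hypothetical) path /tmp/out.
    pairs
      .partitionBy(new HashPartitioner(2))
      .saveAsHadoopFile("/tmp/out", classOf[String], classOf[String],
        classOf[RDDMultipleTextOutputFormat])
    sc.stop()
  }
}
```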
Hi Colzer,
Thanks for the response. My main question was about writing one file per
"key", i.e. having a file with all values for a given key. So in the pseudo
code that I have above, am I opening/creating the file in the right place?
Once the file is created and closed, I cannot append to it.
Thanks
For an RDD, you can use `saveAsHadoopFile` with a custom `MultipleOutputFormat`.
Partition your data using the key (RDDs expose `partitionBy` rather than a
`partitionByKey` method):

rdd.partitionBy(new HashPartitioner(numPartitions))
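One way to read this suggestion: give each distinct key its own partition via a custom `Partitioner`, so each output part file contains exactly one key's values. A sketch under those assumptions — the partitioner class, sample data, and output path below are hypothetical, not from the thread:

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Maps each known key to its own partition index, so saveAsTextFile
// emits one part file per key.
class ExactKeyPartitioner(keys: Seq[String]) extends Partitioner {
  private val index = keys.zipWithIndex.toMap
  override def numPartitions: Int = keys.size
  override def getPartition(key: Any): Int = index(key.asInstanceOf[String])
}

object PartitionPerKey {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-per-key"))
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    // Collecting the distinct keys on the driver is fine for a modest
    // number of keys; it would not scale to millions of keys.
    val keys = pairs.keys.distinct().collect().toSeq
    pairs
      .partitionBy(new ExactKeyPartitioner(keys))
      .saveAsTextFile("/tmp/per-key-output") // hypothetical path
    sc.stop()
  }
}
```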
On Fri, Aug 5, 2016 at 10:10 AM, rtijoriwala
wrote:
> Any recommendations? comments?
Any recommendations? comments?
Sent from the Apache Spark User List mailing list archive at Nabble.com.