Re: Writing all values for same key to one file

2016-08-09 Thread neil90
Why not just create a partition for the key you want to group by and save it there? Appending to a file already written to HDFS isn't the best idea IMO. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Writing-all-values-for-same-key-to-one-file-tp27455p2

Re: Writing all values for same key to one file

2016-08-05 Thread colzer
In my opinion, "append to a file" may not be a good idea. By using `MultipleTextOutputFormat`, you can write all values for a given key to a directory, for example: class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] { override def generateFileNameForKeyValue(ke

Re: Writing all values for same key to one file

2016-08-04 Thread rtijoriwala
Hi Colzer, Thanks for the response. My main question was about writing one file per "key", i.e. having a file with all values for a given key. So in the pseudocode that I have above, am I opening/creating the file in the right place? Once the file is created and closed, I cannot append to it. Thank

Re: Writing all values for same key to one file

2016-08-04 Thread colzer
For an RDD, you can use `saveAsHadoopFile` with a custom `MultipleOutputFormat`.
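A minimal sketch of how the `saveAsHadoopFile` suggestion might look in practice, assuming string keys and values and a local Spark session. The class name `RDDMultipleTextOutputFormat` comes from this thread; the `PerKeyOutput` object, its `fileNameFor` helper, the sample data, and the output path are hypothetical and only for illustration. Partitioning by key first is what keeps all values for a key in a single part file.

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Routes each record to a subdirectory named after its key.
class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  // Drop the key from the written line so each file holds only values.
  override def generateActualKey(key: Any, value: Any): Any = NullWritable.get()

  // `name` is the default leaf file name for the writing task.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    PerKeyOutput.fileNameFor(key, name)
}

object PerKeyOutput {
  // Hypothetical helper: "<key>/<default leaf name>".
  // Assumes keys are safe to use as directory names (no "/" and so on).
  def fileNameFor(key: Any, name: String): String = key.toString + "/" + name

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for demonstration only.
    val conf = new SparkConf().setAppName("per-key-output").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical sample data.
    val pairs = sc.parallelize(Seq(("a", "1"), ("b", "2"), ("a", "3")))

    pairs
      // Co-locate all records for a key in one partition, so each key
      // directory ends up with a single part file.
      .partitionBy(new HashPartitioner(2))
      // Hypothetical output path; the job fails if it already exists.
      .saveAsHadoopFile(
        "/tmp/per-key-output",
        classOf[String],
        classOf[String],
        classOf[RDDMultipleTextOutputFormat])

    sc.stop()
  }
}
```

With this layout nothing is ever appended to an existing file: each run writes a fresh output directory containing one subdirectory per key.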

Re: Writing all values for same key to one file

2016-08-04 Thread ayan guha
Partition your data by key, using `rdd.partitionBy(...)`. On Fri, Aug 5, 2016 at 10:10 AM, rtijoriwala wrote: > Any recommendations? comments?

Re: Writing all values for same key to one file

2016-08-04 Thread rtijoriwala
Any recommendations? comments?