Re: batching the output

2014-03-31 Thread Patrick Wendell
Ya this is a good way to do it.

On Sun, Mar 30, 2014 at 10:11 PM, Vipul Pandey wrote:
> Hi,
>
> I need to batch the values in my final RDD before writing out to hdfs. The
> idea is to batch multiple "rows" in a protobuf and write those batches out
> - mostly to save some space as a lot of metad

batching the output

2014-03-30 Thread Vipul Pandey
Hi, I need to batch the values in my final RDD before writing them out to HDFS. The idea is to batch multiple "rows" in a protobuf and write those batches out, mostly to save some space, as a lot of the metadata is the same. For example, given 1,2,3,4,5,6, just batch them as (1,2), (3,4), (5,6) and save three records ins
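
For anyone reading the archive, here is a minimal sketch of the pairing described above (1,2,3,4,5,6 becoming (1,2), (3,4), (5,6)). The approach proposed in the original message is cut off here, so this is only one plausible way to do it and not necessarily the one being discussed. The helper name `batch_partition` and its `batch_size` parameter are hypothetical. The function is plain Python and groups the rows of a single iterator; with PySpark it could be passed to `mapPartitions`, so each partition is batched independently.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_partition(rows: Iterable[T], batch_size: int = 2) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size rows from one iterator.

    Hypothetical helper for illustration; the last batch may be shorter
    when the number of rows is not a multiple of batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        # Pull the next batch_size items without materializing everything.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Plain-Python check, no Spark needed.
batches = list(batch_partition([1, 2, 3, 4, 5, 6], batch_size=2))
print(batches)  # [[1, 2], [3, 4], [5, 6]]

# Assumed PySpark usage (not run here; `rdd` is a hypothetical existing RDD):
#   batched = rdd.mapPartitions(lambda part: batch_partition(part, 2))
```

Batching per partition avoids a shuffle, but it means batches never span partition boundaries, so each partition can end with one short batch. Each yielded list would then be packed into a single protobuf record before writing.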