You can probably use LazyOutputFormat directly. If there's a version of it for the hadoop.mapred API, you can use it with PairRDDFunctions.saveAsHadoopFile() today; otherwise, a variant of that method for the hadoop.mapreduce API is coming in Spark 1.0 as well.
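For reference, a rough Scala sketch of what this could look like with the hadoop.mapreduce-API variant (saveAsNewAPIHadoopFile). Untested; the output path and key/value types are placeholders, and it assumes LazyOutputFormat from Hadoop's mapreduce.lib.output package, which only creates a part file once the first record is written:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.{LazyOutputFormat, TextOutputFormat}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("lazy-output"))

// Convert to Writable pairs just before saving.
val data = sc.parallelize(Seq(("k1", "v1"), ("k2", "v2")))
  .map { case (k, v) => (new Text(k), new Text(v)) }

// Wrap TextOutputFormat in LazyOutputFormat so that empty
// partitions produce no part files at all.
val job = Job.getInstance(sc.hadoopConfiguration)
LazyOutputFormat.setOutputFormatClass(job, classOf[TextOutputFormat[Text, Text]])

data.saveAsNewAPIHadoopFile(
  "/tmp/lazy-output",                    // placeholder path
  classOf[Text],
  classOf[Text],
  classOf[LazyOutputFormat[Text, Text]],
  job.getConfiguration)
```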
Matei

On Feb 28, 2014, at 5:18 PM, Mohit Singh <mohit1...@gmail.com> wrote:

> Hi,
> Is there something equivalent to LazyOutputFormat in Spark (PySpark)?
> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.html
> Basically, something where I only save files which have some data in them,
> rather than saving all the files, since in some cases the majority of files can
> be empty?
> Thanks
>
> --
> Mohit
>
> "When you want success as badly as you want the air, then you will get it.
> There is no other secret of success."
> -Socrates