Re: How to not write empty RDD partitions in RDD.saveAsTextFile()

2014-10-20 Thread Yi Tian
I think you could use `repartition` to make sure there are no empty partitions. You could also try `coalesce` to combine partitions, but it cannot guarantee that no empty partitions remain.

Best Regards,
Yi Tian
tianyi.asiai...@gmail.com

On Oct 18, 2014, at 20:30, jan.zi...@centrum
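The `repartition` suggestion above can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `choose_num_partitions` and `save_compacted` and the `records_per_file` parameter are hypothetical and not part of Spark; only `cache`, `count`, `repartition` and `saveAsTextFile` are real RDD methods. Shuffling into a partition count derived from the record count makes empty part files unlikely when records far outnumber partitions, but it is not a hard guarantee.

```python
def choose_num_partitions(record_count, records_per_file):
    """Pick a partition count so each output file holds roughly
    `records_per_file` records (hypothetical helper, not a Spark API)."""
    if records_per_file <= 0:
        raise ValueError("records_per_file must be positive")
    # Ceiling division; always keep at least one partition, even for an empty RDD.
    return max(1, (record_count + records_per_file - 1) // records_per_file)


def save_compacted(rdd, path, records_per_file=100000):
    """Repartition `rdd` before saving so that fewer part files end up empty.

    `rdd` is expected to behave like a PySpark RDD.
    """
    rdd = rdd.cache()  # avoid recomputing the lineage for count() and the save
    n = choose_num_partitions(rdd.count(), records_per_file)
    # repartition() shuffles the data across n partitions; coalesce(n) would
    # only merge existing partitions without a shuffle.
    rdd.repartition(n).saveAsTextFile(path)
    return n
```

With the RDD from the original question, a call might look like `save_compacted(cleanedData, sys.argv[3])`. The extra `count()` costs one pass over the filtered data, which is the price of sizing the output after the filter has run.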

How to not write empty RDD partitions in RDD.saveAsTextFile()

2014-10-18 Thread jan.zikes
Hi, I am developing a program using Spark in which I apply a filter such as:

    cleanedData = distData.map(json_extractor.extract_json).filter(lambda x: x is not None and x != '')
    cleanedData.saveAsTextFile(sys.argv[3])

As a result, a lot of empty files are saved (probably from those part