Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Ted Yu
Since we already have the "spark.hadoop.validateOutputSpecs" config, I think there is not much need to expose disableOutputSpecValidation.
Cheers
On Fri, Mar 6, 2015 at 7:34 AM, Nan Zhu wrote:
> Actually, except setting spark.hadoop.validateOutputSpecs to false to
> disable output validation for th

Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Nan Zhu
Actually, besides setting spark.hadoop.validateOutputSpecs to false to disable output validation for the whole program, the Spark implementation internally uses a DynamicVariable (in object PairRDDFunctions) to disable it case by case:
val disableOutputSpecValidation: DynamicVariable[Boole
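A minimal sketch of how the two switches could combine, assuming validation runs only when the config key is not set to false and no caller has disabled it through the dynamic variable. Only the config key and the name disableOutputSpecValidation come from the thread; the Map standing in for SparkConf and the helper isOutputSpecValidationEnabled are hypothetical. scala.util.DynamicVariable is backed by an InheritableThreadLocal, so withValue overrides the flag only for the current thread (and threads it starts) and restores the previous value when the block returns.

```scala
import scala.util.DynamicVariable

// Hypothetical model, not Spark's own code: a Map stands in for SparkConf.
object OutputSpecValidation {
  // Per-call override; false by default, meaning "do not disable validation".
  val disableOutputSpecValidation = new DynamicVariable[Boolean](false)

  // Program-wide switch: the config key, treated as "true" when absent.
  def isOutputSpecValidationEnabled(conf: Map[String, String]): Boolean = {
    val enabledInConf = conf.getOrElse("spark.hadoop.validateOutputSpecs", "true").toBoolean
    enabledInConf && !disableOutputSpecValidation.value
  }
}

object OutputSpecValidationDemo {
  def main(args: Array[String]): Unit = {
    val conf = Map.empty[String, String]
    println(OutputSpecValidation.isOutputSpecValidationEnabled(conf)) // true
    // Disable validation only for the code inside this block.
    val inside = OutputSpecValidation.disableOutputSpecValidation.withValue(true) {
      OutputSpecValidation.isOutputSpecValidationEnabled(conf)
    }
    println(inside) // false
    println(OutputSpecValidation.isOutputSpecValidationEnabled(conf)) // true again
  }
}
```

The design point is that the global key affects every save in the program, while the dynamic variable lets one internal call site opt out without touching the shared configuration.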

Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Ted Yu
Found this thread: http://search-hadoop.com/m/JW1q5HMrge2
Cheers
On Fri, Mar 6, 2015 at 6:42 AM, Sean Owen wrote:
> This was discussed in the past and viewed as dangerous to enable. The
> biggest problem, by far, comes when you have a job that output M
> partitions, 'overwriting' a directory of

Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Sean Owen
This was discussed in the past and viewed as dangerous to enable. The biggest problem, by far, comes when you have a job that output M partitions, 'overwriting' a directory of data containing N > M old partitions. You suddenly have a mix of new and old data. It doesn't match Hadoop's semantics eit
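To make the N > M problem concrete, here is a toy model in which an output directory is just a map from part-file name to contents, and a naive "overwrite" replaces only the files the new job writes. The object and method names are hypothetical, and the part-00000 style file names are an assumption chosen to resemble Spark's text output.

```scala
// Toy model: an output directory is a Map from part-file name to file contents.
object PartitionMix {
  def partName(i: Int): String = f"part-$i%05d"

  // The files a job with `numPartitions` partitions would write, tagged by job.
  def writeJob(tag: String, numPartitions: Int): Map[String, String] =
    (0 until numPartitions).map(i => partName(i) -> s"$tag-$i").toMap

  // Naive "overwrite": new files replace same-named old ones, but old files
  // with higher partition numbers are left in place.
  def naiveOverwrite(dir: Map[String, String], job: Map[String, String]): Map[String, String] =
    dir ++ job

  def main(args: Array[String]): Unit = {
    val oldDir = writeJob("old", 4)                          // N = 4 old partitions
    val result = naiveOverwrite(oldDir, writeJob("new", 2)) // M = 2 new partitions
    result.toSeq.sortBy(_._1).foreach { case (name, data) => println(s"$name -> $data") }
  }
}
```

Running main prints part-00000 -> new-0, part-00001 -> new-1, part-00002 -> old-2 and part-00003 -> old-3: anyone reading the directory afterwards sees two stale partitions mixed in with the new output.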

Re: No overwrite flag for saveAsXXFile

2015-03-06 Thread Ted Yu
Adding support for an overwrite flag would make saveAsXXFile more user-friendly.
Cheers
> On Mar 6, 2015, at 2:14 AM, Jeff Zhang wrote:
>
> Hi folks,
>
> I found that RDD:saveXXFile has no overwrite flag, which I think would be very
> helpful. Is there any reason for this?
>
> --
> Best Reg
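Without such a flag, one user-side workaround is to delete the target directory yourself before saving. The sketch below does this on the local filesystem with java.nio; saveWithOverwrite and deleteRecursively are hypothetical helpers, not Spark APIs, and on HDFS the analogous step would be Hadoop's FileSystem.delete(path, true). Delete-then-write is not atomic: if the write fails after the delete, the old data is already gone.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.Comparator

// Hypothetical helpers for a local directory; not part of Spark's API.
object OverwriteHelper {
  // Remove `dir` and everything under it, if it exists.
  def deleteRecursively(dir: Path): Unit =
    if (Files.exists(dir)) {
      val stream = Files.walk(dir)
      try {
        // Reverse path order visits children before their parent directory.
        stream.sorted(Comparator.reverseOrder[Path]()).forEach(p => Files.delete(p))
      } finally stream.close()
    }

  // Clear the target first, then write one part file per partition,
  // so no stale part files from an earlier, larger job can survive.
  def saveWithOverwrite(dir: Path, partitions: Seq[Seq[String]]): Unit = {
    deleteRecursively(dir)
    Files.createDirectories(dir)
    partitions.zipWithIndex.foreach { case (lines, i) =>
      val bytes = lines.map(_ + "\n").mkString.getBytes(StandardCharsets.UTF_8)
      Files.write(dir.resolve(f"part-$i%05d"), bytes)
    }
  }

  def main(args: Array[String]): Unit = {
    val out = Files.createTempDirectory("overwrite-demo").resolve("output")
    saveWithOverwrite(out, Seq.fill(4)(Seq("old"))) // first job: 4 partitions
    saveWithOverwrite(out, Seq(Seq("new"), Seq("new"))) // second job: 2 partitions
    val listing = Files.list(out)
    try println(listing.count()) // 2
    finally listing.close()
  }
}
```

Because the whole directory is removed first, the second save leaves exactly two part files rather than the mix of new and old partitions described earlier in the thread.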