Since we already have the "spark.hadoop.validateOutputSpecs" config, I think
there is not much need to expose disableOutputSpecValidation.
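For anyone looking for the exact incantation, here is a minimal sketch of
turning the check off for the whole application (the app name is just a
placeholder):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("overwrite-example") // placeholder name
  // When false, saveAsTextFile / saveAsHadoopFile skip the check that
  // the output directory does not already exist.
  .set("spark.hadoop.validateOutputSpecs", "false")
val sc = new SparkContext(conf)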
Cheers
On Fri, Mar 6, 2015 at 7:34 AM, Nan Zhu wrote:
> Actually, besides setting spark.hadoop.validateOutputSpecs to false to
> disable output validation for the whole program
Actually, besides setting spark.hadoop.validateOutputSpecs to false to disable
output validation for the whole program,
the Spark implementation uses a DynamicVariable (in object PairRDDFunctions)
internally to disable it on a case-by-case basis:
val disableOutputSpecValidation: DynamicVariable[Boolean] = new DynamicVariable[Boolean](false)
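Since that field is private[spark], application code cannot set it directly;
the following self-contained sketch (names are mine, not Spark's) just
illustrates how the DynamicVariable pattern scopes the flag to one call:

import scala.util.DynamicVariable

object OutputSpecs { // illustrative stand-in for PairRDDFunctions
  // Defaults to false, i.e. validation stays on.
  val disableValidation = new DynamicVariable[Boolean](false)
}

def save(): Unit = {
  if (!OutputSpecs.disableValidation.value) {
    // ... run the output-spec check here ...
  }
  // ... write the data ...
}

// Validation is skipped only for the dynamic extent of this block;
// other threads and later calls still validate.
OutputSpecs.disableValidation.withValue(true) {
  save()
}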
Found this thread:
http://search-hadoop.com/m/JW1q5HMrge2
Cheers
On Fri, Mar 6, 2015 at 6:42 AM, Sean Owen wrote:
> This was discussed in the past and viewed as dangerous to enable. The
> biggest problem, by far, comes when you have a job that outputs M
> partitions, 'overwriting' a directory of data containing N > M old
> partitions.
This was discussed in the past and viewed as dangerous to enable. The
biggest problem, by far, comes when you have a job that outputs M
partitions, 'overwriting' a directory of data containing N > M old
partitions. You suddenly have a mix of new and old data.
It doesn't match Hadoop's semantics either.
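To make the hazard concrete, a hypothetical run (paths and counts made up;
the exact file-level behavior depends on the Hadoop output committer):

// First job writes 4 partitions:
sc.parallelize(1 to 100, 4).saveAsTextFile("/tmp/out")
// /tmp/out now contains part-00000 .. part-00003

// With validation disabled, a second job writes only 2 partitions
// into the same directory:
sc.parallelize(1 to 10, 2).saveAsTextFile("/tmp/out")
// part-00000 and part-00001 are replaced, but part-00002 and
// part-00003 still hold the old data, so readers see a mix of
// new and stale records.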
Adding support for an overwrite flag would make saveAsXXFile more user-friendly.
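Until then, a common workaround is to delete the target path explicitly
before saving; a sketch, where the path and rdd are placeholders:

import org.apache.hadoop.fs.{FileSystem, Path}

val out = new Path("/tmp/out") // placeholder path
val fs = FileSystem.get(sc.hadoopConfiguration)
if (fs.exists(out)) {
  fs.delete(out, true) // recursive delete
}
rdd.saveAsTextFile(out.toString)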
Cheers
> On Mar 6, 2015, at 2:14 AM, Jeff Zhang wrote:
>
> Hi folks,
>
> I found that RDD.saveXXFile has no overwrite flag, which I think would be
> very helpful. Is there any reason for this?
>
>
>
> --
> Best Regards