Actually, besides setting spark.hadoop.validateOutputSpecs to false, which disables output validation for the whole program, the Spark implementation internally uses a DynamicVariable (in object PairRDDFunctions) to disable it on a case-by-case basis:

    val disableOutputSpecValidation: DynamicVariable[Boolean] = new DynamicVariable[Boolean](false)

I’m not sure the benefit is large enough to make it worth exposing this variable to the user…

Best,

--
Nan Zhu
http://codingcat.me

On Friday, March 6, 2015 at 10:22 AM, Ted Yu wrote:
> Found this thread:
> http://search-hadoop.com/m/JW1q5HMrge2
>
> Cheers
>
> On Fri, Mar 6, 2015 at 6:42 AM, Sean Owen <so...@cloudera.com> wrote:
> > This was discussed in the past and viewed as dangerous to enable. The
> > biggest problem, by far, comes when you have a job that outputs M
> > partitions, 'overwriting' a directory of data containing N > M old
> > partitions. You suddenly have a mix of new and old data.
> >
> > It doesn't match Hadoop's semantics either, which won't let you do
> > this. You can of course simply remove the output directory.
> >
> > On Fri, Mar 6, 2015 at 2:20 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > Adding support for an overwrite flag would make saveAsXXFile more
> > > user friendly.
> > >
> > > Cheers
> > >
> > >> On Mar 6, 2015, at 2:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> > >>
> > >> Hi folks,
> > >>
> > >> I found that RDD:saveXXFile has no overwrite flag, which I think would
> > >> be very helpful. Is there any reason for this?
> > >>
> > >> --
> > >> Best Regards
> > >>
> > >> Jeff Zhang
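
For reference, here is a rough sketch of how the case-by-case mechanism mentioned at the top of this message could behave, using only scala.util.DynamicVariable from the standard library. The object name OutputValidationSketch and the helper wouldValidate are made up for illustration, and the way the global setting is combined with the variable is my assumption, not a copy of Spark's code:

```scala
import scala.util.DynamicVariable

object OutputValidationSketch {
  // Same declaration as quoted from PairRDDFunctions; false means validation stays on.
  val disableOutputSpecValidation: DynamicVariable[Boolean] =
    new DynamicVariable[Boolean](false)

  // Hypothetical helper: would a save call validate the output spec right now?
  // Assumption: validation runs only if the global setting allows it AND the
  // dynamic variable has not been flipped for the current scope.
  def wouldValidate(validateOutputSpecs: Boolean): Boolean =
    validateOutputSpecs && !disableOutputSpecValidation.value

  def main(args: Array[String]): Unit = {
    // Stands in for the value of spark.hadoop.validateOutputSpecs.
    val validateOutputSpecs = true

    println(wouldValidate(validateOutputSpecs)) // true

    // withValue overrides the variable only while the block runs.
    disableOutputSpecValidation.withValue(true) {
      println(wouldValidate(validateOutputSpecs)) // false
    }

    // The previous value is restored once the block exits.
    println(wouldValidate(validateOutputSpecs)) // true
  }
}
```

The point of the DynamicVariable is that the override is scoped to the block passed to withValue, so one internal call site can skip the check without changing the program-wide setting.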