Indeed, the behavior has changed, for better or for worse. I agree with the 
danger you mention, but I'm not sure it happens like that. Isn't there an 
overwrite mechanism in Hadoop that automatically removes the old part- files, 
writes to a _temporary folder, and then moves only the new part- files into 
place along with the _SUCCESS marker file?
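
To spell out the layout I have in mind (a sketch of what I believe the 
standard Hadoop FileOutputCommitter does; "out" is just an example output 
directory):

out/_temporary/   <- in-flight task attempt files while the job runs
out/part-00000    <- moved into place when the job commits
out/part-00001
out/_SUCCESS      <- marker file written once the commit completes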

In any case this change of behavior should be documented IMO.
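
Until that's clarified, here is a minimal sketch of the manual workaround in 
Scala (hedged: `sc` is the SparkContext, `rdd` is the RDD being saved, and the 
output path is a placeholder; only the standard Hadoop FileSystem API is used):

import org.apache.hadoop.fs.Path

// Placeholder output location; replace with your own path.
val outputPath = new Path("hdfs:///tmp/out")

// Remove any previous output recursively, so no stale part- files
// from an earlier run can mix with the new ones.
val fs = outputPath.getFileSystem(sc.hadoopConfiguration)
if (fs.exists(outputPath)) {
  fs.delete(outputPath, true)
}

// With the directory gone, saveAsTextFile behaves like an overwrite.
rdd.saveAsTextFile(outputPath.toString)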

Cheers 
Pierre


> On June 2, 2014, at 17:42, Nicholas Chammas <nicholas.cham...@gmail.com> 
> wrote:
> 
> What I’ve found using saveAsTextFile() against S3 (prior to Spark 1.0.0) is 
> that files get overwritten automatically. There is one danger to this, 
> though. If I save to a directory that already has 20 part- files, but this 
> time around I’m only saving 15 part- files, then there will be 5 leftover 
> part- files from the previous set mixed in with the 15 newer files. This is 
> potentially dangerous.
> 
> I haven’t checked to see if this behavior has changed in 1.0.0. Are you 
> saying it has, Pierre?
> 
>> On Mon, Jun 2, 2014 at 9:41 AM, Pierre B 
>> <pierre.borckm...@realimpactanalytics.com> wrote:
>> 
>> Hi Michaël,
>> 
>> Thanks for this. We could indeed do that.
>> 
>> But I guess the question is more about the change of behaviour from 0.9.1 to
>> 1.0.0.
>> We never had to care about that in previous versions.
>> 
>> Does that mean we have to manually remove existing files, or is there a way
>> to automatically overwrite when using saveAsTextFile?
>> 
> 
