What I’ve found using saveAsTextFile() against S3 (prior to Spark 1.0.0)
is that files get overwritten automatically. There is one danger to this,
though: if I save to a directory that already has 20 part- files, but this
time around I’m only saving 15 part- files, then there will be 5 leftover
part- files from the previous set mixed in with the 15 newer files. That is
potentially dangerous.
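
One way to guard against this is to delete the output path up front with the
Hadoop FileSystem API before saving. A minimal sketch, assuming sc is your
SparkContext, rdd is the RDD being saved, and the s3n path is just a
placeholder:

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Hypothetical output location; adjust to your own bucket/prefix.
    val outputPath = new Path("s3n://my-bucket/output")

    // Resolve the filesystem for that path from the job's Hadoop config.
    val fs = outputPath.getFileSystem(sc.hadoopConfiguration)

    // Recursively delete any previous run's output so no stale
    // part- files are left mixed in with the new ones.
    if (fs.exists(outputPath)) {
      fs.delete(outputPath, true)
    }

    rdd.saveAsTextFile(outputPath.toString)

The recursive delete clears out every old part- file before the new set is
written, so the directory only ever contains one run's output.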

I haven’t checked to see if this behavior has changed in 1.0.0. Are you
saying it has, Pierre?

On Mon, Jun 2, 2014 at 9:41 AM, Pierre B
<pierre.borckm...@realimpactanalytics.com> wrote:

> Hi Michaël,
>
> Thanks for this. We could indeed do that.
>
> But I guess the question is more about the change of behaviour from 0.9.1
> to 1.0.0.
> We never had to care about that in previous versions.
>
> Does that mean we have to manually remove existing files, or is there a way
> to automatically overwrite when using saveAsTextFile?
