I assume the idea is for Spark to effectively "rm -r dir/", which would clean out everything that was there before; Spark would just be doing this on the caller's behalf. Hadoop still won't let you write into a location that already exists regardless, and part of the reason is exactly this: you might end up with files mixed up from different jobs.
This doesn't need a change to Hadoop and probably shouldn't be one; it's a change to the semantics Spark provides, doing the delete for you if you set a flag. Viewed that way, meh, seems like the caller could just do that themselves (via a utility method if you like) rather than expand the Spark API, but I can see it both ways. Caller beware.

On Mon, Jun 2, 2014 at 10:08 PM, Nicholas Chammas
<nicholas.cham...@gmail.com> wrote:
> OK, thanks for confirming. Is there something we can do about that leftover
> part- files problem in Spark, or is that for the Hadoop team?
>
>
> On Monday, June 2, 2014, Aaron Davidson <ilike...@gmail.com> wrote:
>
>> Yes.
>>
>>
>> On Mon, Jun 2, 2014 at 1:23 PM, Nicholas Chammas
>> <nicholas.cham...@gmail.com> wrote:
>>
>> So in summary:
>>
>> - As of Spark 1.0.0, saveAsTextFile() will no longer clobber by default.
>> - There is an open JIRA issue to add an option to allow clobbering.
>> - Even when clobbering, part- files may be left over from previous saves,
>>   which is dangerous.
>>
>> Is this correct?
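For what it's worth, the delete-before-save the thread is describing can be sketched on the local filesystem. This is only an analogy, not Spark or Hadoop API: `save_with_overwrite` is a hypothetical helper, and on HDFS the delete step would instead be `FileSystem.delete(path, true)` before the normal `saveAsTextFile()` call.

```python
import shutil
import tempfile
from pathlib import Path

def save_with_overwrite(partitions, out_dir):
    """Hypothetical helper (not Spark API): delete out_dir if it exists,
    then write one part- file per partition. Emulates what a 'clobber'
    flag on saveAsTextFile() would do; on HDFS the delete would be
    FileSystem.delete(path, recursive=True) instead of shutil.rmtree."""
    out = Path(out_dir)
    if out.exists():
        # The blanket "rm -r" step: this is what removes stale part-
        # files left over from a previous job with more partitions.
        shutil.rmtree(out)
    out.mkdir(parents=True)
    for i, lines in enumerate(partitions):
        # Hadoop-style part- file naming, one file per partition.
        (out / f"part-{i:05d}").write_text("\n".join(lines) + "\n")

# A second save fully replaces the first: no stale part-00001 survives.
base = Path(tempfile.mkdtemp())
save_with_overwrite([["a", "b"], ["c"]], base / "out")  # writes 2 part- files
save_with_overwrite([["x"]], base / "out")              # now only 1 remains
```

Without the `rmtree` step, the second save would leave `part-00001` from the first job sitting next to the new `part-00000`, which is the mixed-up-output hazard discussed above.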