I assume the idea is for Spark to do the equivalent of "rm -r dir/",
cleaning out everything that was there before; Spark would just be doing
it on behalf of the caller. Hadoop still won't let you write into a
location that already exists, and part of the reason is exactly this:
you might end up with files mixed up from different jobs.

This doesn't need a change to Hadoop and probably shouldn't; it's a
change to semantics provided by Spark to do the delete for you if you
set a flag. Viewed that way, meh, seems like the caller could just do
that themselves rather than expand the Spark API (via a utility method
if you like), but I can see it both ways. Caller beware.
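The caller-side cleanup described above can be sketched as follows. This is a hedged illustration, not Spark's actual behavior: it uses Python against the local filesystem, and the function name `save_with_overwrite` is hypothetical. In a real Spark job the equivalent delete step would use Hadoop's `FileSystem.exists`/`FileSystem.delete(path, recursive)` before calling `saveAsTextFile()`.

```python
import os
import shutil
import tempfile

def save_with_overwrite(partitions, out_dir):
    """Hypothetical caller-side helper: delete any previous output
    directory before writing, so stale part- files from earlier jobs
    cannot survive a rerun with fewer partitions."""
    if os.path.exists(out_dir):
        shutil.rmtree(out_dir)  # the "rm -r dir/" step the caller performs
    os.makedirs(out_dir)
    # Stand-in for saveAsTextFile(): one part- file per partition.
    for i, records in enumerate(partitions):
        with open(os.path.join(out_dir, f"part-{i:05d}"), "w") as f:
            f.write("\n".join(records))

out = os.path.join(tempfile.mkdtemp(), "output")
save_with_overwrite([["a", "b"], ["c"]], out)  # first run writes 2 part- files
save_with_overwrite([["x"]], out)              # rerun with 1 partition
print(sorted(os.listdir(out)))                 # no stale part-00001 left behind
```

Without the `rmtree` step, the second run would leave `part-00001` from the first run sitting next to the new output, which is exactly the mixed-up-files hazard discussed above.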

On Mon, Jun 2, 2014 at 10:08 PM, Nicholas Chammas
<nicholas.cham...@gmail.com> wrote:
> OK, thanks for confirming. Is there something we can do about that leftover
> part- files problem in Spark, or is that for the Hadoop team?
>
>
> On Monday, June 2, 2014, Aaron Davidson <ilike...@gmail.com> wrote:
>
>> Yes.
>>
>>
>> On Mon, Jun 2, 2014 at 1:23 PM, Nicholas Chammas
>> <nicholas.cham...@gmail.com> wrote:
>>
>> So in summary:
>>
>> As of Spark 1.0.0, saveAsTextFile() will no longer clobber by default.
>> There is an open JIRA issue to add an option to allow clobbering.
>> Even when clobbering, part- files may be left over from previous saves,
>> which is dangerous.
>>
>> Is this correct?
