Neither of those two. Instead, the shuffle data is cleaned up when the stage it comes from gets garbage-collected by the JVM. That is, it happens once you no longer hold any references to anything that points to the old stages and a suitable GC event occurs.
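The reachability-driven cleanup described above follows the standard JVM weak-reference pattern. Spark's driver-side `ContextCleaner` is built on the same idea. The following is a minimal, Spark-free Java sketch of that pattern, not Spark's actual code. `ShuffleCleanerSketch`, `ShuffleDep`, `ShuffleRef`, and the in-memory set standing in for shuffle files on disk are all hypothetical. The sketch also says nothing about how the YARN external shuffle service stores the files.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ShuffleCleanerSketch {
    // Hypothetical stand-in for the driver-side object a shuffle belongs to.
    static final class ShuffleDep {
        final int shuffleId;
        ShuffleDep(int shuffleId) { this.shuffleId = shuffleId; }
    }

    // Weak reference that remembers its shuffle id, since get() returns null once the referent is collected.
    static final class ShuffleRef extends WeakReference<ShuffleDep> {
        final int shuffleId;
        ShuffleRef(ShuffleDep dep, ReferenceQueue<ShuffleDep> queue) {
            super(dep, queue);
            this.shuffleId = dep.shuffleId;
        }
    }

    private final ReferenceQueue<ShuffleDep> queue = new ReferenceQueue<>();
    // The weak references themselves must stay strongly reachable until they are processed.
    private final Set<ShuffleRef> pending = ConcurrentHashMap.newKeySet();
    // Hypothetical simulation of shuffle output on disk, keyed by shuffle id.
    final Set<Integer> shuffleFilesOnDisk = ConcurrentHashMap.newKeySet();

    ShuffleDep register(int shuffleId) {
        ShuffleDep dep = new ShuffleDep(shuffleId);
        shuffleFilesOnDisk.add(shuffleId);
        pending.add(new ShuffleRef(dep, queue));
        return dep;
    }

    // Non-blocking: removes output only for shuffles whose owners the GC has already collected.
    int cleanUp() {
        int cleaned = 0;
        Reference<? extends ShuffleDep> ref;
        while ((ref = queue.poll()) != null) {
            ShuffleRef shuffleRef = (ShuffleRef) ref;
            pending.remove(shuffleRef);
            shuffleFilesOnDisk.remove(shuffleRef.shuffleId);
            cleaned++;
        }
        return cleaned;
    }

    public static void main(String[] args) throws InterruptedException {
        ShuffleCleanerSketch cleaner = new ShuffleCleanerSketch();
        ShuffleDep kept = cleaner.register(0);
        cleaner.register(1); // return value dropped: nothing references shuffle 1 anymore

        // System.gc() is only a hint, so retry a few times; until a GC runs, shuffle 1 stays "on disk".
        for (int i = 0; i < 50 && cleaner.shuffleFilesOnDisk.contains(1); i++) {
            System.gc();
            Thread.sleep(20);
            cleaner.cleanUp();
        }
        System.out.println("after GC: " + cleaner.shuffleFilesOnDisk);
        // Using `kept` here keeps shuffle 0 strongly reachable through the loop above.
        System.out.println("still referenced: shuffle " + kept.shuffleId);
    }
}
```

The sketch shows that cleanup is tied to reachability rather than to stage completion. A reference still held on the driver (for example, an RDD kept in a shell variable) keeps its shuffle output around. Even after the reference is dropped, nothing is removed until a GC actually runs.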
The data is not cleaned up right after the stage completes, because it might be used again by a later stage (e.g., if the stage is retried).

On Tue, May 12, 2015 at 6:50 PM, Ashwin Shankar <ashwinshanka...@gmail.com> wrote:
> Hi,
> In spark on yarn and when running spark_shuffle as auxiliary service on
> node manager, does map spills of a stage gets cleaned up once the next
> stage completes OR
> is it preserved till the app completes(ie waits for all the stages to
> complete) ?
>
> --
> Thanks,
> Ashwin