
Re: Shuffle files lifecycle

2015-06-29 Thread Thomas Gerber

Thanks Silvio.

On Mon, Jun 29, 2015 at 7:41 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
> Regarding 1 and 2, yes shuffle output is stored on the worker local
> disks and will be reused across jobs as long as they’re available. You can
> identify when they’re used by seeing skipp

Re: Shuffle files lifecycle

2015-06-29 Thread Silvio Fiorito

Regarding 1 and 2, yes shuffle output is stored on the worker local disks and will be reused across jobs as long as they’re available. You can identify when they’re used by seeing skipped stages in the job UI. They are periodically cleaned up based on available space of the configured spark.loca

Re: Shuffle files lifecycle

2015-06-29 Thread Thomas Gerber

Ah, for #3, maybe this is what *rdd.checkpoint* does!
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD

Thomas

On Mon, Jun 29, 2015 at 7:12 PM, Thomas Gerber wrote:
> Hello,
>
> It is my understanding that shuffle are written on disk and that they act
> as chec
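
For readers who want to try the rdd.checkpoint idea mentioned above, here is a minimal sketch, not code from the thread. It assumes a local Spark installation; the application name, the checkpoint directory /tmp/spark-checkpoints, and the toy data are all hypothetical. It only uses SparkContext.setCheckpointDir, RDD.checkpoint and RDD.isCheckpointed from the Spark core API.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name and local master, for illustration only.
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical path; on a cluster this would normally be a shared,
    // fault-tolerant filesystem location rather than a local directory.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    // reduceByKey introduces a shuffle, so this RDD sits behind shuffle output.
    val counts = sc.parallelize(1 to 1000)
      .map(i => (i % 10, 1))
      .reduceByKey(_ + _)

    // Mark the RDD for checkpointing; this has to happen before the
    // first action that runs on it.
    counts.checkpoint()

    // The first action computes the RDD and saves it to the checkpoint directory.
    println(counts.count())

    // After that job, the RDD reports itself as checkpointed, and later
    // actions on it no longer need the original lineage.
    println(counts.isCheckpointed)
    println(counts.collect().sortBy(_._1).mkString(", "))

    sc.stop()
  }
}
```

Without the checkpoint() call, running the two actions back to back is one way to look for the skipped stages described in the reply above, since the second job may be able to reuse the shuffle output of the first.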
