To: Patrick Wendell <pwend...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: Long-running job cleanup
Hi Patrick, to follow up on the discussion below, I am including a short code
snippet that produces this behavior.
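(The snippet itself did not survive in the archive; the sketch below is a
hypothetical reconstruction of a job of the shape described, a loop of shuffle
stages whose intermediate RDDs are cached and retained, so shuffle files and
per-stage broadcast metadata pile up. Names and sizes are made up.)

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD implicits for Spark < 1.3
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ArrayBuffer

val sc = new SparkContext(new SparkConf().setAppName("metadata-growth-sketch"))
val base = sc.parallelize(1 to 10000000).map(x => (x % 1000, 1L)).cache()

// Retaining every intermediate RDD pins its shuffle output, so the driver
// can never garbage-collect the associated shuffle metadata.
val stages = ArrayBuffer.empty[RDD[(Int, Long)]]
var current = base
for (i <- 1 to 10000) {
  current = current.reduceByKey(_ + _).cache() // each iteration adds a shuffle stage
  stages += current
  current.count()                              // force materialization
}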
Cc: user@spark.apache.org
Subject: Re: Long-running job cleanup
Hi Patrick - is that cleanup present in 1.1?
The overhead I am talking about concerns what I believe is shuffle-related
metadata. If I watch the execution log I see small broadcast variables
created for every stage of execution, a few KB at a time, and a certain
number of MB remaining of...
What do you mean when you say "the overhead of Spark shuffles starts to
accumulate"? Could you elaborate more?
In newer versions of Spark, shuffle data is cleaned up automatically
when an RDD goes out of scope. It is safe to remove shuffle data at
this point because the RDD can no longer be referenced.
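To make "goes out of scope" concrete, here is a minimal sketch (not from the
original thread; it assumes Spark 1.x, where the driver-side ContextCleaner
removes shuffle files once the RDD that produced them is no longer reachable
on the driver):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD implicits for Spark < 1.3

val sc = new SparkContext(new SparkConf().setAppName("shuffle-cleanup-sketch"))
var grouped = sc.parallelize(1 to 1000000).map(x => (x % 100, 1L))
  .reduceByKey(_ + _).cache()
grouped.count()                    // materialize: shuffle files now exist

grouped.unpersist(blocking = true) // release the cached blocks explicitly
grouped = null                     // drop the last driver-side reference
System.gc()                        // a driver GC lets the ContextCleaner
                                   // remove the now-unreferenced shuffle data

Note that unpersist frees the cached blocks immediately, while the shuffle
files themselves go away only after the driver garbage-collects the dropped
reference.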
Hello all - can anyone please offer any advice on this issue?
-Ilya Ganelin
On Mon, Dec 22, 2014 at 5:36 PM, Ganelin, Ilya wrote:
> Hi all, I have a long-running job iterating over a huge dataset. Parts of
> this operation are cached. Since the job runs for so long, eventually the
> overhead of Spark shuffles starts to accumulate...
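One pattern that keeps this overhead bounded in an iterative job is to
release each intermediate RDD as soon as its successor is materialized,
sketched below under the same Spark 1.x assumptions as above (the input
path is hypothetical):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD implicits for Spark < 1.3
import org.apache.spark.rdd.RDD

val sc = new SparkContext(new SparkConf().setAppName("bounded-metadata-sketch"))
var current: RDD[(Int, Long)] =
  sc.textFile("hdfs:///data/huge")               // hypothetical input
    .map(line => (line.hashCode % 1000, 1L))
    .cache()

for (i <- 1 to 1000) {
  val next = current.reduceByKey(_ + _).cache()
  next.count()                        // materialize before releasing the parent
  current.unpersist(blocking = false) // free the old cached blocks
  current = next                      // the old RDD is now unreferenced, so its
                                      // shuffle metadata is eligible for cleanup
}

If the growing lineage itself becomes a problem, periodically checkpointing
current (after sc.setCheckpointDir) truncates it as well.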