Unfortunately no. I just removed the persist statements to get the job to run, 
but now it sometimes fails with:

Job aborted due to stage failure: Task 162 in stage 2.1 failed 4 times, most 
recent failure: Lost task 162.3 in stage 2.1 (TID 1105, xxx.compute.internal): 
java.io.FileNotFoundException: 
/tmp/spark-local-20150210030009-b4f1/3f/shuffle_4_655_49 (No space left on 
device)
Even though there’s plenty of disk space left.
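One thing I still need to check: the shuffle files land under spark.local.dir, 
which defaults to /tmp, so if /tmp sits on a small partition that device can 
fill up even while the data disks have plenty of room. A minimal sketch of 
pointing it elsewhere (the path is just an example):

    import org.apache.spark.SparkConf

    // Redirect shuffle/spill files to a larger volume instead of /tmp.
    // Setting SPARK_LOCAL_DIRS in spark-env.sh has the same effect.
    val conf = new SparkConf()
      .set("spark.local.dir", "/mnt/spark-local")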


On 10.02.2015, at 00:09, Muttineni, Vinay <vmuttin...@ebay.com> wrote:

> Hi Marius,
> Did you find a solution to this problem? I get the same error.
> Thanks,
> Vinay
> 
> -----Original Message-----
> From: Marius Soutier [mailto:mps....@gmail.com] 
> Sent: Monday, February 09, 2015 2:19 AM
> To: user
> Subject: Executor Lost with StorageLevel.MEMORY_AND_DISK_SER
> 
> Hi there,
> 
> I'm trying to improve performance on a job that has GC troubles and takes 
> longer to compute simply because it has to recompute failed tasks. After 
> deferring object creation as much as possible, I'm now trying to improve 
> memory usage with StorageLevel.MEMORY_AND_DISK_SER and a custom 
> KryoRegistrator that registers all used classes. This works fine both in unit 
> tests and on a local cluster (i.e. master and worker on my dev machine). On 
> the production cluster this fails without any error message except:
> 
> Job aborted due to stage failure: Task 10 in stage 2.0 failed 4 times, most 
> recent failure: Lost task 10.3 in stage 2.0 (TID 20, xxx.compute.internal): 
> ExecutorLostFailure (executor lost) Driver stacktrace:
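> 
> For context, the setup looks roughly like this (MyRecord and the input path 
> are placeholders for the real job):
> 
>     import com.esotericsoftware.kryo.Kryo
>     import org.apache.spark.serializer.KryoRegistrator
>     import org.apache.spark.storage.StorageLevel
>     import org.apache.spark.{SparkConf, SparkContext}
> 
>     // Stand-in for the classes actually stored in the RDDs.
>     case class MyRecord(id: Long, value: String)
> 
>     // Registers every class that ends up inside a serialized RDD.
>     class MyRegistrator extends KryoRegistrator {
>       override def registerClasses(kryo: Kryo) {
>         kryo.register(classOf[MyRecord])
>       }
>     }
> 
>     val conf = new SparkConf()
>       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>       .set("spark.kryo.registrator", "MyRegistrator")
>     val sc = new SparkContext(conf)
> 
>     // Serialized storage keeps fewer live objects on the heap,
>     // trading some CPU for lower GC pressure.
>     val data = sc.textFile("hdfs:///path/to/input")
>       .persist(StorageLevel.MEMORY_AND_DISK_SER)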
> 
> Is there any way to understand what's going on? The logs don't show anything. 
> I'm using Spark 1.1.1.
> 
> 
> Thanks
> - Marius
