Unfortunately no. I just removed the persist statements to get the job to run, but now it sometimes fails with:

Job aborted due to stage failure: Task 162 in stage 2.1 failed 4 times, most recent failure: Lost task 162.3 in stage 2.1 (TID 1105, xxx.compute.internal): java.io.FileNotFoundException: /tmp/spark-local-20150210030009-b4f1/3f/shuffle_4_655_49 (No space left on device)

This happens even though there's plenty of disk space left.
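Two things worth checking when "No space left on device" shows up while df reports free space: whether /tmp (the default spark.local.dir) lives on a small partition, and whether the filesystem has run out of inodes rather than bytes (df -i shows that; a shuffle writes a lot of small files). If it's the former, something like the sketch below should move the scratch space. The /mnt/spark* paths are hypothetical and would have to exist on every worker:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: /mnt/spark1 and /mnt/spark2 are made-up mounts with plenty
// of space and inodes; listing several disks spreads shuffle I/O across them.
// On a standalone cluster, SPARK_LOCAL_DIRS in spark-env.sh takes precedence
// over this setting, so setting it there on the workers may be more reliable.
val conf = new SparkConf()
  .setAppName("local-dir-example")
  .set("spark.local.dir", "/mnt/spark1,/mnt/spark2")
val sc = new SparkContext(conf)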
On 10.02.2015, at 00:09, Muttineni, Vinay <vmuttin...@ebay.com> wrote:

> Hi Marius,
> Did you find a solution to this problem? I get the same error.
> Thanks,
> Vinay
>
> -----Original Message-----
> From: Marius Soutier [mailto:mps....@gmail.com]
> Sent: Monday, February 09, 2015 2:19 AM
> To: user
> Subject: Executor Lost with StorageLevel.MEMORY_AND_DISK_SER
>
> Hi there,
>
> I'm trying to improve performance on a job that has GC troubles and takes
> longer to compute simply because it has to recompute failed tasks. After
> deferring object creation as much as possible, I'm now trying to improve
> memory usage with StorageLevel.MEMORY_AND_DISK_SER and a custom
> KryoRegistrator that registers all used classes. This works fine both in
> unit tests and on a local cluster (i.e. master and worker on my dev
> machine). On the production cluster this fails without any error message
> except:
>
> Job aborted due to stage failure: Task 10 in stage 2.0 failed 4 times, most
> recent failure: Lost task 10.3 in stage 2.0 (TID 20, xxx.compute.internal):
> ExecutorLostFailure (executor lost) Driver stacktrace:
>
> Is there any way to understand what's going on? The logs don't show
> anything. I'm using Spark 1.1.1.
>
> Thanks
> - Marius
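For anyone finding this thread in the archives, a minimal sketch of the kind of setup described above; the Record class, the registrator, and all names are made up:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
import org.apache.spark.storage.StorageLevel
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record type standing in for whatever the job shuffles.
case class Record(id: Long, payload: Array[Byte])

// Register every class Kryo will serialize so it writes compact ids
// instead of full class names. Setting spark.kryo.registrationRequired=true
// makes the job fail fast on anything that was missed.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Record])
    kryo.register(classOf[Array[Record]])
  }
}

object MemoryAndDiskSerExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("memory-and-disk-ser-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyKryoRegistrator") // fully-qualified name if it lives in a package
    val sc = new SparkContext(conf)

    // Serialized cache: partitions that don't fit in memory spill to disk
    // under spark.local.dir instead of being recomputed after a failure.
    val records = sc
      .parallelize(1L to 1000000L)
      .map(i => Record(i, new Array[Byte](16)))
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    println(records.count())
    sc.stop()
  }
}

Caching serialized trades CPU for memory: objects stay as compact bytes until accessed, which cuts GC pressure at the cost of deserializing on each read.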