Re: Running out of space (when there's no shortage)

2015-02-27 Thread Kelvin Chu
Hi Joe, you might increase spark.yarn.executor.memoryOverhead to see if it fixes the problem. Please take a look at this report: https://issues.apache.org/jira/browse/SPARK-4996 Hope this helps. On Tue, Feb 24, 2015 at 2:05 PM, Yiannis Gkoufas wrote: > No problem, Joe. There you go > https://is

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Yiannis Gkoufas
No problem, Joe. There you go https://issues.apache.org/jira/browse/SPARK-5081 And also there is this one https://issues.apache.org/jira/browse/SPARK-5715 which is marked as resolved On 24 February 2015 at 21:51, Joe Wass wrote: > Thanks everyone. > > Yiannis, do you know if there's a bug report

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Joe Wass
Thanks everyone. Yiannis, do you know if there's a bug report for this regression? For some other (possibly connected) reason I upgraded from 1.1.1 to 1.2.1, but I can't remember what the bug was. Joe On 24 February 2015 at 19:26, Yiannis Gkoufas wrote: > Hi there, > > I assume you are usin

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Yiannis Gkoufas
Hi there, I assume you are using Spark 1.2.1, right? I faced the exact same issue; switching to 1.1.1 with the same configuration solved it. On 24 Feb 2015 19:22, "Ted Yu" wrote: > Here is a tool which may give you some clue: > http://file-leak-detector.kohsuke.org/ > > Cheers > > On Tu

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Ted Yu
Here is a tool which may give you some clue: http://file-leak-detector.kohsuke.org/ Cheers On Tue, Feb 24, 2015 at 11:04 AM, Vladimir Rodionov < vrodio...@splicemachine.com> wrote: > Usually it happens in Linux when application deletes file w/o double > checking that there are no open FDs (resou

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Vladimir Rodionov
Usually it happens in Linux when an application deletes a file without double-checking that there are no open FDs (a resource leak). In this case, Linux holds all the allocated space and does not release it until the application exits (crashes, in your case). You check the file system and everything is normal, you have en
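
The behaviour described in this message can be reproduced in isolation. The following is a minimal Python sketch, not taken from the thread and not Spark code: the function name `deleted_but_open_demo` and its return keys are made up for illustration, and it assumes a POSIX file system such as the ones used on Linux. It deletes a temporary file while still holding it open and reports what the operating system keeps tracking.

```python
import os
import tempfile


def deleted_but_open_demo(num_bytes=1024 * 1024):
    """Unlink a file while it is still open and report what the OS still holds.

    Hypothetical helper for illustration only; assumes POSIX unlink semantics.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"x" * num_bytes)
        handle.flush()

        # Remove the directory entry. The name disappears, but the open
        # descriptor keeps the underlying inode and its data alive.
        os.unlink(path)

        info = os.fstat(handle.fileno())
        return {
            "path_exists": os.path.exists(path),  # the name is gone
            "link_count": info.st_nlink,          # no directory entries remain
            "bytes_still_held": info.st_size,     # data is still allocated
        }
    # Leaving the "with" block closes the descriptor; only then can the
    # file system reclaim the space.


if __name__ == "__main__":
    print(deleted_but_open_demo())
```

While the descriptor stays open, a directory listing no longer shows the file, yet its space is still counted as used. A long-running process that leaks such descriptors can therefore fill a disk that looks nearly empty when inspected by file name.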

Running out of space (when there's no shortage)

2015-02-24 Thread Joe Wass
I'm running a cluster of 3 Amazon EC2 machines (small number because it's expensive when experiments keep crashing after a day!). Today's crash looks like this (stacktrace at end of message). org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 On my thr