Re: shuffle memory requirements

2014-09-30 Thread Andrew Ash
Hi Maddenpj, Right now the best estimate I've heard for the open file limit is that you'll need the square of the largest partition count in your dataset. I filed a ticket to log the ulimit value when it's too low at https://issues.apache.org/jira/browse/SPARK-3750. On Mon, Sep 29, 2014 at 6:20 P
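
As a rough sanity check of that rule of thumb, the sketch below squares a placeholder partition count and compares it to the per-process open-file limit reported by ulimit -n on a worker. PARTITIONS=300 is an assumed example value, not a number from this thread, and the estimate is per node and per process only.

  # estimate shuffle files per node as partitions squared, compare to the fd limit
  PARTITIONS=300
  ESTIMATE=$((PARTITIONS * PARTITIONS))
  CURRENT=$(ulimit -n)
  echo "estimated shuffle files: $ESTIMATE, current open-file limit: $CURRENT"
  if [ "$ESTIMATE" -gt "$CURRENT" ]; then
    echo "open-file limit likely too low: raise ulimit -n or reduce partition counts"
  fi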

Re: shuffle memory requirements

2014-09-29 Thread maddenpj
Hey Ameet, Thanks for the info. I'm running into the same issue myself; my last attempt crashed with a ulimit of 16834. I'm going to up it and try again, but I'd still like to know the best practice for computing this. Can you talk about the worker nodes, what are their specs? At least 45 gi
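
For a quick retry like the one described above, one way to raise the soft limit only for the shell that launches the workers is sketched below; 81920 mirrors the value reported later in this thread, and the hard limit must already be at least that high (otherwise root access is needed).

  # raise the soft open-file limit for this shell session before restarting workers
  ulimit -n 81920
  # verify the new value
  ulimit -n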

Re: shuffle memory requirements

2014-04-11 Thread Ameet Kini
A typo - I meant section 2.1.2.5 "ulimit and nproc" of https://hbase.apache.org/book.html. Ameet On Fri, Apr 11, 2014 at 10:32 AM, Ameet Kini wrote: > > Turns out that my ulimit settings were too low. I bumped them up and the job > successfully completes. Here's what I have now: > > $ ulimit -u
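
Following that HBase guidance, one way to raise both limits persistently on Linux is to append entries to /etc/security/limits.conf, as sketched below. This assumes PAM applies limits.conf at login, and sparkuser is a placeholder for the account that runs the Spark workers.

  # persist higher nofile (open files) and nproc (max user processes) limits
  echo "sparkuser  -  nofile  81920" | sudo tee -a /etc/security/limits.conf
  echo "sparkuser  -  nproc   81920" | sudo tee -a /etc/security/limits.conf
  # log out and back in (or restart the worker service) for the new limits to apply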

Re: shuffle memory requirements

2014-04-11 Thread Ameet Kini
Turns out that my ulimit settings were too low. I bumped them up and the job successfully completes. Here's what I have now:
$ ulimit -u   // for max user processes
81920
$ ulimit -n   // for open files
81920
I was thrown off by the OutOfMemoryError into thinking it was Spark running out of memory in t