Hi Maddenpj,
Right now the best estimate I've heard for the open file limit is that
you'll need the square of the largest partition count in your dataset.
I filed a ticket to log the ulimit value when it's too low at
https://issues.apache.org/jira/browse/SPARK-3750
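For reference, a minimal sketch of that rule of thumb in Scala (e.g. to paste into spark-shell); the helper name and the example partition counts are made up for illustration:

  // Sketch only: the "square of the largest partition count" rule of thumb
  // mentioned above. Pass in the partition counts of the RDDs in the job,
  // e.g. rdd.partitions.length for each one.
  def estimateOpenFileLimit(partitionCounts: Seq[Int]): Long = {
    val widest = partitionCounts.max.toLong
    widest * widest
  }

  // A job whose widest RDD has 2000 partitions:
  // estimateOpenFileLimit(Seq(200, 2000))  // => 4000000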
On Mon, Sep 29, 2014 at 6:20 P
Hey Ameet,
Thanks for the info. I'm running into the same issue myself; my last
attempt crashed and my ulimit was 16834. I'm going to bump it up and try again,
but yeah, I'd like to know the best practice for computing this. Can you
talk about the worker nodes, what are their specs? At least 45 gi
A typo - I meant section 2.1.2.5 "ulimit and nproc" of
https://hbase.apache.org/book.html
Ameet
On Fri, Apr 11, 2014 at 10:32 AM, Ameet Kini wrote:
>
> Turns out that my ulimit settings were too low. I bumped them up and the job
> now completes successfully. Here's what I have now:
>
> $ ulimit -u
Turns out that my ulimit settings were too low. I bumped them up and the job
now completes successfully. Here's what I have now:
$ ulimit -u   # max user processes
81920
$ ulimit -n   # open files
81920
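As an aside, the same limit can also be read from inside the driver or executor JVM, which is roughly what the SPARK-3750 ticket mentioned earlier in the thread proposes to log. A hedged sketch, assuming a HotSpot/OpenJDK JVM on Unix (the object name and messages are mine):

  import java.lang.management.ManagementFactory
  import com.sun.management.UnixOperatingSystemMXBean

  // Sketch only: report the process's open and maximum file descriptor
  // counts via the com.sun.management MXBean, where available.
  object FdLimitCheck {
    def main(args: Array[String]): Unit = {
      ManagementFactory.getOperatingSystemMXBean match {
        case os: UnixOperatingSystemMXBean =>
          println(s"open fds: ${os.getOpenFileDescriptorCount} " +
            s"(max: ${os.getMaxFileDescriptorCount})")
        case _ =>
          println("File descriptor counts not available on this JVM/platform")
      }
    }
  }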
I was thrown off by the OutOfMemoryError into thinking it was Spark running
out of memory in t