Hi Maddenpj,

Right now the best estimate I've heard for the open file limit is that you'll need the square of the largest partition count in your dataset.
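If it helps, here's a rough way to sanity-check that estimate from a spark-shell (just a sketch; "rdd" below is a placeholder for whichever RDD in your job has the most partitions, and you'd compare the result against ulimit -n on each worker machine):

    // Rough estimate of shuffle file handles: square of the largest partition count.
    // `rdd` is a placeholder for the RDD in your job with the most partitions.
    val partitions = rdd.partitions.length
    val estimatedOpenFiles = partitions.toLong * partitions
    println(s"Largest partition count: $partitions")
    println(s"Estimated open-file requirement: $estimatedOpenFiles")
    // Compare this against `ulimit -n` on each worker node.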
I filed a ticket to log the ulimit value when it's too low: https://issues.apache.org/jira/browse/SPARK-3750

On Mon, Sep 29, 2014 at 6:20 PM, maddenpj <madde...@gmail.com> wrote:

> Hey Ameet,
>
> Thanks for the info, I'm running into the same issue myself. My last
> attempt crashed and my ulimit was 16834. I'm going to up it and try again,
> but yeah, I would like to know the best practice for computing this. Can
> you talk about the worker nodes? What are their specs? At least 45 gigs of
> memory and 6 cores?
>
> Also, I left my worker at the default memory size (512m I think) and gave
> all of the memory to the executor. It was my understanding that the worker
> just spawns the executor but all the work is done in the executor. What was
> your reasoning for using 24G on the worker?
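For what it's worth, a minimal sketch of how those two memory knobs are usually split in standalone mode; the values are illustrative, not your actual 24G/512m setup:

    import org.apache.spark.{SparkConf, SparkContext}

    // Per-application executor heap is set on the SparkConf (or spark-defaults.conf).
    // Illustrative value only.
    val conf = new SparkConf()
      .setAppName("shuffle-memory-sketch")
      .set("spark.executor.memory", "20g") // heap of each executor JVM
    val sc = new SparkContext(conf)

    // Note: SPARK_WORKER_MEMORY (set in conf/spark-env.sh on each worker node) caps
    // the total memory a standalone worker can hand out to executors; the worker
    // daemon's own heap is controlled separately by SPARK_DAEMON_MEMORY.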