Hi Arun,

The limit for the YARN user on the cluster nodes should be all that matters. What version of Spark are you using? If you can turn on sort-based shuffle, it should solve this problem.
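For example (assuming Spark 1.1 or later, where the sort-based shuffle manager is available as an option; in 1.2+ it's already the default), you could set it in spark-defaults.conf or pass it when you submit, something like:

    spark-submit --conf spark.shuffle.manager=sort ... <your app>

(just a sketch, adjust to however you normally launch the job). Sort-based shuffle writes far fewer files per map task than the hash-based shuffle, which is usually what drives the "Too many open files" errors.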
-Sandy

On Tue, Feb 10, 2015 at 1:16 PM, Arun Luthra <arun.lut...@gmail.com> wrote:

> Hi,
>
> I'm running Spark on YARN from an edge node, and the tasks run on the Data
> Nodes. My job fails with the "Too many open files" error once it gets to
> groupByKey(). Alternatively, I can make it fail immediately if I
> repartition the data when I create the RDD.
>
> Where do I need to make sure that ulimit -n is high enough?
>
> On the edge node it is small, 1024, but on the data nodes, the "yarn" user
> has a high limit, 32k. But is the yarn user the relevant user? And is the
> 1024 limit for myself on the edge node a problem, or is that limit not
> relevant?
>
> Arun