I don't think this is a bug in consolidated shuffle; it's a Linux configuration
problem. The default open-file limit on Linux is 1024, and once your job opens
more files than that limit you will get the error you mentioned below. You can
raise the limit by running ulimit -n xxx, or make it permanent by editing
/etc/security/limits.conf (e.g. on Ubuntu).
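
As a rough sketch (the value 10240 is just the number Larry uses below; the
"spark" user name is hypothetical, substitute whatever user runs your executors):

  # check the current limits
  ulimit -a
  # raise the open-file limit for the current shell session
  ulimit -n 10240

  # or make it permanent in /etc/security/limits.conf:
  #   spark  soft  nofile  10240
  #   spark  hard  nofile  10240

Note that a limits.conf change only takes effect for new login sessions.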

Shuffle consolidation reduces the total number of shuffle files, but the number
of concurrently open files is the same as with the basic hash-based shuffle.
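
To make that concrete, here is a back-of-the-envelope calculation based on my
understanding of the 1.x hash-based shuffle (the numbers are made up for
illustration): with M = 1000 map tasks, R = 200 reduce partitions, and C = 8
concurrent task slots per executor,

  total shuffle files, no consolidation:   M x R = 1000 x 200 = 200,000
  total shuffle files, with consolidation: C x R =    8 x 200 =   1,600
  files open at once (either way):       ~ C x R =    8 x 200 =   1,600 per executor

So consolidation greatly cuts the total file count, but each running map task
still keeps R partition files open at the same time, which is why the ulimit
setting still matters.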

Thanks
Jerry

From: Jianshi Huang [mailto:jianshi.hu...@gmail.com]
Sent: Thursday, July 31, 2014 10:34 AM
To: user@spark.apache.org
Cc: xia...@sjtu.edu.cn
Subject: Re: spark.shuffle.consolidateFiles seems not working

OK... but my question is why spark.shuffle.consolidateFiles isn't working (or
is it?). Is this a bug?

On Wed, Jul 30, 2014 at 4:29 PM, Larry Xiao <xia...@sjtu.edu.cn> wrote:
Hi Jianshi,

I've run into a similar situation before.
My solution was 'ulimit'; you can use

-a to see your current settings
-n to set the open-file limit
(and other limits as well)

And I set -n to 10240.

I understand spark.shuffle.consolidateFiles helps by reusing open files
(though I don't know to what extent it helps).

Hope it helps.

Larry


On 7/30/14, 4:01 PM, Jianshi Huang wrote:
I'm using Spark 1.0.1 on Yarn-Client mode.

sortByKey always fails with a FileNotFoundException whose message says "too
many open files".

I already set spark.shuffle.consolidateFiles to true:

  conf.set("spark.shuffle.consolidateFiles", "true")

But it doesn't seem to be working. What are the other possible causes? How can I fix it?

Jianshi

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/




--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
