Make sure to set it before you start your SparkContext -- it cannot be changed afterwards. Be warned that there are some known issues with shuffle file consolidation, which should be fixed in Spark 1.1.
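For example, the flag has to go on the SparkConf before the SparkContext is constructed; a minimal sketch (the app name is just a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Set consolidation on the SparkConf *before* constructing the context;
    // the app name here is illustrative.
    val conf = new SparkConf()
      .setAppName("ShuffleConsolidationExample")
      .set("spark.shuffle.consolidateFiles", "true")

    val sc = new SparkContext(conf)

    // Calling conf.set(...) from this point on has no effect on sc --
    // the context has already captured its configuration.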
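As for the "too many open files" error itself, Jerry's and Larry's ulimit suggestions from the thread below would look roughly like this (the user name and limit values are illustrative -- pick ones that fit your cluster):

    # Inspect and raise the limit for the current shell session only:
    ulimit -a          # show all current limits
    ulimit -n 10240    # raise the open-files limit (soft, up to the hard limit)

    # To make it permanent, add entries to /etc/security/limits.conf on each
    # node for the user running the executors ("sparkuser" is a placeholder):
    sparkuser  soft  nofile  10240
    sparkuser  hard  nofile  10240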
On Thu, Jul 31, 2014 at 12:40 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> I got the number from the Hadoop admin. It's 1M actually. I suspect the
> consolidation didn't work as expected. Could there be any other reason?
>
> On Thu, Jul 31, 2014 at 11:01 AM, Shao, Saisai <saisai.s...@intel.com> wrote:
>
>> I don't think it's a bug in consolidated shuffle; it's a Linux
>> configuration problem. The default open-file limit in Linux is 1024, and
>> when your open-file count exceeds 1024 you will get the error you
>> mentioned below. So you can raise the open-file limit with ulimit -n xxx,
>> or write it into /etc/security/limits.conf in Ubuntu.
>>
>> Shuffle consolidation can reduce the total number of shuffle files, but
>> the number of concurrently open files is the same as with basic
>> hash-based shuffle.
>>
>> Thanks
>> Jerry
>>
>> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
>> *Sent:* Thursday, July 31, 2014 10:34 AM
>> *To:* user@spark.apache.org
>> *Cc:* xia...@sjtu.edu.cn
>> *Subject:* Re: spark.shuffle.consolidateFiles seems not working
>>
>> Ok... but my question is why spark.shuffle.consolidateFiles isn't
>> working (or is it?). Is this a bug?
>>
>> On Wed, Jul 30, 2014 at 4:29 PM, Larry Xiao <xia...@sjtu.edu.cn> wrote:
>>
>> Hi Jianshi,
>>
>> I've run into a similar situation before, and my solution was 'ulimit'.
>> You can use:
>>
>> -a to see your current settings
>> -n to set the open-files limit
>> (and other limits as well)
>>
>> I set -n to 10240.
>>
>> I see spark.shuffle.consolidateFiles helps by reusing open files
>> (though I don't know to what extent it helps).
>>
>> Hope it helps.
>>
>> Larry
>>
>> On 7/30/14, 4:01 PM, Jianshi Huang wrote:
>>
>> I'm using Spark 1.0.1 in Yarn-Client mode.
>>
>> sortByKey always fails with a FileNotFoundException whose message says
>> "too many open files".
>>
>> I already set spark.shuffle.consolidateFiles to true:
>>
>> conf.set("spark.shuffle.consolidateFiles", "true")
>>
>> But it doesn't seem to be working. What other reasons could there be,
>> and how can I fix it?
>>
>> Jianshi
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/