Make sure to set it before you start your SparkContext -- it cannot be changed afterwards. Be warned that there are some known issues with shuffle file consolidation, which should be fixed in Spark 1.1.
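For example, the flag has to go on the SparkConf before the SparkContext is constructed; a minimal sketch (the app name is just a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Set consolidation on the SparkConf *before* constructing the context;
    // the app name here is illustrative.
    val conf = new SparkConf()
      .setAppName("ShuffleConsolidationExample")
      .set("spark.shuffle.consolidateFiles", "true")

    val sc = new SparkContext(conf)

    // Calling conf.set(...) from this point on has no effect on sc --
    // the context has already captured its configuration.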
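As for the "too many open files" error itself, Jerry's and Larry's ulimit suggestions from the thread below would look roughly like this (the user name and limit values are illustrative -- pick ones that fit your cluster):

    # Inspect and raise the limit for the current shell session only:
    ulimit -a          # show all current limits
    ulimit -n 10240    # raise the open-files limit (soft, up to the hard limit)

    # To make it permanent, add entries to /etc/security/limits.conf on each
    # node for the user running the executors ("sparkuser" is a placeholder):
    sparkuser  soft  nofile  10240
    sparkuser  hard  nofile  10240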
On Thu, Jul 31, 2014 at 12:40 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> I got the number from the Hadoop admin. It's 1M actually. I suspect the
> consolidation didn't work as expected. Could there be any other reason?
>
> On Thu, Jul 31, 2014 at 11:01 AM, Shao, Saisai <saisai.s...@intel.com> wrote:
>
>> I don't think it's a bug in consolidated shuffle; it's a Linux
>> configuration problem. The default open-file limit in Linux is 1024, and
>> when your open-file count exceeds 1024 you will get the error you
>> mentioned below. So you can raise the open-file limit with ulimit -n xxx,
>> or write it into /etc/security/limits.conf in Ubuntu.
>>
>> Shuffle consolidation can reduce the total number of shuffle files, but
>> the number of concurrently open files is the same as with basic
>> hash-based shuffle.
>>
>> Thanks
>> Jerry
>>
>> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
>> *Sent:* Thursday, July 31, 2014 10:34 AM
>> *To:* user@spark.apache.org
>> *Cc:* xia...@sjtu.edu.cn
>> *Subject:* Re: spark.shuffle.consolidateFiles seems not working
>>
>> Ok... but my question is why spark.shuffle.consolidateFiles isn't
>> working (or is it?). Is this a bug?
>>
>> On Wed, Jul 30, 2014 at 4:29 PM, Larry Xiao <xia...@sjtu.edu.cn> wrote:
>>
>> Hi Jianshi,
>>
>> I've run into a similar situation before, and my solution was 'ulimit'.
>> You can use:
>>
>> -a to see your current settings
>> -n to set the open-files limit
>> (and other limits as well)
>>
>> I set -n to 10240.
>>
>> I see spark.shuffle.consolidateFiles helps by reusing open files
>> (though I don't know to what extent it helps).
>>
>> Hope it helps.
>>
>> Larry
>>
>> On 7/30/14, 4:01 PM, Jianshi Huang wrote:
>>
>> I'm using Spark 1.0.1 in Yarn-Client mode.
>>
>> sortByKey always fails with a FileNotFoundException whose message says
>> "too many open files".
>>
>> I already set spark.shuffle.consolidateFiles to true:
>>
>> conf.set("spark.shuffle.consolidateFiles", "true")
>>
>> But it doesn't seem to be working. What other reasons could there be,
>> and how can I fix it?
>>
>> Jianshi
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/