I see. I'll try Spark 1.1.
On Fri, Aug 1, 2014 at 9:58 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> Make sure to set it before you start your SparkContext -- it cannot be
> changed afterwards. Be warned that there are some known issues with
> shuffle file consolidation, which should be fixed in 1.1.
>
> On Thu, Jul 31, 2014 at 12:40 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>
>> I got the number from the Hadoop admin. It's actually 1M. I suspect the
>> consolidation didn't work as expected? Any other reason?
>>
>> On Thu, Jul 31, 2014 at 11:01 AM, Shao, Saisai <saisai.s...@intel.com> wrote:
>>
>>> I don't think it's a bug in consolidated shuffle; it's a Linux
>>> configuration problem. The default open-file limit on Linux is 1024,
>>> and once a process opens more files than that you get the error you
>>> mentioned below. You can raise the limit with ulimit -n xxx, or set it
>>> permanently in /etc/security/limits.conf on Ubuntu.
>>>
>>> Shuffle consolidation can reduce the total number of shuffle files,
>>> but the number of concurrently open files is the same as with basic
>>> hash-based shuffle.
>>>
>>> Thanks
>>> Jerry
>>>
>>> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
>>> *Sent:* Thursday, July 31, 2014 10:34 AM
>>> *To:* user@spark.apache.org
>>> *Cc:* xia...@sjtu.edu.cn
>>> *Subject:* Re: spark.shuffle.consolidateFiles seems not working
>>>
>>> OK... but my question is why spark.shuffle.consolidateFiles isn't
>>> working (or is it)? Is this a bug?
>>>
>>> On Wed, Jul 30, 2014 at 4:29 PM, Larry Xiao <xia...@sjtu.edu.cn> wrote:
>>>
>>> Hi Jianshi,
>>>
>>> I've met a similar situation before, and my solution was ulimit. You can use:
>>>
>>>   -a  to see your current settings
>>>   -n  to set the open-files limit
>>>   (and other limits as well)
>>>
>>> I set -n to 10240.
>>>
>>> I see that spark.shuffle.consolidateFiles helps by reusing open files
>>> (so I don't know to what extent it helps).
>>>
>>> Hope it helps.
>>>
>>> Larry
>>>
>>> On 7/30/14, 4:01 PM, Jianshi Huang wrote:
>>>
>>> I'm using Spark 1.0.1 in yarn-client mode.
>>>
>>> sortByKey always fails with a FileNotFoundException whose message says
>>> "too many open files".
>>>
>>> I already set spark.shuffle.consolidateFiles to true:
>>>
>>>   conf.set("spark.shuffle.consolidateFiles", "true")
>>>
>>> But it doesn't seem to be working. What are the other possible
>>> reasons? How can I fix it?
>>>
>>> Jianshi

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
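
For reference, a minimal sketch of what the thread recommends, against the
Spark 1.x RDD API: set the flag on the SparkConf before the SparkContext is
created, as Aaron suggests, then run the shuffle-heavy job. The object name,
app name, and the synthetic key-value data below are placeholders, not part
of the original job.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._  // RDD-to-pair implicits needed for sortByKey in Spark 1.0/1.1

  object ConsolidatedShuffleSketch {
    def main(args: Array[String]): Unit = {
      // The flag must be set on the SparkConf *before* the SparkContext is
      // created; changing it on a running context has no effect.
      val conf = new SparkConf()
        .setAppName("consolidated-shuffle-sketch")   // placeholder app name
        .set("spark.shuffle.consolidateFiles", "true")

      val sc = new SparkContext(conf)

      // sortByKey is a wide transformation, so it triggers a shuffle.
      // Consolidation reduces the total number of shuffle files written,
      // but the number of concurrently open files can still hit a low
      // ulimit -n, so the OS limit may need raising as well.
      val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1000, i))
      pairs.sortByKey().count()

      sc.stop()
    }
  }

If the "too many open files" error persists with consolidation enabled,
raising ulimit -n on the executor hosts, as Jerry and Larry describe above,
is the other half of the fix.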