I see. I'll try Spark 1.1.
On Fri, Aug 1, 2014 at 9:58 AM, Aaron Davidson <[email protected]> wrote:

> Make sure to set it before you start your SparkContext -- it cannot be
> changed afterwards. Be warned that there are some known issues with
> shuffle file consolidation, which should be fixed in 1.1.
>
> On Thu, Jul 31, 2014 at 12:40 PM, Jianshi Huang <[email protected]> wrote:
>
>> I got the number from the Hadoop admin. It's actually 1M. So I suspect
>> the consolidation didn't work as expected. Could there be any other reason?
>>
>> On Thu, Jul 31, 2014 at 11:01 AM, Shao, Saisai <[email protected]> wrote:
>>
>>> I don't think it's a bug in consolidated shuffle; it's a Linux
>>> configuration problem. The default open-file limit on Linux is 1024,
>>> and when you open more files than that you will get the error you
>>> mentioned below. You can raise the limit with "ulimit -n xxx", or set
>>> it persistently in /etc/security/limits.conf on Ubuntu.
>>>
>>> Shuffle consolidation can reduce the total number of shuffle files,
>>> but the number of concurrently open files is the same as with basic
>>> hash-based shuffle.
>>>
>>> Thanks
>>> Jerry
>>>
>>> From: Jianshi Huang [mailto:[email protected]]
>>> Sent: Thursday, July 31, 2014 10:34 AM
>>> To: [email protected]
>>> Cc: [email protected]
>>> Subject: Re: spark.shuffle.consolidateFiles seems not working
>>>
>>> Ok... but my question is why spark.shuffle.consolidateFiles isn't
>>> working (or is it?). Is this a bug?
>>>
>>> On Wed, Jul 30, 2014 at 4:29 PM, Larry Xiao <[email protected]> wrote:
>>>
>>> Hi Jianshi,
>>>
>>> I've met a similar situation before, and my solution was ulimit. You can use:
>>>
>>>   -a to see your current settings
>>>   -n to set the open-files limit
>>>   (and other limits as well)
>>>
>>> I set -n to 10240.
>>>
>>> I see spark.shuffle.consolidateFiles helps by reusing open files
>>> (though I don't know to what extent it helps).
>>>
>>> Hope it helps.
>>>
>>> Larry
>>>
>>> On 7/30/14, 4:01 PM, Jianshi Huang wrote:
>>>
>>> I'm using Spark 1.0.1 in yarn-client mode.
>>>
>>> sortByKey always fails with a FileNotFoundException whose message says
>>> "too many open files".
>>>
>>> I already set spark.shuffle.consolidateFiles to true:
>>>
>>>   conf.set("spark.shuffle.consolidateFiles", "true")
>>>
>>> But it doesn't seem to be working. What are the other possible reasons,
>>> and how can I fix it?
>>>
>>> Jianshi
>>>
>>> --
>>> Jianshi Huang
>>>
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github & Blog: http://huangjs.github.com/
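Since Aaron's point about ordering is easy to miss: the flag has to be set on the SparkConf before the SparkContext is constructed. A minimal sketch in Scala (Spark 1.x API; the app name is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Set the flag on the conf first -- it cannot be changed after the
// SparkContext exists, per Aaron's note above.
val conf = new SparkConf()
  .setAppName("ConsolidateFilesExample")  // placeholder app name
  .set("spark.shuffle.consolidateFiles", "true")

val sc = new SparkContext(conf)
// ... then run the shuffle-heavy jobs, e.g. rdd.sortByKey() ...
```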
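A footnote on the ulimit advice in the thread above -- a minimal sketch of checking and raising the open-file limit; the user name in the limits.conf example is a placeholder, not from the thread:

```shell
# Check the current soft limit on open files for this shell session
ulimit -n

# Check the hard limit (the ceiling a non-root user may raise the soft limit to)
ulimit -Hn

# Raise the soft limit for this session, up to the hard limit
ulimit -n "$(ulimit -Hn)"

# For a persistent change, add entries to /etc/security/limits.conf, e.g.
# ("sparkuser" is a placeholder; 10240 is the value Larry mentions above):
#   sparkuser  soft  nofile  10240
#   sparkuser  hard  nofile  10240
```

Note that a new limit set with ulimit only applies to the current shell and its children, so it must be in effect in the shell that launches the Spark executors.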
