On ..., 2016 at 6:28 AM, Sea <261810...@qq.com> wrote:

> Hi, Corey:
> "The dataset is 100gb at most, the spills can go up to 10T-100T." Are your
> input files in lzo format, and do you use sc.textFile()? If memory is not
> enough, Spark will spill 3-4x of the input data to disk.
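A minimal sketch of the situation Sea is describing, assuming compressed text read with sc.textFile(); the path and partition count below are hypothetical:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch only: compressed text often arrives as very few partitions (a
// non-splittable file gives one partition per file), and the decompressed
// records can be several times the on-disk size.
def loadCompressedText(sc: SparkContext): RDD[String] = {
  val raw = sc.textFile("hdfs:///data/input/*.lzo")  // hypothetical path
  // Spread the decompressed records across many partitions before any wide
  // operation, so per-task state (and therefore spilling) stays bounded.
  raw.repartition(2000)                              // partition count is a guess
}
```

Repartitioning right after the read is a blunt instrument, but it keeps the amount of decompressed data each task holds during the shuffle roughly constant.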
> ------------------ Original Mail ------------------
> From: "Corey Nolet"
> Sent: Sunday, February 7, 2016, 8:56 PM
> To: "Igor Berman"
> Cc: "user"
> Subject: Re: Shuffle memory woes
>
As for the second part of your questions: we have a fairly complex join
process which requires a ton of stage orchestration from our driver. I've
written some code to be able to walk down our DAG tree and execute siblings
in the tree concurrently where possible (forcing cache to disk on children
th…
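The orchestration code itself isn't shown in the thread; the following is only a rough sketch of the idea described, using driver-side Futures to materialize independent sibling branches concurrently and DISK_ONLY persistence for the intermediates. All names are invented for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Spark's scheduler accepts concurrent job submissions from multiple driver
// threads, so materializing each sibling branch inside a Future lets their
// independent shuffles overlap.
def materializeSiblings(siblings: Seq[RDD[_]]): Unit = {
  val jobs = siblings.map { rdd =>
    Future {
      // Force each branch to disk so its output survives without holding on
      // to executor memory that the shuffles need.
      rdd.persist(StorageLevel.DISK_ONLY)
      rdd.count()  // any action works; count() just materializes the RDD
    }
  }
  Await.result(Future.sequence(jobs), Duration.Inf)
}
```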
Igor,

I don't think the question is "why can't it fit stuff in memory". I know
why it can't fit stuff in memory: because it's a large dataset that needs
to have a reduceByKey() run on it. My understanding is that when it doesn't
fit into memory, it needs to spill in order to consolidate intermediary…
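As an aside, one way to see how much each task is actually spilling, beyond the Spark UI, is a listener over the per-task metrics. A minimal sketch, assuming the Spark 1.x/2.x Scala listener API:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs any task that spilled, with the bytes spilled from memory and to disk.
class SpillLogger extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null && (m.memoryBytesSpilled > 0 || m.diskBytesSpilled > 0)) {
      println(s"stage ${taskEnd.stageId}: spilled " +
        s"${m.memoryBytesSpilled} bytes from memory, ${m.diskBytesSpilled} bytes to disk")
    }
  }
}

// Usage (e.g. in the driver, before running the job):
//   sc.addSparkListener(new SpillLogger())
```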
So, can you provide code snippets? In particular, it would be interesting to
see what your transformation chain looks like and how many partitions there
are on each side of the shuffle operation.

The question is why it can't fit stuff in memory when you are shuffling -
maybe your partitioner on the "reduce" side is not configured…
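For concreteness, the kind of snippet being asked for might look like the hypothetical chain below, where the interesting details are the partition counts on each side of the shuffle and the partitioner handed to reduceByKey:

```scala
import org.apache.spark.{HashPartitioner, SparkContext}

// Hypothetical transformation chain: key per line, then sum per key.
def keySums(sc: SparkContext, path: String): Unit = {
  val mapped = sc.textFile(path, 2000)               // minPartitions on the map side
    .map(line => (line.split('\t')(0), 1L))
  println(s"map-side partitions: ${mapped.partitions.length}")

  // Explicit partitioner on the reduce side; too few reduce partitions means
  // each reducer must aggregate more data than fits in memory and so spills.
  val reduced = mapped.reduceByKey(new HashPartitioner(4096), _ + _)
  println(s"reduce-side partitions: ${reduced.partitions.length}")
  reduced.count()
}
```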
Igor,

Thank you for the response, but unfortunately the problem I'm referring to
goes beyond this. I have set the shuffle memory fraction to 90% and set
the cache memory fraction to 0. Repartitioning the RDD helped a tad on the
map side but didn't do much for the spilling when there was no longer any…
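Roughly, the configuration being described maps onto the pre-1.6 static memory settings sketched below; the app name, input path, and partition count are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ShuffleHeavyJob {
  def main(args: Array[String]): Unit = {
    // Pre-1.6 static memory model: give nearly everything to the shuffle,
    // nothing to the block cache, as described above.
    val conf = new SparkConf()
      .setAppName("shuffle-memory-woes")            // hypothetical app name
      .set("spark.shuffle.memoryFraction", "0.9")   // default is 0.2
      .set("spark.storage.memoryFraction", "0.0")   // default is 0.6
    val sc = new SparkContext(conf)

    // Repartitioning before the wide operation shrinks each map task's slice,
    // which is what "helped a tad on the map side".
    val input = sc.textFile("hdfs:///path/to/input")  // hypothetical path
    println(input.repartition(4096).count())          // partition count is a guess
    sc.stop()
  }
}
```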
Hi,

Usually you can solve this in two steps:
1. give the RDD more partitions
2. play with the shuffle memory fraction

In Spark 1.6 the cache vs. shuffle memory fractions are adjusted automatically.
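For reference, the Spark 1.6 behaviour mentioned here comes from the unified memory manager; a sketch of the relevant settings, with values at their 1.6 defaults:

```scala
import org.apache.spark.SparkConf

// Spark 1.6+ unified memory: storage (cache) and execution (shuffle) share a
// single pool and can borrow from each other, which is why the two old
// fractions no longer need hand-tuning.
def unifiedMemoryConf(): SparkConf = new SparkConf()
  .set("spark.memory.fraction", "0.75")        // share of heap for the unified pool (1.6 default)
  .set("spark.memory.storageFraction", "0.5")  // part of that pool protected for cached blocks (default)
// To get the old pre-1.6 behaviour back (and make the legacy fractions apply):
//   .set("spark.memory.useLegacyMode", "true")
```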
On 5 February 2016 at 23:07, Corey Nolet wrote:
> I just recently had a discovery that my jobs were taking several…