Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-05-12 Thread fightf...@163.com
Seeing similar issues, did you find a solution? One option would be to increase the number of partitions if you're doing lots of object creation.

On Thu, Feb 12, 2015 at 7:26 PM, fightf...@163.com wrot

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-05-12 Thread Night Wolf
> fightf...@163.com
>
> *From:* Patrick Wendell
> *Date:* 2015-02-12 16:12
> *To:* fightf...@163.com
> *CC:* user ; dev
> *Subject:* Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets
>
> The map will start with a capacity of 64,

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-12 Thread fightf...@163.com
From: Patrick Wendell
Date: 2015-02-12 16:12
To: fightf...@163.com
CC: user; dev
Subject: Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

The map will start with a capacity of 64, but will grow to accommodate new data. Are you using the groupBy operator in Spark

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-12 Thread Patrick Wendell
The map will start with a capacity of 64, but will grow to accommodate new data. Are you using the groupBy operator in Spark or are you using Spark SQL's group by? This usually happens if you are grouping or aggregating in a way that doesn't sufficiently condense the data created from each input pa
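The two points in the message above (a map that starts at capacity 64 and grows, and grouping that does not condense the data) can be illustrated with a minimal sketch. This is plain Python, not Spark code and not the AppendOnlyMap implementation: the doubling growth policy, the 0.7 load factor, and the function names `capacity_after`, `group_values`, and `combine_values` are assumptions made only for illustration.

```python
from collections import defaultdict


def capacity_after(n_keys, initial=64, load_factor=0.7):
    """Capacity of a doubling hash table once n_keys distinct keys are stored.

    Assumption: the table doubles whenever the key count exceeds
    capacity * load_factor. The 0.7 default is illustrative only.
    """
    capacity = initial
    while n_keys > capacity * load_factor:
        capacity *= 2
    return capacity


def group_values(records):
    """Group-style aggregation: every value is kept until the end."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return grouped


def combine_values(records):
    """Combine-style aggregation: one running total per key."""
    combined = defaultdict(int)
    for key, value in records:
        combined[key] += value
    return combined


# 10,000 hypothetical records spread over 100 keys.
records = [(i % 100, 1) for i in range(10_000)]

grouped = group_values(records)
combined = combine_values(records)

# Values held in memory at the end of each approach.
held_by_grouping = sum(len(values) for values in grouped.values())
held_by_combining = len(combined)

print("table capacity for 100 keys:", capacity_after(100))
print("values held when grouping:  ", held_by_grouping)
print("values held when combining: ", held_by_combining)
```

Under these assumptions both approaches store the same number of keys, so the table grows to the same capacity either way. What differs is how much data sits behind each key: grouping keeps every input value, while combining keeps a single value per key.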

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-11 Thread fightf...@163.com
Hi,

I still have not found an adequate solution for this issue. Any analysis or hints would be appreciated.

Thanks,
Sun.

fightf...@163.com

From: fightf...@163.com
Date: 2015-02-09 11:56
To: user; dev
Subject: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets