+dev

On 6 May 2015 10:45, "Michal Haris" <michal.ha...@visualdna.com> wrote:
> Just wanted to check if somebody has seen similar behaviour or knows what
> we might be doing wrong. We have a relatively complex Spark application
> which processes half a terabyte of data at various stages. We have profiled
> it in several ways, and everything points to one place where 90% of the
> time is spent: AppendOnlyMap.changeValue. The job scales and is faster than
> its map-reduce alternative, but it still feels slower than it should be. I
> suspect too much spill, but I haven't seen any improvement from increasing
> the number of partitions to 10k. Any ideas would be appreciated.
>
> --
> Michal Haris
> Technical Architect
> direct line: +44 (0) 207 749 0229
> www.visualdna.com | t: +44 (0) 207 734 7033
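
For context: AppendOnlyMap is the open-hashing map that backs Spark's shuffle
aggregator, so every map-side combine of a key/value pair goes through
changeValue. Heavy time there usually points at the volume of per-key updates
(and any spilling around them) rather than a single bug. Below is a minimal
Scala sketch of the knobs that were relevant on Spark 1.x in 2015; the input
path, key extraction, and specific config values are assumptions for
illustration, not a statement of what the original job does.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: illustrative Spark 1.x settings for a job whose profile
    // is dominated by shuffle aggregation (AppendOnlyMap.changeValue).
    val conf = new SparkConf()
      .setAppName("aggregation-tuning-sketch")
      // Give the shuffle a larger slice of the executor heap before tasks
      // spill to disk (Spark 1.x setting; the default was 0.2).
      .set("spark.shuffle.memoryFraction", "0.4")
      // Sort-based shuffle, the default since Spark 1.2, tends to produce
      // fewer files and less spill pressure than the hash shuffle.
      .set("spark.shuffle.manager", "sort")

    val sc = new SparkContext(conf)

    // Hypothetical input and key extraction. reduceByKey combines on the
    // map side, so each update to a key's running value is one
    // AppendOnlyMap.changeValue call inside the aggregator.
    val pairs = sc.textFile("hdfs:///data/input")
      .map(line => (line.split('\t')(0), 1L))

    // Partition count trades per-task map size against per-task overhead:
    // more partitions mean smaller hash maps and less spill per task, but
    // past a point (e.g. 10k) the cost just moves around instead of
    // shrinking, which matches the behaviour described above.
    val counts = pairs.reduceByKey(_ + _, numPartitions = 2048)
    counts.saveAsTextFile("hdfs:///data/output")

If most keys are unique, the map-side combine does little work for its cost,
and something like a plain groupBy-free repartition-and-sort path can be
cheaper; that is worth checking in the profile before tuning spill settings.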