Re: large volume spark job spends most of the time in AppendOnlyMap.changeValue

Josh Rosen Fri, 08 May 2015 14:13:37 -0700

Do you have any more specific profiling data that you can share?  I'm
curious to know where AppendOnlyMap.changeValue is being called from.


On Fri, May 8, 2015 at 1:26 PM, Michal Haris <michal.ha...@visualdna.com>
wrote:

> +dev
> On 6 May 2015 10:45, "Michal Haris" <michal.ha...@visualdna.com> wrote:
>
> > Just wanted to check if somebody has seen similar behaviour or knows what
> > we might be doing wrong. We have a relatively complex spark application
> > which processes half a terabyte of data at various stages. We have profiled
> > it in several ways and everything seems to point to one place where 90% of
> > the time is spent: AppendOnlyMap.changeValue. The job scales and is
> > relatively faster than its map-reduce alternative but it still feels slower
> > than it should be. I am suspecting too much spill but I haven't seen any
> > improvement by increasing number of partitions to 10k. Any idea would be
> > appreciated.
> >
> > --
> > Michal Haris
> > Technical Architect
> > direct line: +44 (0) 207 749 0229
> > www.visualdna.com | t: +44 (0) 207 734 7033,
> >
>
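
For readers unfamiliar with the method named in the subject line: to the best of my knowledge, changeValue(key, updateFunc) is the per-record "insert or merge" operation of Spark's append-only hash map, and hash-based aggregation paths such as combineByKey and reduceByKey call it once for every input record. A profile dominated by it is therefore not surprising in an aggregation-heavy job, which is why the calling stack matters. The block below is a minimal Python sketch of that idea only, not Spark's implementation; the names AppendOnlyMapSketch, change_value and count_by_key are hypothetical, and the capacity and load-factor defaults are assumptions chosen for illustration.

```python
class AppendOnlyMapSketch:
    """Simplified open-addressing hash map: keys are added or updated, never removed.

    Illustrative only; this is not Spark's AppendOnlyMap.
    """

    def __init__(self, initial_capacity=64, load_factor=0.7):
        # Capacity must be a power of two so that `hash & mask` is a valid slot.
        if initial_capacity <= 0 or initial_capacity & (initial_capacity - 1):
            raise ValueError("initial_capacity must be a power of two")
        self._capacity = initial_capacity
        self._load_factor = load_factor
        self._keys = [None] * initial_capacity
        self._values = [None] * initial_capacity
        self._size = 0

    def __len__(self):
        return self._size

    def change_value(self, key, update_func):
        """Store update_func(had_value, old_value) under key and return it."""
        if key is None:
            raise ValueError("None keys are not supported in this sketch")
        mask = self._capacity - 1
        pos = hash(key) & mask
        step = 1
        while True:
            current = self._keys[pos]
            if current is None:
                # Empty slot: first time this key is seen.
                new_value = update_func(False, None)
                self._keys[pos] = key
                self._values[pos] = new_value
                self._size += 1
                if self._size > self._load_factor * self._capacity:
                    self._grow()
                return new_value
            if current == key:
                # Existing key: merge the new record into the stored value.
                new_value = update_func(True, self._values[pos])
                self._values[pos] = new_value
                return new_value
            # Collision: probe the next slot with an increasing step.
            pos = (pos + step) & mask
            step += 1

    def _grow(self):
        """Double the table and re-insert every existing entry."""
        old_keys, old_values = self._keys, self._values
        self._capacity *= 2
        self._keys = [None] * self._capacity
        self._values = [None] * self._capacity
        mask = self._capacity - 1
        for key, value in zip(old_keys, old_values):
            if key is None:
                continue
            pos = hash(key) & mask
            step = 1
            while self._keys[pos] is not None:
                pos = (pos + step) & mask
                step += 1
            self._keys[pos] = key
            self._values[pos] = value

    def items(self):
        """Yield (key, value) pairs in table order."""
        for key, value in zip(self._keys, self._values):
            if key is not None:
                yield key, value


def count_by_key(records):
    """Count occurrences per key: one change_value call per input record."""
    counts = AppendOnlyMapSketch(initial_capacity=4)
    for key in records:
        counts.change_value(key, lambda had, old: old + 1 if had else 1)
    return dict(counts.items())
```

In this sketch the cost of each call is the hashing, the probing and the user-supplied merge function, so a stack trace showing which of those dominates, and which operation is driving the calls, says more than the method name alone.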
