Do you have any more specific profiling data that you can share? I'm curious to know where AppendOnlyMap.changeValue is being called from.
On Fri, May 8, 2015 at 1:26 PM, Michal Haris <michal.ha...@visualdna.com> wrote:

> +dev
> On 6 May 2015 10:45, "Michal Haris" <michal.ha...@visualdna.com> wrote:
>
> > Just wanted to check if somebody has seen similar behaviour or knows what
> > we might be doing wrong. We have a relatively complex spark application
> > which processes half a terabyte of data at various stages. We have profiled
> > it in several ways and everything seems to point to one place where 90% of
> > the time is spent: AppendOnlyMap.changeValue. The job scales and is
> > relatively faster than its map-reduce alternative but it still feels slower
> > than it should be. I am suspecting too much spill but I haven't seen any
> > improvement by increasing number of partitions to 10k. Any idea would be
> > appreciated.
> >
> > --
> > Michal Haris
> > Technical Architect
> > direct line: +44 (0) 207 749 0229
> > www.visualdna.com | t: +44 (0) 207 734 7033,