Re: dataset aggregators with kryo encoder very slow

2017-01-21 Thread Koert Kuipers
sorry i meant to say SPARK-18980
On Sat, Jan 21, 2017 at 1:48 AM, Koert Kuipers wrote:
> found it :) SPARK-1890
> thanks cloud-fan
>
> On Sat, Jan 21, 2017 at 1:46 AM, Koert Kuipers wrote:
>
>> trying to replicate this in spark itself i can for v2.1.0 but not for
>> master. i guess it has been

Re: dataset aggregators with kryo encoder very slow

2017-01-20 Thread Koert Kuipers
found it :) SPARK-1890
thanks cloud-fan
On Sat, Jan 21, 2017 at 1:46 AM, Koert Kuipers wrote:
> trying to replicate this in spark itself i can for v2.1.0 but not for
> master. i guess it has been fixed
>
> On Fri, Jan 20, 2017 at 4:57 PM, Koert Kuipers wrote:
>
>> i started printing out when kr

Re: dataset aggregators with kryo encoder very slow

2017-01-20 Thread Koert Kuipers
trying to replicate this in spark itself i can for v2.1.0 but not for master. i guess it has been fixed
On Fri, Jan 20, 2017 at 4:57 PM, Koert Kuipers wrote:
> i started printing out when kryo serializes my buffer data structure for
> my aggregator.
>
> i would expect every buffer object to idea

Re: dataset aggregators with kryo encoder very slow

2017-01-20 Thread Koert Kuipers
i started printing out when kryo serializes my buffer data structure for my aggregator. i would expect every buffer object to ideally get serialized only once: at the end of the map-side before the shuffle (so after all the values for the given key within the partition have been reduced into it).
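
The expectation above can be illustrated with a toy model. This is only a sketch, not Spark code: `pickle` stands in for kryo, a Python `set` stands in for the aggregator buffer, and the function names `combine_then_serialize` and `serialize_per_row` are hypothetical. The first function reduces every value for a key into a live buffer and serializes each buffer once at the end of the partition; the second round-trips the buffer on every input row and is included purely for contrast.

```python
import pickle


def combine_then_serialize(partition):
    """Reduce all values per key in memory, then serialize each buffer once."""
    buffers = {}
    for key, value in partition:
        # The buffer stays a live object while the partition is being reduced.
        buffers.setdefault(key, set()).add(value)
    serialized = {}
    calls = 0
    for key, buf in buffers.items():
        serialized[key] = pickle.dumps(buf)  # one serialization per key
        calls += 1
    return serialized, calls


def serialize_per_row(partition):
    """Contrast case: deserialize and re-serialize the buffer for every row."""
    serialized = {}
    calls = 0
    for key, value in partition:
        buf = pickle.loads(serialized[key]) if key in serialized else set()
        buf.add(value)
        serialized[key] = pickle.dumps(buf)  # one serialization per input row
        calls += 1
    return serialized, calls


# Hypothetical partition: five rows spread over two keys.
partition = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
once, once_calls = combine_then_serialize(partition)
per_row, per_row_calls = serialize_per_row(partition)
print("serializations, once per key:", once_calls)
print("serializations, once per row:", per_row_calls)
```

Both strategies produce the same buffers for the shuffle. In this model the first costs one serialization per distinct key in the partition, while the second costs one per input row, so the gap grows with the number of values per key.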

dataset aggregators with kryo encoder very slow

2017-01-19 Thread Koert Kuipers
we just converted a job from RDD to Dataset. the job does a single map-reduce phase using aggregators. we are seeing very bad performance for the Dataset version, about 10x slower. in the Dataset version we use kryo encoders for some of the aggregators. based on some basic profiling of spark in local