sorry i meant to say SPARK-18980
On Sat, Jan 21, 2017 at 1:48 AM, Koert Kuipers wrote:
found it :) SPARK-1890
thanks cloud-fan
On Sat, Jan 21, 2017 at 1:46 AM, Koert Kuipers wrote:
trying to replicate this in spark itself i can for v2.1.0 but not for
master. i guess it has been fixed
On Fri, Jan 20, 2017 at 4:57 PM, Koert Kuipers wrote:
i started printing out when kryo serializes my buffer data structure for my
aggregator.
i would expect every buffer object to ideally get serialized only once: at
the end of the map-side before the shuffle (so after all the values for the
given key within the partition have been reduced into it).
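for anyone who wants to try the same experiment, here is a minimal sketch of an aggregator whose buffer uses a kryo encoder and prints whenever kryo writes it. SumBuffer, SumAgg and KryoAggExample are hypothetical names made up for illustration, not the job discussed in this thread, and the sketch assumes spark's kryo setup honors KryoSerializable on the buffer class. it makes no claim about how many writes you will see.

```scala
import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// hypothetical buffer type standing in for the real buffer data structure
class SumBuffer(var sum: Long) extends KryoSerializable with Serializable {
  // no-arg constructor so kryo can instantiate the class on read
  def this() = this(0L)

  // called by kryo on serialization (assumption: KryoSerializable is honored)
  override def write(kryo: Kryo, output: Output): Unit = {
    println(s"kryo write: sum=$sum")
    output.writeLong(sum)
  }

  // called by kryo on deserialization
  override def read(kryo: Kryo, input: Input): Unit = {
    sum = input.readLong()
  }
}

// hypothetical aggregator: sums Long values per key using the buffer above
object SumAgg extends Aggregator[Long, SumBuffer, Long] {
  def zero: SumBuffer = new SumBuffer(0L)
  def reduce(b: SumBuffer, a: Long): SumBuffer = { b.sum += a; b }
  def merge(b1: SumBuffer, b2: SumBuffer): SumBuffer = { b1.sum += b2.sum; b1 }
  def finish(b: SumBuffer): Long = b.sum
  // the buffer is encoded with kryo, as in the Dataset version of the job
  def bufferEncoder: Encoder[SumBuffer] = Encoders.kryo[SumBuffer]
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

object KryoAggExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("kryo-agg-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(("a", 1L), ("a", 2L), ("b", 3L)).toDS()

    // group by key, keep only the Long value, then apply the typed aggregator
    val result = ds.groupByKey(_._1).mapValues(_._2).agg(SumAgg.toColumn)
    result.show()

    spark.stop()
  }
}
```

in local mode the println lines show up in the driver console, so counting them per key gives a rough idea of how often each buffer is serialized.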
we just converted a job from RDD to Dataset. the job does a single map-red
phase using aggregators. we are seeing very bad performance for the Dataset
version, about 10x slower.
in the Dataset version we use kryo encoders for some of the aggregators.
based on some basic profiling of spark in local