Yep, that makes sense. Thanks for the clarification!
Mingyu
On 3/4/15, 8:05 PM, "Patrick Wendell" wrote:
>Yeah, it will result in a second serialized copy of the array (costing
>some memory). But the computational overhead should be very small. The
>absolute worst case here will be when doi
Yeah, it will result in a second serialized copy of the array (costing
some memory). But the computational overhead should be very small. The
absolute worst case here will be when doing a collect() or something
similar that just bundles the entire partition.
- Patrick
On Wed, Mar 4, 2015 at 5:47
The concern is really just the runtime overhead and memory footprint of
Java-serializing an already-serialized byte array again. We originally
noticed this when we were using RDD.toLocalIterator() which serializes the
entire 64MB partition. We worked around this issue by kryo-serializing and
snappy
Hey Mingyu,
I think it's broken out separately so we can record the time taken to
serialize the result. Once we serializing it once, the second
serialization should be really simple since it's just wrapping
something that has already been turned into a byte buffer. Do you see
a specific issue with
Hi all,
It looks like the result of task is serialized twice, once by serializer (I.e.
Java/Kryo depending on configuration) and once again by closure serializer
(I.e. Java). To link the actual code,
The first one:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spar