subject:"Task result is serialized twice by serializer and closure serializer"

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Mingyu Kim

Yep, that makes sense. Thanks for the clarification! Mingyu On 3/4/15, 8:05 PM, "Patrick Wendell" wrote: >Yeah, it will result in a second serialized copy of the array (costing >some memory). But the computational overhead should be very small. The >absolute worst case here will be when doi

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Patrick Wendell

Yeah, it will result in a second serialized copy of the array (costing some memory). But the computational overhead should be very small. The absolute worst case here will be when doing a collect() or something similar that just bundles the entire partition. - Patrick On Wed, Mar 4, 2015 at 5:47

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Mingyu Kim

The concern is really just the runtime overhead and memory footprint of Java-serializing an already-serialized byte array again. We originally noticed this when we were using RDD.toLocalIterator() which serializes the entire 64MB partition. We worked around this issue by kryo-serializing and snappy

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Patrick Wendell

Hey Mingyu, I think it's broken out separately so we can record the time taken to serialize the result. Once we serializing it once, the second serialization should be really simple since it's just wrapping something that has already been turned into a byte buffer. Do you see a specific issue with

Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Mingyu Kim

Hi all, It looks like the result of task is serialized twice, once by serializer (I.e. Java/Kryo depending on configuration) and once again by closure serializer (I.e. Java). To link the actual code, The first one: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spar

Re: Task result is serialized twice by serializer and closure serializer

Re: Task result is serialized twice by serializer and closure serializer

Re: Task result is serialized twice by serializer and closure serializer

Re: Task result is serialized twice by serializer and closure serializer

Task result is serialized twice by serializer and closure serializer

5 matches

Site Navigation

Mail list logo

Footer information