+1 for disable copy by default

On 10/02/2015 05:53 PM, Stephan Ewen wrote:
> Hi all!
> 
> Now that we are coming to the next release, I wanted to make sure we
> finalize the decision on that point, because it would be nice to not break
> the behavior of system afterwards.
> 
> Right now, when tasks are chained together, the system copies the elements
> always between different tasks in the same chain.
> 
> I think this policy was established under the assumption that copies do not
> cost anything, given our own test examples, which mainly use immutable
> types like Strings, boxed primitives, ..
> 
> In practice, a lot of data types are actually quite expensive to copy.
> 
> For example, a rather common data type in the event analysis of web-sources
> is JSON Object.
> Flink treats this as a generic type. Depending on its concrete
> implementation, Kryo may have perform a serialization copy, which means
> encoding into bytes (JSON encoding, charset encoding) and decoding again.
> 
> This has a massive impact on the out-of-the-box performance of the system.
> Given that, I was wondering whether we should set to default policy to "not
> copying".
> 
> That is basically the behavior of the batch API, and there has so far never
> been an issue with that (people running into the trap of overwritten
> mutable elements).
> 
> What do you think?
> 
> Stephan
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to