+1 for disable copy by default
On 10/02/2015 05:53 PM, Stephan Ewen wrote: > Hi all! > > Now that we are coming to the next release, I wanted to make sure we > finalize the decision on that point, because it would be nice to not break > the behavior of system afterwards. > > Right now, when tasks are chained together, the system copies the elements > always between different tasks in the same chain. > > I think this policy was established under the assumption that copies do not > cost anything, given our own test examples, which mainly use immutable > types like Strings, boxed primitives, .. > > In practice, a lot of data types are actually quite expensive to copy. > > For example, a rather common data type in the event analysis of web-sources > is JSON Object. > Flink treats this as a generic type. Depending on its concrete > implementation, Kryo may have perform a serialization copy, which means > encoding into bytes (JSON encoding, charset encoding) and decoding again. > > This has a massive impact on the out-of-the-box performance of the system. > Given that, I was wondering whether we should set to default policy to "not > copying". > > That is basically the behavior of the batch API, and there has so far never > been an issue with that (people running into the trap of overwritten > mutable elements). > > What do you think? > > Stephan >
signature.asc
Description: OpenPGP digital signature