Do we know what kind of impact the non-reuse policy has? Maybe the serialization overhead is subsumed by other effects.
But in general I'm ok with changing the default to non-copying. We just have to document this feature properly.

On Oct 2, 2015 6:31 PM, "Maximilian Michels" <m...@apache.org> wrote:

> +1 Good idea. I think we can save quite some CPU cycles by not copying
> records.
>
> > That is basically the behavior of the batch API, and there has so far never
> > been an issue with that (people running into the trap of overwritten
> > mutable elements).
>
> As far as I know, this is only the case for chained operators?
>
> On Fri, Oct 2, 2015 at 6:15 PM, Matthias J. Sax <mj...@apache.org> wrote:
>
> > +1 for disabling copy by default
> >
> > On 10/02/2015 05:53 PM, Stephan Ewen wrote:
> > > Hi all!
> > >
> > > Now that we are coming to the next release, I wanted to make sure we
> > > finalize the decision on that point, because it would be nice to not
> > > break the behavior of the system afterwards.
> > >
> > > Right now, when tasks are chained together, the system always copies
> > > the elements between different tasks in the same chain.
> > >
> > > I think this policy was established under the assumption that copies do
> > > not cost anything, given our own test examples, which mainly use
> > > immutable types like Strings, boxed primitives, ..
> > >
> > > In practice, a lot of data types are actually quite expensive to copy.
> > >
> > > For example, a rather common data type in the event analysis of web
> > > sources is JSON Object.
> > > Flink treats this as a generic type. Depending on its concrete
> > > implementation, Kryo may have to perform a serialization copy, which
> > > means encoding into bytes (JSON encoding, charset encoding) and decoding
> > > again.
> > >
> > > This has a massive impact on the out-of-the-box performance of the
> > > system.
> > > Given that, I was wondering whether we should set the default policy to
> > > "not copying".
> > >
> > > That is basically the behavior of the batch API, and there has so far
> > > never been an issue with that (people running into the trap of
> > > overwritten mutable elements).
> > >
> > > What do you think?
> > >
> > > Stephan
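
For readers of the archive, the "trap of overwritten mutable elements" mentioned in the thread can be illustrated with a minimal, Flink-free sketch. Everything in it is an assumption made for illustration: `Event`, `run`, and the `handOver` function are hypothetical names, and the loop only imitates two chained tasks, it is not how Flink implements chaining.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class ChainCopyDemo {

    /** Hypothetical mutable record type, standing in for something like a JSON object. */
    static final class Event {
        int value;
        Event(int value) { this.value = value; }
        Event copy() { return new Event(value); }
    }

    /**
     * Imitates an upstream task that reuses one mutable Event for every element
     * and a downstream task that buffers whatever it is handed.
     * handOver models the policy at the boundary between the two tasks.
     */
    static List<Integer> run(UnaryOperator<Event> handOver) {
        List<Event> buffered = new ArrayList<>();
        Event reused = new Event(0);
        for (int i = 1; i <= 3; i++) {
            reused.value = i;                      // upstream mutates its single instance
            buffered.add(handOver.apply(reused));  // downstream keeps what it receives
        }
        List<Integer> seen = new ArrayList<>();
        for (Event e : buffered) {
            seen.add(e.value);
        }
        return seen;
    }

    /** Copy at the boundary: each buffered element is independent. */
    static List<Integer> withCopy() { return run(Event::copy); }

    /** No copy: every buffered reference points at the same reused object. */
    static List<Integer> withoutCopy() { return run(UnaryOperator.identity()); }

    public static void main(String[] args) {
        System.out.println("copying:     " + withCopy());     // [1, 2, 3]
        System.out.println("not copying: " + withoutCopy());  // [3, 3, 3]
    }
}
```

The sketch only shows the trade-off under discussion: copying protects a downstream task that holds on to elements, at the cost of one copy per element, and that copy is the expensive part for types that need a serialization round trip.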