Hey guys,

Have we disabled the default input copying after all? I don't remember seeing a Jira or PR for this (maybe I just missed it).
And if not, do we want this in the 0.10 release?

Cheers,
Gyula

On Fri, Oct 2, 2015 at 7:57 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Do we know what kind of impact the non-reuse policy has? Maybe the
> serialization overhead is subsumed by other effects.
>
> But in general I'm OK with changing the default to non-copying. We just
> have to document this feature properly.
>
> On Oct 2, 2015 6:31 PM, "Maximilian Michels" <m...@apache.org> wrote:
>
> > +1 Good idea. I think we can save quite some CPU cycles by not copying
> > records.
> >
> > > That is basically the behavior of the batch API, and there has so far
> > > never been an issue with that (people running into the trap of
> > > overwritten mutable elements).
> >
> > As far as I know, this is only the case for chained operators?
> >
> > On Fri, Oct 2, 2015 at 6:15 PM, Matthias J. Sax <mj...@apache.org> wrote:
> >
> > > +1 for disabling copying by default
> > >
> > > On 10/02/2015 05:53 PM, Stephan Ewen wrote:
> > > > Hi all!
> > > >
> > > > Now that we are coming to the next release, I wanted to make sure we
> > > > finalize the decision on that point, because it would be nice not to
> > > > break the behavior of the system afterwards.
> > > >
> > > > Right now, when tasks are chained together, the system always copies
> > > > the elements between different tasks in the same chain.
> > > >
> > > > I think this policy was established under the assumption that copies
> > > > do not cost anything, given our own test examples, which mainly use
> > > > immutable types like Strings, boxed primitives, ...
> > > >
> > > > In practice, a lot of data types are actually quite expensive to copy.
> > > >
> > > > For example, a rather common data type in the event analysis of web
> > > > sources is JSON Object. Flink treats this as a generic type. Depending
> > > > on its concrete implementation, Kryo may have to perform a
> > > > serialization copy, which means encoding into bytes (JSON encoding,
> > > > charset encoding) and decoding again.
> > > >
> > > > This has a massive impact on the out-of-the-box performance of the
> > > > system. Given that, I was wondering whether we should set the default
> > > > policy to "not copying".
> > > >
> > > > That is basically the behavior of the batch API, and there has so far
> > > > never been an issue with that (people running into the trap of
> > > > overwritten mutable elements).
> > > >
> > > > What do you think?
> > > >
> > > > Stephan