-1 for copying objects. Storing a serialized data where possible is good, but copying all objects by default is not a good idea, in my opinion. A lot of scenarios use data types that are hellishly expensive to copy. Even the current copy on chain handover is a problem.
Let's not introduce even more copies. On Mon, Nov 21, 2016 at 3:16 PM, Maciek Próchniak <m...@touk.pl> wrote: > Hi, > > it will come with performance overhead when updating the state, but I > think it'll be possible to perform asynchronous snapshots using > HeapStateBackend (probably some changes to underlying data structures would > be needed) - which would bring more predictable performance. > > thanks, > maciek > > > On 21/11/2016 13:48, Aljoscha Krettek wrote: > >> Hi, >> I would be in favour of this since it brings things in line with the >> RocksDB backend. This will, however, come with quite the performance >> overhead, depending on how fast the TypeSerializer can copy. >> >> Cheers, >> Aljoscha >> >> On Mon, 21 Nov 2016 at 11:30 Fabian Hueske <fhue...@gmail.com> wrote: >> >> Hi everybody, >>> >>> when implementing a ReduceFunction for incremental aggregation of SQL / >>> Table API window aggregates we noticed that the HeapStateBackend does not >>> store copies but holds references to the original objects. In case of a >>> SlidingWindow, the same object is referenced from different window panes. >>> Therefore, it is not possible to modify these objects (in order to avoid >>> object instantiations, see discussion [1]). >>> >>> Other state backends serialize their data such that the behavior is not >>> consistent across backends. >>> If we want to have light-weight tests, we have to create new objects in >>> the >>> ReduceFunction causing unnecessary overhead. >>> >>> I would propose to copy objects when storing them in a HeapStateBackend. >>> This would ensure that objects returned from state to the user behave >>> identical for different state backends. >>> >>> We created a related JIRA [2] that asks to copy records that go into an >>> incremental ReduceFunction. The scope is more narrow and would solve our >>> problem, but would leave the inconsistent behavior of state backends in >>> place. >>> >>> What do others think? >>> >>> Cheers, Fabian >>> >>> [1] https://github.com/apache/flink/pull/2792#discussion_r88653721 >>> [2] https://issues.apache.org/jira/browse/FLINK-5105 >>> >>> >