Another approach that would solve the problem for our use case (object reuse in incremental window ReduceFunctions) would be to copy only the first object that is put into the state. This would be a change to the ReduceState, not to the overall state backend, which should be feasible, no?
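
To make the idea concrete, here is a minimal sketch of what "copy the first object" could look like. It is not Flink code: `CopyOnFirstAddReducingState`, `MutableSum` and `SlidingPaneDemo` are hypothetical names made up for illustration, the `copier` merely stands in for whatever copy the TypeSerializer would provide, and the reduce function is assumed to mutate and return its first argument.

```java
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for a heap-backed reducing state; not Flink's actual class. */
final class CopyOnFirstAddReducingState<T> {
    private final BinaryOperator<T> reduceFunction; // may mutate and return its first argument
    private final UnaryOperator<T> copier;          // stands in for a serializer-provided copy
    private T current;                              // null until the first value arrives

    CopyOnFirstAddReducingState(BinaryOperator<T> reduceFunction, UnaryOperator<T> copier) {
        this.reduceFunction = reduceFunction;
        this.copier = copier;
    }

    void add(T value) {
        if (current == null) {
            // Only the first value is copied, so the state owns its accumulator.
            current = copier.apply(value);
        } else {
            // Later values are reduced into the owned accumulator without copying.
            current = reduceFunction.apply(current, value);
        }
    }

    T get() {
        return current;
    }
}

/** Hypothetical mutable accumulator that a reduce function wants to reuse. */
final class MutableSum {
    long sum;

    MutableSum(long sum) {
        this.sum = sum;
    }
}

final class SlidingPaneDemo {
    /** Returns {pane A sum, pane B sum, sum of the original first record}. */
    static long[] run() {
        // Object-reusing reduce: adds into the accumulator and returns it.
        BinaryOperator<MutableSum> reuse = (acc, in) -> {
            acc.sum += in.sum;
            return acc;
        };
        UnaryOperator<MutableSum> copy = v -> new MutableSum(v.sum);

        CopyOnFirstAddReducingState<MutableSum> paneA =
                new CopyOnFirstAddReducingState<>(reuse, copy);
        CopyOnFirstAddReducingState<MutableSum> paneB =
                new CopyOnFirstAddReducingState<>(reuse, copy);

        // The same record object is assigned to two sliding-window panes.
        MutableSum first = new MutableSum(1);
        paneA.add(first);
        paneB.add(first);

        // Only pane A receives a second record.
        paneA.add(new MutableSum(10));

        return new long[] {paneA.get().sum, paneB.get().sum, first.sum};
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

With the copy in place, pane B and the original record are untouched by pane A's update. If the first value were stored by reference instead, both panes would share one accumulator and pane A's update would leak into pane B. The cost is one copy per key and window rather than one per record.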

2016-11-21 15:43 GMT+01:00 Stephan Ewen <se...@apache.org>:

> -1 for copying objects.
>
> Storing a serialized data where possible is good, but copying all objects
> by default is not a good idea, in my opinion.
> A lot of scenarios use data types that are hellishly expensive to copy.
> Even the current copy on chain handover is a problem.
>
> Let's not introduce even more copies.
>
> On Mon, Nov 21, 2016 at 3:16 PM, Maciek Próchniak <m...@touk.pl> wrote:
>
> > Hi,
> >
> > it will come with performance overhead when updating the state, but I
> > think it'll be possible to perform asynchronous snapshots using
> > HeapStateBackend (probably some changes to underlying data structures
> > would be needed) - which would bring more predictable performance.
> >
> > thanks,
> > maciek
> >
> > On 21/11/2016 13:48, Aljoscha Krettek wrote:
> >
> >> Hi,
> >> I would be in favour of this since it brings things in line with the
> >> RocksDB backend. This will, however, come with quite the performance
> >> overhead, depending on how fast the TypeSerializer can copy.
> >>
> >> Cheers,
> >> Aljoscha
> >>
> >> On Mon, 21 Nov 2016 at 11:30 Fabian Hueske <fhue...@gmail.com> wrote:
> >>
> >>> Hi everybody,
> >>>
> >>> when implementing a ReduceFunction for incremental aggregation of SQL /
> >>> Table API window aggregates we noticed that the HeapStateBackend does
> >>> not store copies but holds references to the original objects. In case
> >>> of a SlidingWindow, the same object is referenced from different window
> >>> panes. Therefore, it is not possible to modify these objects (in order
> >>> to avoid object instantiations, see discussion [1]).
> >>>
> >>> Other state backends serialize their data such that the behavior is not
> >>> consistent across backends.
> >>> If we want to have light-weight tests, we have to create new objects in
> >>> the ReduceFunction causing unnecessary overhead.
> >>>
> >>> I would propose to copy objects when storing them in a
> >>> HeapStateBackend. This would ensure that objects returned from state to
> >>> the user behave identical for different state backends.
> >>>
> >>> We created a related JIRA [2] that asks to copy records that go into an
> >>> incremental ReduceFunction. The scope is more narrow and would solve
> >>> our problem, but would leave the inconsistent behavior of state
> >>> backends in place.
> >>>
> >>> What do others think?
> >>>
> >>> Cheers, Fabian
> >>>
> >>> [1] https://github.com/apache/flink/pull/2792#discussion_r88653721
> >>> [2] https://issues.apache.org/jira/browse/FLINK-5105