Re: [DISCUSS] Hold copies in HeapStateBackend

Stephan Ewen Mon, 21 Nov 2016 06:49:52 -0800

-1 for copying objects.

Storing serialized data where possible is good, but copying all objects
by default is not a good idea, in my opinion.
A lot of scenarios use data types that are hellishly expensive to copy.
Even the current copy on chain handover is a problem.


Let's not introduce even more copies.

On Mon, Nov 21, 2016 at 3:16 PM, Maciek Próchniak <[email protected]> wrote:

> Hi,
>
> it will come with performance overhead when updating the state, but I
> think it'll be possible to perform asynchronous snapshots using
> HeapStateBackend (probably some changes to underlying data structures would
> be needed) - which would bring more predictable performance.
>
> thanks,
> maciek
>
>
> On 21/11/2016 13:48, Aljoscha Krettek wrote:
>
>> Hi,
>> I would be in favour of this since it brings things in line with the
>> RocksDB backend. This will, however, come with quite the performance
>> overhead, depending on how fast the TypeSerializer can copy.
>>
>> Cheers,
>> Aljoscha
>>
>> On Mon, 21 Nov 2016 at 11:30 Fabian Hueske <[email protected]> wrote:
>>
>>> Hi everybody,
>>>
>>> when implementing a ReduceFunction for incremental aggregation of SQL /
>>> Table API window aggregates we noticed that the HeapStateBackend does not
>>> store copies but holds references to the original objects. In case of a
>>> SlidingWindow, the same object is referenced from different window panes.
>>> Therefore, it is not possible to modify these objects (in order to avoid
>>> object instantiations, see discussion [1]).
>>>
>>> Other state backends serialize their data, so the behavior is not
>>> consistent across backends.
>>> If we want to have light-weight tests, we have to create new objects in the
>>> ReduceFunction, causing unnecessary overhead.
>>>
>>> I would propose to copy objects when storing them in a HeapStateBackend.
>>> This would ensure that objects returned from state to the user behave
>>> identically for different state backends.
>>>
>>> We created a related JIRA [2] that asks to copy records that go into an
>>> incremental ReduceFunction. The scope is narrower and would solve our
>>> problem, but would leave the inconsistent behavior of state backends in
>>> place.
>>>
>>> What do others think?
>>>
>>> Cheers, Fabian
>>>
>>> [1] https://github.com/apache/flink/pull/2792#discussion_r88653721
>>> [2] https://issues.apache.org/jira/browse/FLINK-5105
>>>
>>>
>
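
For readers following the thread, the aliasing issue Fabian describes can be
illustrated with a minimal, self-contained sketch. It uses plain Java only and
none of Flink's classes: PaneStore, Sum, and the pane names are hypothetical
stand-ins invented for this illustration, and the "copy" is a simple
constructor call rather than a TypeSerializer copy. The store either keeps the
reference it is handed or copies on put; the same first element is added to
two overlapping panes, and one pane is then reduced in place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class HeapStateAliasingSketch {

    /** Mutable accumulator, standing in for a reusable aggregate object. */
    static final class Sum {
        long value;
        Sum(long value) { this.value = value; }
    }

    /** Hypothetical stand-in for a heap-backed state store; NOT Flink's API. */
    static final class PaneStore {
        private final Map<String, Sum> panes = new HashMap<>();
        private final UnaryOperator<Sum> onPut; // identity = hold reference, or a copy function

        PaneStore(UnaryOperator<Sum> onPut) { this.onPut = onPut; }

        void put(String pane, Sum s) { panes.put(pane, onPut.apply(s)); }
        Sum get(String pane) { return panes.get(pane); }
    }

    /** Stores one object in two overlapping panes, then mutates pane-A in place. */
    static long[] run(PaneStore store) {
        Sum first = new Sum(1);
        store.put("pane-A", first);
        store.put("pane-B", first);
        // In-place "reduce" on pane-A only, to avoid allocating a new object.
        store.get("pane-A").value += 10;
        return new long[] { store.get("pane-A").value, store.get("pane-B").value };
    }

    public static void main(String[] args) {
        long[] byRef = run(new PaneStore(UnaryOperator.identity()));
        long[] byCopy = run(new PaneStore(s -> new Sum(s.value)));
        System.out.println("references: A=" + byRef[0] + " B=" + byRef[1]);
        System.out.println("copies:     A=" + byCopy[0] + " B=" + byCopy[1]);
    }
}
```

With references, the update to pane-A is also visible through pane-B; with
copy-on-put, pane-B keeps its own value. The cost of that isolation is one
copy per put, which is the overhead the replies above are weighing.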
