Re: Streaming - memory management

Vinay Patil Wed, 31 Aug 2016 09:24:38 -0700

Hi Stephan,

Just wanted to jump into this discussion regarding state.


So do you mean that if we maintain user-defined state (for non-window
operators), then if we do  not clear it explicitly will the data for that
key remains in RocksDB.

What happens in case of checkpoint ? I read in the documentation that after
the checkpoint happens the rocksDB data is pushed to the desired location
(hdfs or s3 or other fs), so for user-defined state does the data still
remain in RocksDB after checkpoint ?

Correct me if I have misunderstood this concept

For one of our use we were going for this, but since I read the above part
in documentation so we are going for Cassandra now (to store records and
query them for a special case)





Regards,
Vinay Patil

On Wed, Aug 31, 2016 at 4:51 AM, Stephan Ewen <se...@apache.org> wrote:

> In streaming, memory is mainly needed for state (key/value state). The
> exact representation depends on the chosen StateBackend.
>
> State is explicitly released: For windows, state is cleaned up
> automatically (firing / expiry), for user-defined state, keys have to be
> explicitly cleared (clear() method) or in the future will have the option
> to expire.
>
> The heavy work horse for streaming state is currently RocksDB, which
> internally uses native (off-heap) memory to keep the data.
>
> Does that help?
>
> Stephan
>
>
> On Tue, Aug 30, 2016 at 11:52 PM, Roshan Naik <ros...@hortonworks.com>
> wrote:
>
> > As per the docs, in Batch mode, dynamic memory allocation is avoided by
> > storing messages being processed in ByteBuffers via Unsafe methods.
> >
> > Couldn't find any docs  describing mem mgmt in Streamingn mode. So...
> >
> > - Am wondering if this is also the case with Streaming ?
> >
> > - If so, how does Flink detect that an object is no longer being used and
> > can be reclaimed for reuse once again ?
> >
> > -roshan
> >
>

Re: Streaming - memory management

Reply via email to