Hi,

RocksDB is only used as local state store. Operator state is not stored in
RocksDB but only on the TM JVM heap.
When a checkpoint is taken, the keyed state from RocksDB and the operator
state from the heap are both copied to a persistent data store (HDFS, S3,
...).

I was trying to find the documentation that explains how operator state is
managed, but couldn't find it.
I'll create a Jira to fix that.

Best, Fabian

Am Do., 9. Mai 2019 um 16:10 Uhr schrieb an0 <an0...@gmail.com>:

> Thanks, I didn't know that. But it is checkpoints to RocksDB, isn't it?
> BTW, is this special treatment of operator state documented anywhere?
>
> On 2019/05/09 07:39:34, Fabian Hueske <fhue...@gmail.com> wrote:
> > Hi,
> >
> > Yes, IMO it is more clear.
> > However, you should be aware that operator state is maintained on heap
> only
> > (not in RocksDB).
> >
> > Best, Fabian
> >
> >
> > Am Mi., 8. Mai 2019 um 20:44 Uhr schrieb an0 <an0...@gmail.com>:
> >
> > > I switched to using operator list state. It is more clear. It is also
> > > supported by RocksDBKeyedStateBackend, isn't it?
> > >
> > > On 2019/05/08 14:42:36, Till Rohrmann <trohrm...@apache.org> wrote:
> > > > Hi,
> > > >
> > > > if you want to increase the parallelism you could also pick a key
> > > randomly
> > > > from a set of keys. The price you would pay is a shuffle operation
> > > (network
> > > > I/O) which would not be needed if you were using the unkeyed stream
> and
> > > > used the operator list state.
> > > >
> > > > However, with keyed state you could also use Flink's
> > > > RocksDBKeyedStateBackend which allows to go out of core if your state
> > > size
> > > > should grow very large.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, May 7, 2019 at 5:57 PM an0 <an0...@gmail.com> wrote:
> > > >
> > > > > But I only have one stream, nothing to connect it to.
> > > > >
> > > > > On 2019/05/07 00:15:59, Averell <lvhu...@gmail.com> wrote:
> > > > > > From my understanding, having a fake keyBy (stream.keyBy(r =>
> > > > > "dummyString"))
> > > > > > means there would be only one slot handling the data.
> > > > > > Would a broadcast function [1] work for your case?
> > > > > >
> > > > > > Regards,
> > > > > > Averell
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > >
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sent from:
> > > > >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to