Re: I want to use MapState on an unkeyed stream

2019-05-10 Thread an0
Got it, thanks.

On 2019/05/10 10:20:40, Fabian Hueske wrote:
> Hi,
>
> RocksDB is only used as local state store. Operator state is not stored in
> RocksDB but only on the TM JVM heap.
> When a checkpoint is taken, the keyed state from RocksDB and the operator
> state from the heap are both cop

Re: I want to use MapState on an unkeyed stream

2019-05-10 Thread Fabian Hueske
Hi,

RocksDB is only used as a local state store. Operator state is not stored in RocksDB but only on the TM JVM heap.

When a checkpoint is taken, the keyed state from RocksDB and the operator state from the heap are both copied to a persistent data store (HDFS, S3, ...).

I was trying to find the do

Re: I want to use MapState on an unkeyed stream

2019-05-09 Thread an0
Thanks, I didn't know that. But it is checkpointed to RocksDB, isn't it?

BTW, is this special treatment of operator state documented anywhere?

On 2019/05/09 07:39:34, Fabian Hueske wrote:
> Hi,
>
> Yes, IMO it is more clear.
> However, you should be aware that operator state is maintained on he

Re: I want to use MapState on an unkeyed stream

2019-05-09 Thread Fabian Hueske
Hi,

Yes, IMO it is more clear. However, you should be aware that operator state is maintained on heap only (not in RocksDB).

Best,
Fabian

On Wed, May 8, 2019 at 20:44, an0 wrote:
> I switched to using operator list state. It is more clear. It is also
> supported by RocksDBKeyedStateBacke

Re: I want to use MapState on an unkeyed stream

2019-05-08 Thread an0
I switched to using operator list state. It is more clear. It is also supported by RocksDBKeyedStateBackend, isn't it?

On 2019/05/08 14:42:36, Till Rohrmann wrote:
> Hi,
>
> if you want to increase the parallelism you could also pick a key randomly
> from a set of keys. The price you would pay
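For readers following the switch to operator list state, here is a minimal sketch of the idea: keep the map on the heap and flatten it into a list of entries at checkpoint time. It is a plain-Scala model with no Flink dependency. The `snapshot` and `restore` methods only mirror the roles that `snapshotState` and `initializeState` play in Flink's `CheckpointedFunction`; the class and method names themselves are hypothetical.

```scala
import scala.collection.mutable

// Plain-Scala model (no Flink dependency) of keeping a map in operator list state.
// All names here are hypothetical; only the snapshot/restore roles mirror Flink.
final class HeapMapWithListSnapshot[K, V] {
  // The working copy lives on the heap, as operator state does.
  private val live = mutable.Map.empty[K, V]

  def put(key: K, value: V): Unit = live.update(key, value)
  def get(key: K): Option[V] = live.get(key)

  // At checkpoint time, flatten the map into a list of entries.
  def snapshot(): List[(K, V)] = live.toList

  // On restore, rebuild the map from whatever list entries were handed back.
  def restore(entries: Seq[(K, V)]): Unit = {
    live.clear()
    live ++= entries
  }
}

object HeapMapWithListSnapshotDemo {
  def main(args: Array[String]): Unit = {
    val before = new HeapMapWithListSnapshot[String, Int]
    before.put("a", 1)
    before.put("b", 2)
    val checkpoint = before.snapshot()

    val after = new HeapMapWithListSnapshot[String, Int]
    after.restore(checkpoint)
    println(after.get("a")) // Some(1)
  }
}
```

One caveat if this is ported to real operator list state: when the job is rescaled, the list entries are redistributed across subtasks, so a subtask may restore only part of the entries it wrote.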

Re: I want to use MapState on an unkeyed stream

2019-05-08 Thread Till Rohrmann
Hi,

if you want to increase the parallelism you could also pick a key randomly from a set of keys. The price you would pay is a shuffle operation (network I/O) which would not be needed if you were using the unkeyed stream and used the operator list state. However, with keyed state you could also
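The random-key suggestion can be sketched roughly as follows. This is a simplified plain-Scala model, not Flink code: `withRandomKey` and `recordsPerSubtask` are hypothetical helpers, and routing by key modulo parallelism merely stands in for the shuffle that a real `keyBy` would perform.

```scala
import scala.util.Random

// Simplified model of salting records with a random key before a keyBy.
// All names are hypothetical and no Flink types are used.
object RandomKeySketch {
  // Attach a key drawn uniformly from 0 until numKeys to every record.
  def withRandomKey[T](records: Seq[T], numKeys: Int, rng: Random): Seq[(Int, T)] =
    records.map(r => (rng.nextInt(numKeys), r))

  // Stand-in for the shuffle: route by key modulo parallelism. Flink's real
  // routing hashes keys into key groups, so its spread can be less even.
  def recordsPerSubtask[T](keyed: Seq[(Int, T)], parallelism: Int): Map[Int, Int] =
    keyed.groupBy { case (key, _) => Math.floorMod(key, parallelism) }
      .map { case (subtask, rs) => subtask -> rs.size }

  def main(args: Array[String]): Unit = {
    val keyed = withRandomKey(1 to 1000, numKeys = 4, rng = new Random(42))
    recordsPerSubtask(keyed, parallelism = 4).toSeq.sortBy(_._1).foreach {
      case (subtask, count) => println(s"subtask $subtask: $count records")
    }
  }
}
```

Because the key is random, the same logical record may land on a different subtask each time, so this only fits cases where the state does not need to be looked up by a meaningful key.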

Re: I want to use MapState on an unkeyed stream

2019-05-07 Thread an0
But I only have one stream, nothing to connect it to.

On 2019/05/07 00:15:59, Averell wrote:
> From my understanding, having a fake keyBy (stream.keyBy(r => "dummyString"))
> means there would be only one slot handling the data.
> Would a broadcast function [1] work for your case?
>
> Regards,

Re: I want to use MapState on an unkeyed stream

2019-05-06 Thread Averell
From my understanding, having a fake keyBy (stream.keyBy(r => "dummyString")) means there would be only one slot handling the data.

Would a broadcast function [1] work for your case?

Regards,
Averell

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
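To illustrate why a constant key funnels everything to one slot, here is a small plain-Scala model. It assumes a simplified routing rule (key hash modulo parallelism); Flink's actual assignment goes through key groups, so the subtask index is illustrative only. `subtaskFor` and `distribution` are hypothetical helpers, not Flink APIs.

```scala
// Simplified model of hash partitioning by key; not Flink's actual routing.
object DummyKeyRouting {
  // Map a key to one of `parallelism` subtasks (hypothetical helper).
  def subtaskFor(key: Any, parallelism: Int): Int =
    Math.floorMod(key.hashCode, parallelism)

  // Count how many records land on each subtask for a given key selector.
  def distribution[T](records: Seq[T], parallelism: Int)(keyOf: T => Any): Map[Int, Int] =
    records.groupBy(r => subtaskFor(keyOf(r), parallelism))
      .map { case (subtask, rs) => subtask -> rs.size }

  def main(args: Array[String]): Unit = {
    val records = 1 to 1000
    // Every record gets the same key, so every record hashes to the same subtask.
    val constant = distribution(records, 4)(_ => "dummyString")
    println(s"subtasks used with a constant key: ${constant.size}") // 1
  }
}
```

Whatever the parallelism, the constant key always yields a single entry in the distribution, which matches the "only one slot" observation above.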