Hi, Gagan Agrawal

I would prefer the first option.

Here is the reason.

In the RocksDB StateBackend, we serialize the key, namespace, and user-key
into one byte array (key-bytes) and serialize the user-value into another
byte array (value-bytes), then insert the key-bytes/value-bytes pair into
RocksDB. When retrieving from RocksDB, we can use get (for a single
key/value) or an iterator (for a key range).
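
To make that layout concrete, here is a rough sketch in plain Java. It is a
simplified model, not Flink's actual code or byte format: the class
CompositeKeySketch, the NUL separator, and the TreeMap standing in for
RocksDB are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: a sorted map stands in for RocksDB's sorted key space.
class CompositeKeySketch {
    private final TreeMap<String, byte[]> store = new TreeMap<>();

    // Assumed layout: key, namespace and user-key joined into one composite key.
    static String keyBytes(String key, String namespace, String userKey) {
        return key + '\0' + namespace + '\0' + userKey;
    }

    // One map entry becomes one key-bytes/value-bytes pair in the store.
    void put(String key, String namespace, String userKey, String userValue) {
        store.put(keyBytes(key, namespace, userKey),
                  userValue.getBytes(StandardCharsets.UTF_8));
    }

    // Point lookup: only the value-bytes of this single entry are decoded.
    String get(String key, String namespace, String userKey) {
        byte[] valueBytes = store.get(keyBytes(key, namespace, userKey));
        return valueBytes == null ? null : new String(valueBytes, StandardCharsets.UTF_8);
    }

    // Range lookup: iterate over all entries sharing the key/namespace prefix.
    List<String> scan(String key, String namespace) {
        String prefix = key + '\0' + namespace + '\0';
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : store.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the prefix range, stop iterating
            }
            values.add(new String(e.getValue(), StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        CompositeKeySketch db = new CompositeKeySketch();
        db.put("order-1", "stream-A", "event-1", "created");
        db.put("order-1", "stream-A", "event-2", "paid");
        System.out.println(db.get("order-1", "stream-A", "event-2"));
        System.out.println(db.scan("order-1", "stream-A"));
    }
}
```

In this model every map entry is its own row, so a single get touches only
one small value.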

If we store the four maps in a single MapState, each stream's whole Map
becomes one value-bytes, so we need to deserialize the entire Map whenever
we want to retrieve a single user-value.
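
The difference between the two options can be sketched as follows. Again,
this is only an illustration under assumptions: NestedMapCostSketch and its
methods are hypothetical names, Java's built-in serialization stands in for
Flink's serializers, and a TreeMap stands in for RocksDB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical comparison of the two layouts; not Flink code.
class NestedMapCostSketch {
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Option 1: one row per event, keyed by stream + event id.
    static Map<String, byte[]> perEntryStore(String stream, Map<String, String> events) {
        Map<String, byte[]> store = new TreeMap<>();
        for (Map.Entry<String, String> e : events.entrySet()) {
            store.put(stream + '\0' + e.getKey(), serialize(e.getValue()));
        }
        return store;
    }

    // Option 2: one row per stream, whose value is the whole serialized Map.
    static Map<String, byte[]> nestedStore(String stream, Map<String, String> events) {
        Map<String, byte[]> store = new TreeMap<>();
        HashMap<String, String> copy = new HashMap<>(events);
        store.put(stream, serialize(copy));
        return store;
    }

    // Option 1 lookup: decodes only the requested event's value-bytes.
    static String lookupPerEntry(Map<String, byte[]> store, String stream, String eventId) {
        return (String) deserialize(store.get(stream + '\0' + eventId));
    }

    // Option 2 lookup: decodes the entire Map just to read one event.
    @SuppressWarnings("unchecked")
    static String lookupNested(Map<String, byte[]> store, String stream, String eventId) {
        Map<String, String> all = (Map<String, String>) deserialize(store.get(stream));
        return all.get(eventId);
    }

    public static void main(String[] args) {
        Map<String, String> events = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            events.put("event-" + i, "payload-" + i);
        }
        Map<String, byte[]> option1 = perEntryStore("stream-A", events);
        Map<String, byte[]> option2 = nestedStore("stream-A", events);
        System.out.println("option 1 bytes decoded for one event: "
                + option1.get("stream-A" + '\0' + "event-7").length);
        System.out.println("option 2 bytes decoded for one event: "
                + option2.get("stream-A").length);
    }
}
```

Both lookups return the same event, but in the second layout the bytes
decoded per read grow with the number of events kept for that stream, and
changing one event means writing the whole Map back.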


Gagan Agrawal <agrawalga...@gmail.com> wrote on Thu, Jan 10, 2019 at 10:38 AM:

> Hi,
> I have a use case where 4 streams get merged (union) and grouped on common
> key (keyBy) and a custom KeyedProcessFunction is called. Now I need to keep
> state (RocksDB backend) for all 4 streams in my custom KeyedProcessFunction
> where each of these 4 streams would be stored as map. So I have 2 options
>
> 1. Create a separate MapStateDescriptor for each of these streams and
> store their events separately.
> 2. Create a single MapStateDescriptor where there will be only 4 keys
> (corresponding to 4 stream types) and value will be of type Map which
> further keep events from respective streams.
>
> I want to understand from performance perspective, would there be any
> difference in above approaches. Will keeping 4 different MapState cause 4
> lookups for RocksDB backend when they are accessed? Or all of these
> MapStates are internally stored within RocksDB in single row corresponding
> to respective key (as per keyedStream) and hence they are all fetched in
> single call before operator's processElement is called? If there are
> different lookups in RocksDB for each of MapStateDescriptor, then I think
> keeping them in single MapStateDescriptor would be more efficient minimize
> RocksDB calls? Please advise.
>
> Gagan
>


-- 
Best,
Congxian
