Hi Stefan, thanks.

Yes, we are also using keyed state in other operators. The problem is that serialization is quite expensive, and in some operators we would prefer to avoid it by keeping the state in memory (for our use case, one specific operator with in-memory state gives at least a 30% throughput improvement). When we are not operating on a keyed stream this is easy: basically all the operators have the same in-memory state. What we would like is to do the same when we are operating on a keyed stream. Does it make more sense now?

We are using RocksDB as the state backend and, as far as I know, elements are always serialized when they are stored in the state. I'm also not sure whether there is some disk access (maybe not synchronous) that could hurt performance.

Gerard

On Tue, Feb 20, 2018 at 2:42 PM, Stefan Richter <s.rich...@data-artisans.com> wrote:

> Hi,
>
> from what I read, I get the impression that you attempt to implement your
> own "keyed state" with a hashmap? Why not use the keyed state that is
> already provided by Flink and gives you efficient rescaling etc. out of the
> box? Please see [1] for the details.
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/state.html#using-managed-keyed-state
>
> On 20.02.2018 at 13:44, gerardg <ger...@talaia.io> wrote:
>
> Hello,
>
> To improve performance we have "keyed state" in the operator's memory;
> basically we keep a Map which contains the state for each of the keys. The
> problem comes when we want to restore the state after a failure or after
> rescaling the operator. What we are doing is sending the concatenation of
> all the state to every operator using a union redistribution, and then we
> restore the "in-memory state" every time we see a new key. Then, after a
> while, we just clear the redistributed state. This is somewhat complex and
> prone to errors, so we would like to find an alternative way of doing this.
>
> As far as I know, Flink knows which keys belong to each operator
> (distributing key groups), so I guess it would be possible to somehow
> calculate the key group id from each of the stored keys and restore the
> in-memory state at once if we could access the key-group mapping. Is that
> possible? We could patch Flink if necessary to access that information.
>
> Thanks,
>
> Gerard
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
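
To make the key-group idea from the original question concrete, here is a minimal sketch of the arithmetic in Python. It is not Flink code. The function names are hypothetical, CRC32 is only a stand-in for whatever hash Flink applies to the key, and the mapping from key group to operator index (key group * parallelism / max parallelism) is stated from memory of Flink's KeyGroupRangeAssignment, so it should be checked against the Flink version in use. The point is only that, given such a mapping, each subtask could filter the union-redistributed state once on restore instead of waiting to see each key.

```python
import zlib


def key_group_for(key: str, max_parallelism: int) -> int:
    """Map a key to a key group in [0, max_parallelism).

    Assumption: CRC32 is a stand-in hash, NOT the hash Flink uses.
    """
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def operator_index_for(key_group: int, max_parallelism: int, parallelism: int) -> int:
    """Map a key group to the subtask that owns it.

    Assumption: contiguous ranges of key groups are assigned to subtasks
    with integer division, as I believe Flink does.
    """
    return key_group * parallelism // max_parallelism


def restore_local_state(union_state: dict, subtask_index: int,
                        max_parallelism: int, parallelism: int) -> dict:
    """Keep only the entries of the union state owned by this subtask."""
    return {
        key: value
        for key, value in union_state.items()
        if operator_index_for(key_group_for(key, max_parallelism),
                              max_parallelism, parallelism) == subtask_index
    }


if __name__ == "__main__":
    # Hypothetical union state as every subtask would receive it on restore.
    union_state = {f"key-{i}": i for i in range(10)}
    for subtask in range(4):
        local = restore_local_state(union_state, subtask, 128, 4)
        print(subtask, sorted(local))
```

With this shape each key lands in exactly one subtask, so the redistributed union state can be dropped immediately after the filter. Whether the in-memory map then stays consistent with Flink's own routing depends on using exactly the same hash and max parallelism as the job, which is the part that would need access to Flink's internals.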