I had to solve a similar problem. We use a process function with the RocksDB 
state backend and map state for the sub-keys. So while we hit RocksDB on every 
element, only the specified sub-keys are ever read from disk. 
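
To make the access pattern concrete, here is a rough sketch in plain Java. It 
is not our production code and does not use the Flink API: SubKeyCounts, 
entryKey and the in-memory HashMap are hypothetical stand-ins for map state on 
RocksDB, where each (key, sub-key) pair is stored as its own entry. The point 
it illustrates is that updating one sub-key is a single point lookup and a 
single write, no matter how many other sub-keys exist for that key.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-key map state: every (key, subKey) pair is its own entry. */
final class SubKeyCounts {
    // Hypothetical stand-in for the on-disk store; not Flink's MapState.
    private final Map<String, Long> store = new HashMap<>();
    // Counts point lookups so the access pattern can be inspected.
    private int reads = 0;

    // Composite entry key, assumed here to be key + separator + subKey.
    private static String entryKey(String key, String subKey) {
        return key + "\u0000" + subKey;
    }

    /** Increments one sub-key's count, touching only that single entry. */
    long increment(String key, String subKey) {
        String k = entryKey(key, subKey);
        reads++;                                  // one lookup per element
        Long current = store.get(k);
        long updated = (current == null ? 0L : current) + 1L;
        store.put(k, updated);                    // one write per element
        return updated;
    }

    int reads() { return reads; }

    int entries() { return store.size(); }
}
```

In a real job the same shape would live in the process function, with the map 
state doing the per-sub-key get and put in place of the HashMap.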

Seth Wiesman | Software Engineer
4 World Trade Center, 46th Floor, New York, NY 10007
swies...@mediamath.com <mailto:fl...@mediamath.com> 
 

On 2/26/18, 6:32 AM, "Marchant, Hayden " <hayden.march...@citi.com> wrote:

    I would like to create a custom aggregator function for a windowed 
KeyedStream which I have complete control over - i.e. instead of implementing 
an AggregateFunction, I would like to control the lifecycle of the Flink state 
by implementing the CheckpointedFunction interface, though I still want this 
state to be per-key, per-window. 
    
    I am not sure which function I should be calling on the WindowedStream in 
order to invoke this custom functionality. I see from the documentation that 
CheckpointedFunction is for non-keyed state - which I guess eliminates this 
option.
    
    A little background - I have logic that needs to hold a very large state in 
the operator - lots of counts by sub-key. Since only a subset of these 
aggregations is updated, I was interested in trying out incremental 
checkpointing in RocksDB. However, I don't want to be hitting RocksDB I/O on 
every update of state since we need very low latency, and instead wanted to 
hold the state in Java heap and then update the Flink state on checkpoint - i.e. 
something like CheckpointedFunction.
    My assumption is that any update I make to RocksDB-backed state will hit 
the local disk - if this is wrong then I'll be happy.
    
    What other options do I have?
    
    Thanks,
    Hayden Marchant
    
    
