Hi Robin,

In my experience, the state size of an in-memory state backend like 
HashMapStateBackend is limited by the available heap memory. See this link for 
details:
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/state_backends/

"When deciding between HashMapStateBackend and RocksDB, it is a choice between 
performance and scalability. HashMapStateBackend is very fast as each state 
access and update operates on objects on the Java heap; however, state size is 
limited by available memory within the cluster."

If your window state can grow very large, you should choose a state backend 
that spills to disk, such as RocksDB; a sketch of the configuration follows.
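
Here is a minimal sketch of switching to the embedded RocksDB backend, 
assuming Flink 1.13+ with the flink-statebackend-rocksdb dependency on the 
classpath; the checkpoint path, job name, and class name are placeholders:

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDBBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep keyed state in embedded RocksDB (off-heap, spilling to local
        // disk), so state size is bounded by disk rather than JVM heap.
        // "true" enables incremental checkpoints, which only upload changed
        // SST files.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // Checkpoints must go to a durable location; this path is a placeholder.
        env.getCheckpointConfig()
           .setCheckpointStorage("file:///tmp/flink-checkpoints");
        env.enableCheckpointing(60_000); // checkpoint every 60 s

        // ... your Kafka source -> session window -> ProcessFunction here ...
        env.fromElements("placeholder").print();

        env.execute("rocksdb-backend-sketch");
    }
}

The same switch can also be made without code changes by setting 
state.backend: rocksdb in flink-conf.yaml.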
Hope my reply helps.

Best,
Yuan


> On 19 May 2022, at 9:19 PM, Robin Cassan <robin.cas...@contentsquare.com> 
> wrote:
> 
> Hey all!
> I have a conceptual question on the DataStream API: when using an in-memory 
> state backend (like the HashMapStateBackend), how can you ensure that the 
> hashmap won't grow uncontrollably until OutOfMemory happens?
> 
> In my case, I would be consuming from a Kafka topic, into a SessionWindow. 
> The HashMap state would be accumulating data in memory until the timeout 
> expires, with a PurgingTrigger to clean up the state.
> The cluster's memory would be sized to handle a normal load, but in case of 
> lag or spikes we want the Flink job to slow down its consumption of the Kafka 
> topic so that the window's state stays capped at a given size (could be the 
> number of keys or the total GB). We have tested this scenario, and Flink 
> would consume really quickly from Kafka until memory was so full that it was 
> stuck in GC loops, unable to make progress on the ProcessFunction applied 
> after the window.
> 
> Is there any setting to limit the size of a Window state? Maybe there are 
> some bounded buffers between operators that can be adjusted?
> 
> Thanks a lot for your help!
> Robin
