Hi Robin,

In my experience, the state size of an in-memory state backend is limited by the heap memory. See this link for details: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/state_backends/
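For example, here is a minimal sketch of switching to the RocksDB state backend via flink-conf.yaml, so that state spills to local disk instead of living entirely on the heap. The key names assume a recent Flink version (1.13+), and the checkpoint path is a placeholder you would replace with your own storage location:

```yaml
# Use the embedded RocksDB state backend instead of the heap-based HashMapStateBackend.
state.backend: rocksdb

# Incremental checkpoints reduce checkpoint size for large RocksDB state.
state.backend.incremental: true

# Durable storage for checkpoints (placeholder path, adjust to your setup).
state.checkpoints.dir: hdfs:///flink/checkpoints
```

The same choice can also be made programmatically per job with `env.setStateBackend(...)`, but the configuration file keeps it out of application code.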
"When deciding between HashMapStateBackend and RocksDB, it is a choice between performance and scalability. HashMapStateBackend is very fast as each state access and update operates on objects on the Java heap; however, state size is limited by available memory within the cluster."

If the size of your window state is really large, you should choose another state backend. I hope my reply helps.

Best,
Yuan

> On 19 May 2022, at 9:19 PM, Robin Cassan <robin.cas...@contentsquare.com> wrote:
>
> Hey all!
> I have a conceptual question on the DataStream API: when using an in-memory
> state backend (like the HashMapStateBackend), how can you ensure that the
> hashmap won't grow uncontrollably until an OutOfMemoryError happens?
>
> In my case, I would be consuming from a Kafka topic into a SessionWindow.
> The HashMap state would be accumulating data in memory until the timeout
> expires, with a PurgingTrigger to clean up the state.
> The cluster's memory would be sized to handle a normal load, but in case of
> lag or spikes we want the Flink job to slow down its consumption of the Kafka
> topic so that the window's state stays capped at a given size (could be the
> number of keys or the total GB). We have tested this scenario, and Flink
> would consume really quickly from Kafka until memory was so full that it was
> stuck in GC loops, unable to make progress on the ProcessFunction applied
> after the window.
>
> Is there any setting to limit the size of a window state? Maybe there are
> some bounded buffers between operators that can be adjusted?
>
> Thanks a lot for your help!
> Robin