Hey all! I have a conceptual question about the DataStream API: when using an in-memory state backend (like the HashMapStateBackend), how can you ensure that the state doesn't grow unboundedly until an OutOfMemoryError occurs?
In my case, I'm consuming from a Kafka topic into a session window. The window state accumulates data in memory until the session gap expires, with a PurgingTrigger to clean up the state afterwards. The cluster's memory is sized to handle a normal load, but in case of lag or traffic spikes we want the Flink job to slow down its consumption of the Kafka topic so that the window state stays capped at a given size (whether measured in number of keys or total GB).

We have tested this scenario: Flink consumed from Kafka so quickly that memory filled up and the JVM got stuck in GC loops, unable to make progress on the ProcessFunction applied after the window. Is there any setting to limit the size of a window's state? Or maybe there are bounded buffers between operators that can be tuned?

Thanks a lot for your help!
Robin
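
P.S. For context, here is a minimal sketch of the kind of pipeline we're running. The broker address, topic name, session gap, and keying logic are all placeholders, and I've assumed the newer KafkaSource connector (Flink 1.15+) with processing-time session windows; our real job differs in the details but has the same shape.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger;
import org.apache.flink.streaming.api.windowing.triggers.PurgingTrigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class SessionJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // In-memory state backend: all window contents live on the JVM heap.
        env.setStateBackend(new HashMapStateBackend());

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")              // placeholder
                .setTopics("events")                            // placeholder
                .setGroupId("session-job")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
                // Placeholder keying: first comma-separated field of the record.
                .keyBy(record -> record.split(",")[0])
                // Session closes after 30 minutes of inactivity per key (placeholder gap).
                .window(ProcessingTimeSessionWindows.withGap(Time.minutes(30)))
                // PurgingTrigger clears the window state once the trigger fires.
                .trigger(PurgingTrigger.of(ProcessingTimeTrigger.create()))
                .process(new ProcessWindowFunction<String, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context ctx,
                                        Iterable<String> elements, Collector<String> out) {
                        // ... aggregate the session's buffered elements here ...
                        out.collect(key);
                    }
                })
                .print();

        env.execute("session-window-job");
    }
}

Everything buffered between session start and the trigger firing sits in that HashMap state, which is exactly the part we'd like to keep bounded under backpressure.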