Hey all!
I have a conceptual question about the DataStream API: when using an in-memory
state backend (like the HashMapStateBackend), how can you ensure that the
state won't grow unboundedly until the job hits an OutOfMemoryError?
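
To be concrete, I mean the heap-based backend, configured roughly like this
(minimal sketch, trimmed to the relevant line):

import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Keyed state (including window contents) lives as objects on the JVM heap,
// so it is bounded only by the TaskManager's heap size.
env.setStateBackend(new HashMapStateBackend());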

In my case, I would be consuming from a Kafka topic into a session window.
The window state (kept on the heap by the HashMapStateBackend) would
accumulate data in memory until the session gap expires, with a
PurgingTrigger to clean up the state once the window fires.
The cluster's memory would be sized to handle a normal load, but in case of
lag or spikes we want the Flink job to slow down its consumption of the
Kafka topic so that the window state stays capped at a given size (whether
measured in number of keys or total GB). We tested this scenario: Flink
consumed from Kafka very quickly until memory was so full that the job was
stuck in GC loops, unable to make progress in the ProcessFunction applied
after the window.
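
For reference, here's roughly what the job looks like. This is a minimal
sketch: the broker address, topic name, group id, keying, and the per-session
processing are placeholders, and I've used processing-time sessions to keep
it self-contained; the real job is more involved.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger;
import org.apache.flink.streaming.api.windowing.triggers.PurgingTrigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class SessionWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Heap-based state backend, as above.
        env.setStateBackend(new HashMapStateBackend());

        // Placeholder broker/topic/group names, just for illustration.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("events")
                .setGroupId("session-job")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
                // Placeholder keying; the real job keys by a field of the record.
                .keyBy(record -> record)
                // The session window closes after a gap of inactivity (the timeout).
                .window(ProcessingTimeSessionWindows.withGap(Time.minutes(10)))
                // PurgingTrigger drops the window contents from state when it fires.
                .trigger(PurgingTrigger.of(ProcessingTimeTrigger.create()))
                .process(new ProcessWindowFunction<String, String, String, TimeWindow>() {
                    @Override
                    public void process(String key, Context ctx,
                                        Iterable<String> elements, Collector<String> out) {
                        // Stand-in for the real per-session processing.
                        long count = 0;
                        for (String ignored : elements) {
                            count++;
                        }
                        out.collect(key + ": " + count + " events in session");
                    }
                })
                .print();

        env.execute("session-window-job");
    }
}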

Is there any setting to limit the size of a window's state? Or are there
bounded buffers between operators that could be tuned?

Thanks a lot for your help!
Robin
