Thanks Yu'an for your answer!

Our issue is that the window size varies with the incoming traffic, and we
would like a solution to avoid filling our window state in case of spikes.
Even with RocksDB we would eventually be limited by the disk size (which,
admittedly, is usually bigger). To simplify our Flink deployment, we want to
avoid attaching volumes to our pods, so keeping our windows in memory is a
huge help for maintenance.
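For context, our state backend configuration is roughly the following (a
sketch, not our exact job; JobSetup is just a placeholder name):

import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JobSetup {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // All window state lives on the JVM heap: fast, and no volumes to
        // attach to the pods, but bounded by the available memory.
        env.setStateBackend(new HashMapStateBackend());
    }
}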

It looks like it would theoretically be possible to throttle consumption
from the Kafka source when the state reaches a certain size, and I am
wondering if anyone has had a similar use case and found a solution to
achieve this.
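
To illustrate, here is a rough sketch of the kind of throttle I have in mind
(assuming Guava is on the classpath; ThrottleMap and the rate value are made
up, and the rate here is fixed rather than driven by the actual state size):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

import com.google.common.util.concurrent.RateLimiter;

// Caps the records/second each subtask lets through. Because acquire()
// blocks, Flink's backpressure propagates upstream and slows the Kafka
// source down.
public class ThrottleMap<T> extends RichMapFunction<T, T> {
    private final double recordsPerSecondPerSubtask;
    private transient RateLimiter rateLimiter;

    public ThrottleMap(double recordsPerSecondPerSubtask) {
        this.recordsPerSecondPerSubtask = recordsPerSecondPerSubtask;
    }

    @Override
    public void open(Configuration parameters) {
        // RateLimiter is not serializable, so build it per subtask here.
        rateLimiter = RateLimiter.create(recordsPerSecondPerSubtask);
    }

    @Override
    public T map(T value) {
        rateLimiter.acquire(); // blocks until a permit is available
        return value;
    }
}

We would chain it right after the source, e.g.
stream.map(new ThrottleMap<>(10_000)) before the session window. What I
cannot figure out is how to drive that rate from the actual window state
size, since the source side has no view of the downstream state.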

Thanks again,
Robin

On Fri, May 20, 2022 at 06:11, yu'an huang <h.yuan...@gmail.com> wrote:

> Hi Robin,
>
> In my experience, the state size of the memory state backend is limited by
> the heap memory. See this link for details:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/state_backends/
>
> “When deciding between HashMapStateBackend and RocksDB, it is a choice
> between performance and scalability. HashMapStateBackend is very fast as
> each state access and update operates on objects on the Java heap; however,
> state size is limited by available memory within the cluster. "
>
> If the size of your window state is really huge, you should choose another
> state backend.
> Hope my reply helps you.
>
> Best,
> Yuan
>
>
> > On 19 May 2022, at 9:19 PM, Robin Cassan <robin.cas...@contentsquare.com>
> wrote:
> >
> > Hey all!
> > I have a conceptual question on the DataStream API: when using an
> in-memory state backend (like the HashMapStateBackend), how can you ensure
> that the hashmap won't grow uncontrollably until an OutOfMemoryError happens?
> >
> > In my case, I would be consuming from a Kafka topic into a
> SessionWindow. The HashMap state would be accumulating data in memory until
> the timeout expires, with a PurgingTrigger to clean up the state.
> > The cluster's memory would be sized to handle a normal load, but in case
> of lag or spikes we want the Flink job to slow down its consumption of the
> Kafka topic so that the window's state stays capped at a given size (could
> be the number of keys or the total GB). We have tested this scenario, and
> Flink would consume really quickly from Kafka until memory was so full that
> it was stuck in GC loops, unable to make progress on the ProcessFunction
> applied after the window.
> >
> > Is there any setting to limit the size of a Window state? Maybe there
> are some bounded buffers between operators that can be adjusted?
> >
> > Thanks a lot for your help!
> > Robin
>
>
