Hello,

I have a use case where I need to read uncorrelated events from a Kafka
topic, correlate them, and push the correlated events to another topic.

I use Spark Structured Streaming with FlatMapGroupsWithStateFunction and
GroupStateTimeout.ProcessingTimeTimeout(). After each timeout, I correlate
the events in the group and push the correlated events to another topic.
Events that could not be correlated are kept in the state until they can
be correlated with events from a later batch.
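
To make the setup concrete, the per-group logic has roughly the shape
below. This is only a simplified sketch in plain Java with no Spark types:
the Event class, the pairId field, and the rule that two events with the
same pairId form a correlated pair are all assumed for illustration and
are not the real job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the per-group logic; it has no Spark dependency.
final class CorrelationSketch {

    // Hypothetical event: two events correlate when they share a pairId.
    static final class Event {
        final String pairId;
        final String payload;

        Event(String pairId, String payload) {
            this.pairId = pairId;
            this.payload = payload;
        }
    }

    // Outcome of one pass over a group.
    static final class Result {
        // Would be written to the output topic.
        final List<String> correlated = new ArrayList<>();
        // Would be stored back in the group state for a later pass.
        final List<Event> pending = new ArrayList<>();
    }

    // Merges the events already held in state with the newly arrived ones,
    // pairs up events that share a pairId, and keeps the rest as pending.
    static Result correlate(List<Event> state, List<Event> incoming) {
        List<Event> all = new ArrayList<>(state);
        all.addAll(incoming);

        Map<String, Event> waiting = new LinkedHashMap<>();
        Result result = new Result();
        for (Event e : all) {
            Event partner = waiting.remove(e.pairId);
            if (partner == null) {
                waiting.put(e.pairId, e); // no partner yet, keep waiting
            } else {
                result.correlated.add(partner.payload + "+" + e.payload);
            }
        }
        result.pending.addAll(waiting.values());
        return result;
    }
}
```

In this sketch, the pending list is the part that is written back to the
state, so it only shrinks when a matching event eventually arrives.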

At times there are many events that cannot be correlated, and the state
holding them grows too large.

Does anyone know how to handle this use case, or will Spark take care of
the state when it grows large?

Thanks in advance.

Regards,
Robin Kuttaiah
