Hello, I have a use case where I need to read events(non correlated) from a kafka topic, then correlate and push forward to another topic.
I use spark structured streaming with FlatMapGroupsWithStateFunction along with GroupStateTimeout.ProcessingTimeTimeout() . After each timeout, I do some correlation on group events and push forward correlated events to another topic. Non correlated events are stored in the state until they are correlated in a future set of events. There are times where non correlated events are more and the size of non correlated events in the state are growing too big. Does anyone know how to handle this use case or will spark take care of handling state when it grows big? Thanks in advance. regards, Robin Kuttaiah