With Spark Streaming, I am maintaining a state (updateStateByKey every 30s) and
emitting to file parts of that state that have been closed every 5 minutes, but
only care about the last state collected.
In 5m, there will be 10 updateStateByKey iterations called.
For example:
…
val ssc = new StreamingContext(sc, Seconds(30))
val expiredState = state
.filter(_._2.expired == true)
.window(windowDuration = Seconds(30), slideDuration)
…
When I go to emit, I want to update a Boolean flag in my collection of state
that says it has been collected, so that the next time state is updated I can
remove what has been emitted.
Is there a way to do this or maybe a better pattern or approach to solve this
problem?
Hopefully I have given enough information to explain the use case.
Thanks,
Robert