Hi,
is there any way to read up on using Spark checkpointing (programmatically) in an in-depth manner?
I have an application where I perform multiple operations on a DStream. To my understanding, the result of each of those operations is a new DStream, which can be used for further operations.
This leads to a chain of operations that can be described as follows:
1) Periodically read data from Kafka (checkpoint this DStream)
2) Create a window on the read data
3) Aggregate on respective windows (checkpoint this DStream)
4) Write to Kafka (checkpoint this DStream)
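To make the chain concrete, here is a stripped-down sketch of what I am doing (topic names, intervals, group id and the checkpoint directory are placeholders, and the Kafka producer in step 4 is elided):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object WindowedPipeline {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("windowed-pipeline")
    val ssc  = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoints") // checkpoint directory (placeholder)

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer"  -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group"
    )

    // 1) periodically read data from Kafka
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("input-topic"), kafkaParams))

    val values = stream.map(_.value)
    values.checkpoint(Seconds(30))

    // 2) create a window on the read data
    val windowed = values.window(Seconds(60), Seconds(20))

    // 3) aggregate on the respective windows
    val counts = windowed.map((_, 1L)).reduceByKey(_ + _)
    counts.checkpoint(Seconds(60)) // must be a multiple of the slide duration

    // 4) write each aggregated batch back to Kafka (producer setup elided)
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { _ => /* produce records to Kafka here */ }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```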
With the above chain of operations, the stream fails with the following error: "WindowedDStream has been marked for checkpointing but the storage level has not been set to enable persisting".
I found out that a WindowedDStream does not support persisting its data, in order to avoid unnecessary copies of the data.
Basically my issue is that I am not able to use a window operation in a checkpointed environment:
- calling checkpoint on all DStreams fails, because WindowedDStream does not support persisting
- calling checkpoint on all DStreams except the WindowedDStream fails, because somehow it is still marked for checkpointing.
Hopefully someone has an idea.
Kind regards