Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Jungtaek Lim
One more thing I missed: commit metadata for batch N must be written "after" all other parts of the checkpoint for batch N have been written successfully. So you seem to find a way to do asynchronous commit on "custom state store provider" - as I commented before, it's being tied to the task lif
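
The ordering rule stated above (commit metadata for batch N is written last) can be sketched roughly as follows. This is a toy model, not Spark's checkpoint code: the directory layout, `write_batch`, `last_committed_batch`, and the `fail_before_commit` flag are all hypothetical names chosen for illustration. It only shows why a commit marker written last lets recovery ignore a batch whose other checkpoint parts were written but never committed.

```python
import os
import tempfile


def write_batch(root, batch_id, state, fail_before_commit=False):
    """Write the state for a batch, then the commit marker as the very last step.

    Hypothetical layout: <root>/state/<id> and <root>/commits/<id>.
    """
    state_dir = os.path.join(root, "state")
    commit_dir = os.path.join(root, "commits")
    os.makedirs(state_dir, exist_ok=True)
    os.makedirs(commit_dir, exist_ok=True)

    # Step 1: every non-commit part of the checkpoint is written first.
    with open(os.path.join(state_dir, str(batch_id)), "w") as f:
        f.write(state)

    if fail_before_commit:
        # Simulated crash: state exists on disk, but the batch is not committed.
        return

    # Step 2: the commit marker is written only after step 1 succeeded.
    with open(os.path.join(commit_dir, str(batch_id)), "w") as f:
        f.write("committed")


def last_committed_batch(root):
    """Return the highest batch id that has a commit marker, or None."""
    commit_dir = os.path.join(root, "commits")
    if not os.path.isdir(commit_dir):
        return None
    ids = [int(name) for name in os.listdir(commit_dir)]
    return max(ids) if ids else None


def run_demo():
    """Commit batch 0, crash during batch 1, and report what recovery sees."""
    with tempfile.TemporaryDirectory() as root:
        write_batch(root, 0, "count=10")
        write_batch(root, 1, "count=25", fail_before_commit=True)
        return last_committed_batch(root)
```

In this sketch, `run_demo()` reports batch 0 as the last committed batch even though a state file for batch 1 exists, so a restart would redo batch 1 rather than trust a partially written checkpoint.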

Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Rohit Agrawal
Thank you for the reply. For our use case, it's okay not to have exactly-once semantics. Given that we don't need exactly-once: a) Are there any negative implications if one were to use a custom state store provider that commits asynchronously under the hood? b) Is there any other option

Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Jungtaek Lim
I see some points that make async checkpointing tricky to add to micro-batch; one example is "end to end exactly-once", as the sink's commit phase for batch N can run "after" batch N + 1 has started, so a write for batch N + 1 can happen before batch N is committed. state store checkpoint
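
The interleaving described above can be made concrete with a small deterministic model. This is only a sketch under stated assumptions: it does not use Spark or real concurrency, and `run_batches` and `uncommitted_overlap` are hypothetical helpers. The "async" mode simply defers each batch's commit until after the next batch's write has begun, which is the ordering the message warns about.

```python
def run_batches(num_batches, async_commit):
    """Return the ordered sink events of a toy micro-batch loop.

    With async_commit=False, each batch is committed before the next write.
    With async_commit=True, the commit for batch n is deferred until after
    the write for batch n + 1 has been issued (a modelled delay, not threads).
    """
    events = []
    pending = None  # batch id whose commit has been deferred
    for n in range(num_batches):
        events.append(("write", n))
        if async_commit:
            if pending is not None:
                events.append(("commit", pending))
            pending = n
        else:
            events.append(("commit", n))
    if pending is not None:
        events.append(("commit", pending))
    return events


def uncommitted_overlap(events):
    """List (n + 1, n) pairs where batch n + 1 was written before batch n committed."""
    committed = set()
    overlaps = []
    for kind, n in events:
        if kind == "commit":
            committed.add(n)
        elif n > 0 and (n - 1) not in committed:
            overlaps.append((n, n - 1))
    return overlaps
```

In the synchronous mode `uncommitted_overlap` finds nothing, while in the deferred mode every batch after the first is written while its predecessor is still uncommitted. A failure in that window would leave output from batch N + 1 visible without a durable commit for batch N.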