subject:"Re\: Checkpointing in Spark Structured Streaming"

Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Jungtaek Lim

One more thing I missed: the commit metadata for batch N must be written only "after" all other parts of the checkpoint have been successfully written for batch N. So you seem to find a way to do an asynchronous commit in a "custom state store provider" - as I commented before, it's being tied to the task lif
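The ordering constraint described above can be illustrated with a minimal sketch. It is not Spark code: `CheckpointSketch`, `run_batch`, and the in-memory dictionaries are hypothetical stand-ins for the offset log, state store files, and commit log, assumed here only to show that state writes may run asynchronously while the commit entry for batch N waits for all of them.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class CheckpointSketch:
    """Hypothetical in-memory stand-in for a checkpoint location (not a Spark API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.offsets = {}   # batch_id -> offsets planned for that batch
        self.state = {}     # (batch_id, partition) -> state snapshot
        self.commits = []   # batch ids whose commit metadata has been written

    def write_offsets(self, batch_id, offsets):
        with self._lock:
            self.offsets[batch_id] = offsets

    def write_state(self, batch_id, partition, snapshot):
        with self._lock:
            self.state[(batch_id, partition)] = snapshot

    def write_commit(self, batch_id, num_partitions):
        # Refuse to commit unless every other part of the checkpoint
        # for this batch is already present.
        with self._lock:
            missing = [p for p in range(num_partitions)
                       if (batch_id, p) not in self.state]
            if batch_id not in self.offsets or missing:
                raise RuntimeError(f"batch {batch_id} is incomplete")
            self.commits.append(batch_id)


def run_batch(cp, pool, batch_id, num_partitions):
    """Write state asynchronously, then write the commit entry last."""
    cp.write_offsets(batch_id, {"end": batch_id * 10})
    futures = [pool.submit(cp.write_state, batch_id, p, {"count": batch_id})
               for p in range(num_partitions)]
    wait(futures)            # block until every async state write has finished
    for f in futures:
        f.result()           # re-raise any failure before committing
    cp.write_commit(batch_id, num_partitions)


if __name__ == "__main__":
    checkpoint = CheckpointSketch()
    with ThreadPoolExecutor(max_workers=4) as executor:
        for batch in range(3):
            run_batch(checkpoint, executor, batch, num_partitions=4)
    print(checkpoint.commits)
```

In this sketch the asynchrony is confined to the state writes within one batch; the commit entry still acts as the barrier, which is the property the message says must hold.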

Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Rohit Agrawal

Thank you for the reply. For our use case, it's okay to not have exactly-once semantics. Given that we don't need exactly-once: a) Are there any negative implications if one were to use a custom state store provider that commits asynchronously under the hood? b) Is there any other option

Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Jungtaek Lim

I see a few points that make async checkpointing tricky to add to micro-batch; one example is "end to end exactly-once", as the commit phase in the sink for batch N could run "after" batch N + 1 has started, and writes for batch N + 1 could happen before batch N is committed. state store checkpoint
