Hi Team,

We are using foreachPartition to send dataset rows to a third-party system via
an HTTP client. The operation is not idempotent, so I want to ensure that in
case of failures the previously processed data does not get processed again.

Is there a way to checkpoint progress in a Spark batch job? Specifically:
1. Checkpoint processed partitions, so that if there are 1000 partitions and
100 were processed in a previous attempt, those 100 are not processed again.
2. Checkpoint partial progress within foreachPartition: if I have processed
100 records out of a partition's 1000 total records, is there a way to
checkpoint the offset so that those 100 are not processed again?
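For context, Spark itself does not checkpoint sink-side progress for batch jobs, so the usual workaround is to track progress in an external durable store keyed by partition id (obtainable inside foreachPartition via TaskContext.partitionId()). Below is a minimal, hedged sketch of that pattern in plain Python, with Spark left out so it stays self-contained; CheckpointStore, process_partition, and the JSON file layout are all hypothetical names, not Spark APIs. Note the remaining gap: if the job crashes after a send but before the matching commit, that batch of records can still be re-sent, so true exactly-once delivery requires an idempotent or transactional receiver.

```python
import json
import os
import tempfile

class CheckpointStore:
    """Durable record of per-partition progress for one batch run.

    In a real Spark job this would be a transactional store (database,
    HDFS/S3 file per partition, etc.) written from inside foreachPartition
    and keyed by a batch/run id, not a local JSON file.
    """

    def __init__(self, path):
        self.path = path
        self.progress = {}  # partition_id -> number of records committed
        if os.path.exists(path):
            with open(path) as f:
                self.progress = {int(k): v for k, v in json.load(f).items()}

    def committed(self, partition_id):
        return self.progress.get(partition_id, 0)

    def commit(self, partition_id, offset):
        self.progress[partition_id] = offset
        with open(self.path, "w") as f:  # atomic enough for a sketch only
            json.dump(self.progress, f)

def process_partition(partition_id, records, store, send, batch_size=100):
    """Skip records already committed, send the rest in committed batches."""
    start = store.committed(partition_id)
    for i in range(start, len(records), batch_size):
        for rec in records[i:i + batch_size]:
            send(rec)  # the non-idempotent HTTP call
        # Commit the offset only after the batch was sent successfully.
        store.commit(partition_id, min(i + batch_size, len(records)))
```

On a retry of the same batch, a fully processed partition is skipped entirely (its committed offset equals its record count), and a partially processed one resumes from the last committed offset, which covers both of the cases above at batch-size granularity.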

Regards,
Abhishek Singla
