What about returning the HTTP codes as a DataFrame result from the foreachPartition, saving it to files/a table/whatever, and then performing a join to discard the rows already processed OK when you try again?
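A minimal sketch of that idea, in plain Python rather than PySpark so it runs anywhere (the record ids, the `send` stub, and the in-memory results log are all hypothetical stand-ins for the real HTTP call and the persisted results table):

```python
# Sketch of the "save HTTP results, then anti-join on retry" pattern.
# In Spark you would emit (id, http_code) rows from the partitions,
# persist them, and do a left_anti join against the successes before
# the next attempt; here plain lists/sets stand in for DataFrames.

def send(record):
    """Pretend HTTP call: returns (record_id, http_code)."""
    # Simulate a transient failure for one record on the first attempt.
    return (record["id"], 500 if record["id"] == 2 else 200)

def process_batch(records, results_log):
    """Process records, appending (id, code) results to results_log."""
    for rec in records:
        results_log.append(send(rec))

def pending(records, results_log):
    """Anti-join: keep only rows without a successful (2xx) result."""
    ok_ids = {rid for rid, code in results_log if 200 <= code < 300}
    return [rec for rec in records if rec["id"] not in ok_ids]

records = [{"id": i} for i in range(4)]
log = []                       # stands in for the persisted results table
process_batch(records, log)    # first attempt: id 2 fails with 500
retry = pending(records, log)  # only the failed row remains for retry
```

In Spark itself, the filtering step would be something like `df.join(ok_df, "id", "left_anti")` on the saved results before re-running the send, so already-succeeded rows are never re-sent even though the operation is not idempotent.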
On Wed, Apr 16, 2025 at 15:01, Abhishek Singla (<abhisheksingla...@gmail.com>) wrote:
> Hi Team,
>
> We are using foreachPartition to send dataset row data to a third-party
> system via an HTTP client. The operation is not idempotent. I want to
> ensure that in case of failures the previously processed dataset does not
> get processed again.
>
> Is there a way to checkpoint in Spark batch:
> 1. Checkpoint processed partitions, so that if there are 1000 partitions
> and 100 were processed in the previous batch, they do not get processed
> again.
> 2. Checkpoint a partial partition in foreachPartition: if I have processed
> 100 records from a partition which has 1000 records in total, is there a
> way to checkpoint the offset so that those 100 do not get processed again?
>
> Regards,
> Abhishek Singla