What about returning the HTTP codes as a DataFrame result from the foreachPartition, saving it to files/a table/whatever, and then performing a join to discard the rows already processed OK when you try again?
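A minimal sketch of that idea, in plain Python rather than PySpark so it runs anywhere (the record ids, the `send` stub, and the in-memory results log are all hypothetical stand-ins for the real HTTP call and the persisted results table):

```python
# Sketch of the "save HTTP results, then anti-join on retry" pattern.
# In Spark you would emit (id, http_code) rows from the partitions,
# persist them, and do a left_anti join against the successes before
# the next attempt; here plain lists/sets stand in for DataFrames.

def send(record):
    """Pretend HTTP call: returns (record_id, http_code)."""
    # Simulate a transient failure for one record on the first attempt.
    return (record["id"], 500 if record["id"] == 2 else 200)

def process_batch(records, results_log):
    """Process records, appending (id, code) results to results_log."""
    for rec in records:
        results_log.append(send(rec))

def pending(records, results_log):
    """Anti-join: keep only rows without a successful (2xx) result."""
    ok_ids = {rid for rid, code in results_log if 200 <= code < 300}
    return [rec for rec in records if rec["id"] not in ok_ids]

records = [{"id": i} for i in range(4)]
log = []                       # stands in for the persisted results table
process_batch(records, log)    # first attempt: id 2 fails with 500
retry = pending(records, log)  # only the failed row remains for retry
```

In Spark itself, the filtering step would be something like `df.join(ok_df, "id", "left_anti")` on the saved results before re-running the send, so already-succeeded rows are never re-sent even though the operation is not idempotent.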
On Wed, Apr 16, 2025 at 15:01, Abhishek Singla (<abhisheksingla...@gmail.com>) wrote:
> Hi Team,
>
> We are using foreachPartition to send dataset row data to a third-party
> system via an HTTP client. The operation is not idempotent. I want to
> ensure that in case of failures the previously processed dataset does not
> get processed again.
>
> Is there a way to checkpoint in Spark batch:
> 1. Checkpoint processed partitions, so that if there are 1000 partitions
> and 100 were processed in the previous batch, they do not get processed
> again.
> 2. Checkpoint a partial partition in foreachPartition: if I have processed
> 100 records from a partition which has 1000 records in total, is there a
> way to checkpoint the offset so that those 100 do not get processed again?
>
> Regards,
> Abhishek Singla