Yes, but your SSS (Spark Structured Streaming) job has to be stopped gracefully. Originally I raised this SPIP request:
https://issues.apache.org/jira/browse/SPARK-42485

Then I requested "Adding pause() method to pyspark.sql.streaming.StreamingQuery". I believe both are still open. HTH

Mich Talebzadeh
Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
London, United Kingdom

On Tue, 3 Dec 2024 at 18:33, "Yuri Oleynikov (יורי אולייניקוב)" <yur...@gmail.com> wrote:

> Hi Yegor,
>
> If you are not using the Delta format (e.g. Avro/JSON/Parquet/CSV), then you have two options:
>
> #1: Clean up the WAL files (AFAIK it's the _spark_metadata folder inside your data folder), which requires that the SSS job be stopped before you clean the WAL.
> #2: Use foreachBatch to write your data, but then your SSS job will no longer be exactly-once, only at-least-once.
>
> Best regards
>
> On 3 Dec 2024, at 17:07, Дубинкин Егор (Yegor Dubinkin) <dubinkine...@gmail.com> wrote:
>
> Hello Community,
>
> I need to delete old source data created by Spark Structured Streaming. Simply deleting the relevant folder throws an exception when reading a batch DataFrame from the file system:
>
> java.io.FileNotFoundException: File file:/data/avro/year=2020/month=3/day=13/hour=12/part-00000-0cc84e65-3f49-4686-85e3-1ecf48952794.c000.avro does not exist
>
> The issue is actually the same as the one described here:
> https://stackoverflow.com/questions/60773445/how-to-delete-old-data-that-was-created-by-spark-structured-streaming?newreg=5cc791c48358491c88d9b2dae1e436d9
>
> I didn't find a way to delete it via the Spark API.
> Are there any solutions to do this via the API instead of editing the metadata manually?
>
> Your help would be appreciated.
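
A minimal PySpark sketch of the graceful stop Mich describes above. The query name "my_stream" and the use of processAllAvailable() before stop() are illustrative assumptions, since the thread does not show the actual job:

```python
# Minimal sketch of stopping a Structured Streaming query gracefully.
# Assumes an existing SparkSession; the query name "my_stream" is
# hypothetical. There is no pause() on StreamingQuery today (hence the
# SPIP above); stop() is the supported way to halt a query.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("graceful-stop").getOrCreate()

for q in spark.streams.active:        # all active StreamingQuery handles
    if q.name == "my_stream":
        q.processAllAvailable()       # drain already-fetched data (blocks; avoid on a large backlog)
        q.stop()                      # stop the query so the sink metadata is no longer being written
```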
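On the batch-read side of Yegor's FileNotFoundException, one hedged workaround is Spark's spark.sql.files.ignoreMissingFiles setting, which makes a scan skip files that were listed (e.g. in the sink's _spark_metadata log) but have since been deleted. It does not clean any metadata; it only stops the read from failing. A sketch, with the path taken from the error message:

```python
# Sketch: tolerate manually deleted files when reading the sink output
# as a batch DataFrame. This skips files listed during planning that
# are gone at read time; the _spark_metadata log itself is untouched.
# Reading Avro assumes the spark-avro package is on the classpath.
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

df = spark.read.format("avro").load("/data/avro")   # path from the error message
df.show(5)
```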
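And a minimal sketch of option #2 from Yuri's reply: writing through foreachBatch, so each micro-batch is written by a plain batch writer and no file-sink metadata log is kept, which lets old output folders be deleted freely later. Here `src_df`, the paths, and the output format are assumptions; note the delivery guarantee drops to at-least-once because a failed micro-batch may be rewritten on restart:

```python
# Sketch of option #2: foreachBatch with a plain batch write. No
# _spark_metadata log is produced by a batch save(), but delivery
# becomes at-least-once: the same batch_id can be written twice
# after a failure, so downstream readers should tolerate duplicates.
def write_batch(batch_df, batch_id):
    batch_df.write.mode("append").format("avro").save("/data/avro")

query = (
    src_df.writeStream                                 # src_df: the streaming DataFrame (assumed)
          .foreachBatch(write_batch)
          .option("checkpointLocation", "/chk/avro")   # illustrative path
          .start()
)
query.awaitTermination()
```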