Hi Yegor
If you are not using the Delta format (i.e. you write avro/json/parquet/csv/etc.) then you have
two options:
#1 clean up the WAL files (the file sink keeps them in the _spark_metadata
folder inside your output folder), which requires that the Spark Structured
Streaming (SSS) job is stopped before you clean the WAL; see the first sketch below.
#2 you can use foreachBatch to write your data, but then your SSS query is no
longer exactly-once, only at-least-once; see the second sketch below.

Best regards

> On 3 Dec 2024, at 17:07, Дубинкин Егор <dubinkine...@gmail.com> wrote:
> 
> 
> Hello Community,
> 
> I need to delete old src data created by Spark Structured Streaming.
> Just deleting relevant folder throws an exception while reading batch 
> dataframe from file-system:
> java.io.FileNotFoundException: File file:/data/avro/year=2020/month=3/day=13/hour=12/part-00000-0cc84e65-3f49-4686-85e3-1ecf48952794.c000.avro does not exist
> The issue is actually the same as the one described here:
> https://stackoverflow.com/questions/60773445/how-to-delete-old-data-that-was-created-by-spark-structured-streaming?newreg=5cc791c48358491c88d9b2dae1e436d9
> 
> I didn't find a way to delete it via the Spark API.
> Are there any solutions to do this via the API instead of editing the metadata manually?
> 
> Your help would be appreciated.
