Re: [Spark Structured Streaming] How to delete old data that was created by Spark Structured Streaming?

2024-12-03 Thread Andrei L
Hello! We are facing the same issue. Could you please elaborate on how to clean up WALs / delete irrelevant partitions when the SSS job is stopped? Direct usage of FileStreamSinkLog in the mentioned StackOverflow question looks "hacky". Moreover, the workaround provided in the SO question uses FileStreamSinkLog

Re: [Spark Structured Streaming] How to delete old data that was created by Spark Structured Streaming?

2024-12-03 Thread Mich Talebzadeh
Yes, but your SSS job has to be stopped gracefully. Originally I raised this SPIP request: https://issues.apache.org/jira/browse/SPARK-42485 Then I requested "Adding pause() method to pyspark.sql.streaming.StreamingQuery". I believe they are still open. HTH Mich Talebzadeh, Architect | Data Scie

Re: [Spark Structured Streaming] How to delete old data that was created by Spark Structured Streaming?

2024-12-03 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)
Hi Yegor, If you are not using the Delta format (e.g. avro/json/parquet/csv/etc.) then you have two options: #1 clean up WAL files (afaik it's the _spark_metadata folder inside your data folder), which requires that the SSS job be stopped before you clean the WAL; #2 you can use foreachBatch for wr

RE: [Spark Structured Streaming] How to delete old data that was created by Spark Structured Streaming?

2024-12-03 Thread Дубинкин Егор
Forgot to mention: Spark 3.5.2 is used On 2024/12/03 15:05:18 Дубинкин Егор wrote: > Hello Community, > > I need to delete old src data created by Spark Structured Streaming. > Just deleting relevant folder throws an exception while reading batch > dataframe from file-system: > > java.io.FileNotF

[Spark Structured Streaming] How to delete old data that was created by Spark Structured Streaming?

2024-12-03 Thread Дубинкин Егор
Hello Community, I need to delete old source data created by Spark Structured Streaming. Just deleting the relevant folder throws an exception while reading a batch dataframe from the file system: java.io.FileNotFoundException: File file:/data/avro/year=2020/month=3/day=13/hour=12/part-0-0cc84e65-3f49-4
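One commonly suggested mitigation for this kind of FileNotFoundException on the batch-read side (an assumption on my part, not something stated in the thread) is Spark's documented `spark.sql.files.ignoreMissingFiles` setting, which tells the reader to skip files that were listed but have since been deleted, rather than failing the query. A minimal configuration sketch (`your_batch_job.py` is a placeholder):

```shell
# Skip files that were deleted after the listing was taken,
# instead of failing with java.io.FileNotFoundException.
spark-submit \
  --conf spark.sql.files.ignoreMissingFiles=true \
  your_batch_job.py
```

The same behavior can be requested per read via the file-source option `ignoreMissingFiles`. Note this only papers over the read; whether it fully helps here depends on whether the file listing comes from the sink's `_spark_metadata` log, which is what the rest of the thread discusses cleaning up.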

[ANNOUNCE] Apache Sedona 1.7.0 released

2024-12-03 Thread Jia Yu
Dear all, We are happy to report that we have released Apache Sedona 1.7.0. Thank you again for your help. Apache Sedona is a cluster computing system for processing large-scale spatial data. Vote thread (Permalink from https://lists.apache.org/list.html): https://lists.apache.org/thread/5hvcr80