Hello,
I have an issue with StreamingQueryListener in my Structured Streaming
application written in PySpark. I'm running around 8 queries, and each
query runs every 5-20 seconds. In total, I have around 40 microbatch
executions per minute. I set up Python's StreamingQueryListener to collect
metrics
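To make the setup concrete, here is a minimal, Spark-free sketch of the
kind of per-batch metric extraction I mean. The dict layout mirrors the
StreamingQueryProgress JSON that `QueryProgressEvent.progress` exposes;
the `sample` payload and the flat output shape are my own illustration,
not code from the actual job:

```python
# Sketch: flatten the interesting fields from one microbatch's progress.
# The dict shape follows StreamingQueryProgress JSON; in a real listener
# this would run inside onQueryProgress and feed a metrics backend.

def extract_metrics(progress: dict) -> dict:
    """Pull per-batch throughput, latency and state-size figures."""
    state_rows = sum(
        op.get("numRowsTotal", 0) for op in progress.get("stateOperators", [])
    )
    return {
        "query_id": progress["id"],
        "batch_id": progress["batchId"],
        "input_rows": progress.get("numInputRows", 0),
        "batch_duration_ms": progress.get("durationMs", {}).get("triggerExecution"),
        "state_rows_total": state_rows,
    }

# Illustrative payload (values invented for the example).
sample = {
    "id": "q1",
    "batchId": 42,
    "numInputRows": 120,
    "durationMs": {"triggerExecution": 1850},
    "stateOperators": [{"numRowsTotal": 5000}, {"numRowsTotal": 300}],
}
print(extract_metrics(sample))
# -> {'query_id': 'q1', 'batch_id': 42, 'input_rows': 120,
#     'batch_duration_ms': 1850, 'state_rows_total': 5300}
```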
>>> watermark to window.end + 5 mins does not produce the output and fails the
>>> test.
>>>
>>> Please let me know if this does not make sense to you and we can discuss
>>> more.
>>>
>>> I haven't had time to look into SqlSyntaxTest - we don
Hey, do you perform stateful operations? Maybe your state is growing
indefinitely - a screenshot with state metrics would help (you can find it
in Spark UI -> Structured Streaming -> your query). Do you have a
driver-only cluster or do you have workers too? What's the memory usage
profile on the workers?
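As an aside, the same state-size check can be scripted from
`query.lastProgress` rather than eyeballed from a screenshot. A sketch,
assuming only the standard `stateOperators` / `numRowsTotal` fields; the
`history` list stands in for progress dicts captured over consecutive
batches:

```python
# Sketch: detect monotonically growing state across recent microbatches.
# Each entry mimics the "stateOperators" section of query.lastProgress.

def state_rows(progress: dict) -> int:
    """Total rows held in state across all stateful operators in one batch."""
    return sum(op.get("numRowsTotal", 0) for op in progress.get("stateOperators", []))

def state_growing(progress_history: list[dict]) -> bool:
    """True if state size strictly increased over every observed batch."""
    totals = [state_rows(p) for p in progress_history]
    return all(b > a for a, b in zip(totals, totals[1:]))

# Illustrative capture over three batches (values invented).
history = [
    {"stateOperators": [{"numRowsTotal": 1000}]},
    {"stateOperators": [{"numRowsTotal": 1800}]},
    {"stateOperators": [{"numRowsTotal": 2600}]},
]
print(state_growing(history))  # True -> state never shrank; eviction may not be firing
```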
Hi,
Do you think there is any chance of this issue getting resolved? Should I
create another bug report? As mentioned in my message, there is one open
already: https://issues.apache.org/jira/browse/SPARK-45637 but it covers
only one of the problems.
Andrzej
Tue, 27 Feb 2024 at 09:58 Andrzej Zera wrote:
anteed. It is essential to note
> that, as with any advice, to quote: "one test result is worth one-thousand
> expert opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
>
> On M
Hey all,
I've been using Structured Streaming in production for almost a year now,
and I want to share the bugs I've found in that time. I created a test
for each of the issues and put them all here:
https://github.com/andrzejzera/spark-bugs/tree/main/spark-3.5/src/test/scala
I split the issues i
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Thu, Jan 11, 2024 at 6:13 AM Andrzej Zera
> wrote:
>
>> I'm struggling with the following issue in Spark >=3.4, related to
>> multiple stateful operations.
>>
>> When
>>> intermediate_df = streaming_data.groupBy(...).count()
>>> intermediate_df.cache()
>>> # Use cached intermediate_df for further transformations or actions
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solut
I'm struggling with the following issue in Spark >=3.4, related to multiple
stateful operations.
When spark.sql.streaming.statefulOperator.allowMultiple is enabled, Spark
keeps track of two types of watermarks: eventTimeWatermarkForEviction and
eventTimeWatermarkForLateEvents. Introducing them all
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Sat, 6 Jan 2024 at 08:19, Andrzej Zera wrote:
>
>> Hey,
>>
>> I'm running a few Structured Streaming jobs (with Spark 3.5.0) that
>> require near-real time accurac
Hey,
I'm running a few Structured Streaming jobs (with Spark 3.5.0) that require
near-real-time accuracy with trigger intervals on the order of 5-10 seconds.
I usually run 3-6 streaming queries as part of the job, and each query
includes at least one stateful operation (and usually two or more).
My
ng!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Fri, Oct 27, 2023 at 5:22 AM Andrzej Zera
> wrote:
>
>> Hey All,
>>
>> I'm trying to reproduce the following streaming operation: "Time window
>> aggregation in separate streams followed by stream-st
Hey All,
I'm trying to reproduce the following streaming operation: "Time window
aggregation in separate streams followed by stream-stream join". According
to the documentation, this should be possible in Spark 3.5.0, but I had no
success despite several attempts.
Here is a documentation snippet I'm trying to reproduce: