Hello,
I have an issue with StreamingQueryListener in my Structured Streaming
application written in PySpark. I'm running around 8 queries, and each
query runs every 5-20 seconds. In total, I have around 40 microbatch
executions per minute. I set up Python's StreamingQueryListener to collect
metrics
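To make the setup concrete, here is a minimal, Spark-free sketch of the
kind of per-batch metric extraction I mean. The dict layout mirrors the
StreamingQueryProgress JSON that `QueryProgressEvent.progress` exposes;
the `sample` payload and the flat output shape are my own illustration,
not code from the actual job:

```python
# Sketch: flatten the interesting fields from one microbatch's progress.
# The dict shape follows StreamingQueryProgress JSON; in a real listener
# this would run inside onQueryProgress and feed a metrics backend.

def extract_metrics(progress: dict) -> dict:
    """Pull per-batch throughput, latency and state-size figures."""
    state_rows = sum(
        op.get("numRowsTotal", 0) for op in progress.get("stateOperators", [])
    )
    return {
        "query_id": progress["id"],
        "batch_id": progress["batchId"],
        "input_rows": progress.get("numInputRows", 0),
        "batch_duration_ms": progress.get("durationMs", {}).get("triggerExecution"),
        "state_rows_total": state_rows,
    }

# Illustrative payload (values invented for the example).
sample = {
    "id": "q1",
    "batchId": 42,
    "numInputRows": 120,
    "durationMs": {"triggerExecution": 1850},
    "stateOperators": [{"numRowsTotal": 5000}, {"numRowsTotal": 300}],
}
print(extract_metrics(sample))
# -> {'query_id': 'q1', 'batch_id': 42, 'input_rows': 120,
#     'batch_duration_ms': 1850, 'state_rows_total': 5300}
```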
>>> watermark to window.end + 5 mins does not produce the output and fails the
>>> test.
>>>
>>> Please let me know if this does not make sense to you and we can discuss
>>> more.
>>>
>>> I haven't had time to look into SqlSyntaxTest - we don
Hey, do you perform stateful operations? Maybe your state is growing
indefinitely - a screenshot with state metrics would help (you can find it
in Spark UI -> Structured Streaming -> your query). Do you have a
driver-only cluster or do you have workers too? What's the memory usage
profile on the workers?
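As an aside, the same state-size check can be scripted from
`query.lastProgress` rather than eyeballed from a screenshot. A sketch,
assuming only the standard `stateOperators` / `numRowsTotal` fields; the
`history` list stands in for progress dicts captured over consecutive
batches:

```python
# Sketch: detect monotonically growing state across recent microbatches.
# Each entry mimics the "stateOperators" section of query.lastProgress.

def state_rows(progress: dict) -> int:
    """Total rows held in state across all stateful operators in one batch."""
    return sum(op.get("numRowsTotal", 0) for op in progress.get("stateOperators", []))

def state_growing(progress_history: list[dict]) -> bool:
    """True if state size strictly increased over every observed batch."""
    totals = [state_rows(p) for p in progress_history]
    return all(b > a for a, b in zip(totals, totals[1:]))

# Illustrative capture over three batches (values invented).
history = [
    {"stateOperators": [{"numRowsTotal": 1000}]},
    {"stateOperators": [{"numRowsTotal": 1800}]},
    {"stateOperators": [{"numRowsTotal": 2600}]},
]
print(state_growing(history))  # True -> state never shrank; eviction may not be firing
```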
Hi,
Do you think there is any chance of this issue getting resolved? Should I
create another bug report? As mentioned in my message, there is one open
already: https://issues.apache.org/jira/browse/SPARK-45637 but it covers
only one of the problems.
Andrzej
Tue, 27 Feb 2024 at 09:58 Andrzej Zera wrote:
anteed. It is essential to note
> that, as with any advice, to quote: "one test result is worth one-thousand
> expert opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
>
> On M
Hey all,
I've been using Structured Streaming in production for almost a year now,
and I want to share the bugs I've found in that time. I created a test
for each of the issues and put them all here:
https://github.com/andrzejzera/spark-bugs/tree/main/spark-3.5/src/test/scala
I split the issues i
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Thu, Jan 11, 2024 at 6:13 AM Andrzej Zera
> wrote:
>
>> I'm struggling with the following issue in Spark >=3.4, related to
>> multiple stateful operations.
>>
>> When
>>> intermediate_df = streaming_data.groupBy(...).count()
>>> intermediate_df.cache()
>>> # Use cached intermediate_df for further transformations or actions
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solut
I'm struggling with the following issue in Spark >=3.4, related to multiple
stateful operations.
When spark.sql.streaming.statefulOperator.allowMultiple is enabled, Spark
keeps track of two types of watermarks: eventTimeWatermarkForEviction and
eventTimeWatermarkForLateEvents. Introducing them all
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Sat, 6 Jan 2024 at 08:19, Andrzej Zera wrote:
>
>> Hey,
>>
>> I'm running a few Structured Streaming jobs (with Spark 3.5.0) that
>> require near-real time accurac
Hey,
I'm running a few Structured Streaming jobs (with Spark 3.5.0) that require
near-real-time accuracy with trigger intervals on the order of 5-10 seconds.
I usually run 3-6 streaming queries as part of the job, and each query
includes at least one stateful operation (and usually two or more).
My
ng!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Fri, Oct 27, 2023 at 5:22 AM Andrzej Zera
> wrote:
>
>> Hey All,
>>
>> I'm trying to reproduce the following streaming operation: "Time window
>> aggregation in separate streams followed by stream-st
Hey All,
I'm trying to reproduce the following streaming operation: "Time window
aggregation in separate streams followed by stream-stream join". According
to the documentation, this should be possible in Spark 3.5.0, but I had no
success despite several attempts.
Here is a documentation snippet I'm trying to reproduce: