Dear Spark Team,

I am working with a standalone cluster, and I am using Spark Connect to submit my applications. My current version is 3.5.1.

I am trying to run Structured Streaming queries with relatively long trigger intervals (2 hours, 1 day). The first issue I encountered was “Streaming query has been idle and waiting for new data more than 10000ms”. I solved it by increasing the value of the internal config property ‘spark.sql.streaming.noDataProgressEventInterval’. Now my query is no longer considered idle, but Connect expires the session after ~1 hour, and the query is killed with it.

I believe I have studied everything I could find online, but I could not find the answers. I would really appreciate it if you provided some 😊

- Is Spark Connect not intended to support “detached” streaming queries?
- Would you consider detaching StreamingQueries from the sessions that start them, since they are meant to run continuously?
- Would you consider extending the control options in the Spark Connect UI (start, stop, reset checkpoints)?

This would help users like me who want to use Spark’s Structured Streaming together with Connect without running additional applications just to keep the session alive.

I will be happy to answer any questions from your side or provide more details.

Best regards,
Anastasiia
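P.S. For reference, here is a minimal sketch of the kind of job I am running over Connect. The remote address, the rate source, the output format, and the paths are placeholders rather than my real job details; the two relevant pieces are the raised noDataProgressEventInterval and the long processing-time trigger.

```python
# Minimal sketch (PySpark over Spark Connect); names and paths are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .remote("sc://my-connect-host:15002")  # Spark Connect server on the standalone cluster
    .getOrCreate()
)

# Workaround for the "idle ... more than 10000ms" warning: raise the no-data
# progress interval above the trigger interval (value here is in milliseconds, ~25h).
spark.conf.set("spark.sql.streaming.noDataProgressEventInterval", "90000000")

# Stand-in for the real source; my actual job reads a different format/schema.
events = spark.readStream.format("rate").load()

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/out")                        # placeholder path
    .option("checkpointLocation", "/data/checkpoints")  # placeholder path
    .trigger(processingTime="2 hours")                  # long trigger interval
    .start()
)

# The query runs on the cluster, but once the Connect session times out
# (~1 hour in my setup), it is stopped together with the session.
query.awaitTermination()
```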
I am trying to run Structured Streaming Queries with relatively long trigger intervals (2 hours, 1 day). The first issue I encountered was “Streaming query has been idle and waiting for new data more than 10000ms”. I solved it by increasing the value in the internal config property ‘spark.sql.streaming.noDataProgressEventInterval’. Now my query is not considered idle anymore but Connect expires the session after ~1 hour, and the query is killed with it. I believe, I have studied everything I could find online, but I could not find the answers. I would really appreciate if you provided some 😊 Is it not intended for Spark Connect to support “detached” Streaming Queries? Would you consider detaching StreamingQueries from the sessions that start them, as they are meant to run continuously? Would you consider extending control options in Spark Connect UI (start, stop, reset checkpoints)? It will help the users like me, who want to use Spark’s Structured Streaming and Connect without running additional applications just to keep the session alive. I will be happy to answer any question from your side or provide more details. Best regards, Anastasiia