Dear Spark Team,

I am working with a standalone cluster, and I am using Spark Connect to submit my applications. My current version is 3.5.1.

I am trying to run Structured Streaming queries with relatively long trigger intervals (2 hours, 1 day). The first issue I encountered was “Streaming query has been idle and waiting for new data more than 10000ms”. I solved it by increasing the value of the internal config property ‘spark.sql.streaming.noDataProgressEventInterval’. Now my query is no longer considered idle, but Connect expires the session after ~1 hour, and the query is killed with it.

I believe I have studied everything I could find online, but I could not find the answers. I would really appreciate it if you provided some 😊

- Is Spark Connect not intended to support “detached” streaming queries?
- Would you consider detaching StreamingQueries from the sessions that start them, since they are meant to run continuously?
- Would you consider extending the control options in the Spark Connect UI (start, stop, reset checkpoints)?

This would help users like me who want to use Spark’s Structured Streaming together with Connect without running additional applications just to keep the session alive.

I will be happy to answer any questions from your side or provide more details.

Best regards,
Anastasiia
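P.S. For reference, here is a minimal sketch of the kind of job I am running over Connect. The remote address, the rate source, the output format, and the paths are placeholders rather than my real job details; the two relevant pieces are the raised noDataProgressEventInterval and the long processing-time trigger.

```python
# Minimal sketch (PySpark over Spark Connect); names and paths are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .remote("sc://my-connect-host:15002")  # Spark Connect server on the standalone cluster
    .getOrCreate()
)

# Workaround for the "idle ... more than 10000ms" warning: raise the no-data
# progress interval above the trigger interval (value here is in milliseconds, ~25h).
spark.conf.set("spark.sql.streaming.noDataProgressEventInterval", "90000000")

# Stand-in for the real source; my actual job reads a different format/schema.
events = spark.readStream.format("rate").load()

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/out")                        # placeholder path
    .option("checkpointLocation", "/data/checkpoints")  # placeholder path
    .trigger(processingTime="2 hours")                  # long trigger interval
    .start()
)

# The query runs on the cluster, but once the Connect session times out
# (~1 hour in my setup), it is stopped together with the session.
query.awaitTermination()
```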
I am trying to run Structured Streaming Queries with relatively long trigger intervals (2 hours, 1 day). The first issue I encountered was “Streaming query has been idle and waiting for new data more than 10000ms”. I solved it by increasing the value in the internal config property ‘spark.sql.streaming.noDataProgressEventInterval’. Now my query is not considered idle anymore but Connect expires the session after ~1 hour, and the query is killed with it. I believe, I have studied everything I could find online, but I could not find the answers. I would really appreciate if you provided some 😊 Is it not intended for Spark Connect to support “detached” Streaming Queries? Would you consider detaching StreamingQueries from the sessions that start them, as they are meant to run continuously? Would you consider extending control options in Spark Connect UI (start, stop, reset checkpoints)? It will help the users like me, who want to use Spark’s Structured Streaming and Connect without running additional applications just to keep the session alive. I will be happy to answer any question from your side or provide more details. Best regards, Anastasiia