Hello!

This is my first time using a mailing list, so I apologize if I’m not
following any conventions (please correct me where I go wrong). I’d like to
ask about Spark’s `StreamingQueryListener`.

I currently run a Structured Streaming job with a trigger interval of 10
minutes on a cluster. I want to periodically run maintenance jobs (OPTIMIZE,
DELETE, VACUUM) on the same cluster to save on compute resources. Ideally, the
maintenance jobs should not run concurrently with one another, or while the
streaming job is processing data. My plan is to implement a
`StreamingQueryListener` that detects when any streaming query is running and
delays the maintenance jobs until it is not. From testing, I see that
`onQueryIdle` does not fire while a query is waiting for its next trigger
interval. Before diving into the Apache Spark code, I wanted to get thoughts
on whether it’s worth adding a new `StreamingQueryListener` callback
(something like `onQueryWait`) that reports when a streaming query is waiting
for its next trigger.
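
To make the goal concrete, here is a rough sketch of the kind of workaround
that seems possible with the existing `onQueryProgress` callback: estimate the
gap between the end of one micro-batch and the next trigger from the batch
start time and batch duration reported in each progress event. The class name
`IdleWindowTracker`, its methods, and the 30-second safety margin are all
hypothetical, and the sketch assumes the next trigger fires roughly one
trigger interval after the previous batch started, which I have not verified
against the scheduler code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class IdleWindowTracker:
    """Estimates how long a streaming query will stay idle before its next trigger.

    Hypothetical helper, not part of Spark. Assumes the next trigger fires
    about one trigger interval after the previous batch started.
    """

    def __init__(self, trigger_interval: timedelta,
                 safety_margin: timedelta = timedelta(seconds=30)):
        self.trigger_interval = trigger_interval
        self.safety_margin = safety_margin
        self._batch_end: Optional[datetime] = None
        self._next_trigger: Optional[datetime] = None

    def record_progress(self, batch_start_iso: str, batch_duration_ms: int) -> None:
        """Record one completed batch (ISO 8601 UTC start time, duration in ms)."""
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        start = datetime.fromisoformat(batch_start_iso.replace("Z", "+00:00"))
        self._batch_end = start + timedelta(milliseconds=batch_duration_ms)
        self._next_trigger = start + self.trigger_interval

    def idle_time_left(self, now: datetime) -> timedelta:
        """Return the estimated idle time remaining; `now` must be timezone-aware."""
        if self._next_trigger is None or now < self._batch_end:
            # No batch seen yet, or the last reported batch has not finished.
            return timedelta(0)
        remaining = self._next_trigger - self.safety_margin - now
        return max(remaining, timedelta(0))
```

The idea would be to call `record_progress(event.progress.timestamp,
event.progress.batchDuration)` from `onQueryProgress`, and have the
maintenance scheduler start a job only when
`idle_time_left(datetime.now(timezone.utc))` exceeds that job's expected
runtime. This is only an estimate, which is part of why an explicit callback
seems attractive.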

Thoughts? Is this too naive?
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
