Use a streaming query listener that tracks repetitive progress events for the 
same batch id. If a given amount of time has elapsed while progress events keep 
arriving with the same batch id, the source is not providing new offsets and 
stream execution is not scheduling new micro-batches, so the query can be 
stopped. See also: spark.sql.streaming.pollingDelay.

Alternative methods may produce less-than-desirable results due to the specific 
characteristics of a source / sink / workflow. It may also be preferable to 
express the threshold as a number of repetitive progress events rather than as 
elapsed time, to be more forgiving of implementation details (e.g., the Kafka 
source internally retries when fetching the latest offsets and sleeps between 
attempts if there is a miss when asked for new data).
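
Something along these lines in Scala (a sketch, not tested; the class name, 
the maxRepeats threshold, and the choice to call stop() directly from the 
listener are mine, not part of any Spark API):

import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Stops a query after `maxRepeats` consecutive progress events carrying the
// same batch id, i.e. the source is not providing new offsets and stream
// execution is not scheduling new micro-batches.
class IdleStreamStopper(spark: SparkSession, maxRepeats: Int)
    extends StreamingQueryListener {

  // query id -> (last seen batch id, consecutive repeat count)
  private val state = new ConcurrentHashMap[UUID, (Long, Int)]()

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    val (lastBatchId, repeats) = state.getOrDefault(p.id, (-1L, 0))
    val newRepeats = if (p.batchId == lastBatchId) repeats + 1 else 0
    state.put(p.id, (p.batchId, newRepeats))
    if (newRepeats >= maxRepeats) {
      // Listener events arrive on a separate bus thread, so stopping here
      // should not block the micro-batch thread; a more conservative variant
      // would set a flag and stop from the driver's awaitTermination loop.
      Option(spark.streams.get(p.id)).foreach(_.stop())
    }
  }

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {
    state.remove(event.id)
  }
}

// Registration:
// spark.streams.addListener(new IdleStreamStopper(spark, maxRepeats = 12))

If memory serves, no-data progress events are throttled by the internal 
setting spark.sql.streaming.noDataProgressEventInterval (10s by default), so 
maxRepeats = 12 would roughly match the 2-minute window asked about below; 
verify against your Spark version.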


-Chris

________________________________
From: Aakash Basu <aakash.spark....@gmail.com>
Sent: Thursday, March 22, 2018 10:45:38 PM
To: user
Subject: Structured Streaming Spark 2.3 Query

Hi,

What is the way to stop a Spark Streaming job if there is no data inflow for an 
arbitrary amount of time (e.g., 2 mins)?

Thanks,
Aakash.
