The SQL plan of each micro-batch in the Spark UI (SQL tab) has links to the
actual Spark jobs that ran in the micro-batch. From that you can drill down
into the stage information. I agree that it's not there as a nice per-stream
table as with the Streaming tab, but all the information is present if you
follow those links.
Thanks TD, but the SQL plan does not seem to provide any information on
which stage is taking longer, or help identify bottlenecks across the
various stages. Spark Kafka Direct used to provide information about the
various stages in a micro-batch and the time taken by each stage. Is there
a way to find this information?
Also, you can get information about the last progress made (input rates,
etc.) from StreamingQuery.lastProgress, StreamingQuery.recentProgress, and
using StreamingQueryListener.
It's all documented -
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streamin
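To make the two monitoring routes concrete, here is a minimal sketch in Scala. It assumes `spark` is an active SparkSession and `query` is an already-started StreamingQuery (both names are placeholders for whatever you have in your application); the fields shown (`durationMs`, `inputRowsPerSecond`, etc.) come from StreamingQueryProgress.

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Pull-based: inspect the metrics of the most recent micro-batch.
val last = query.lastProgress            // may be null before the first batch
if (last != null) {
  println(s"inputRowsPerSecond     = ${last.inputRowsPerSecond}")
  println(s"processedRowsPerSecond = ${last.processedRowsPerSecond}")
  println(s"durationMs             = ${last.durationMs}")  // per-phase timings
}

// Push-based: register a listener that fires after every micro-batch.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(e: QueryStartedEvent): Unit = ()
  override def onQueryProgress(e: QueryProgressEvent): Unit =
    println(s"batch ${e.progress.batchId}: ${e.progress.numInputRows} rows")
  override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
})
```

`query.recentProgress` returns an array of the same progress objects for the last few batches, which is handy if you poll less often than the trigger interval.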
Structured Streaming does not maintain a queue of batches like DStreams.
DStreams used to cut off batches at a fixed interval, put them in a queue,
and a different thread processed the queued batches. In contrast, Structured
Streaming simply cuts off and immediately processes a batch after the
previous batch completes.
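Because there is no batch queue, "lag" in Structured Streaming shows up instead as the gap between the input rate and the processing rate in the progress metrics. A rough sketch of detecting that from the progress history (`query` is again an assumed running StreamingQuery):

```scala
// A query is falling behind when rows arrive faster than they are
// processed; compare the two rates reported for recent micro-batches.
val lagging = query.recentProgress.filter { p =>
  p.inputRowsPerSecond > p.processedRowsPerSecond
}
lagging.foreach { p =>
  println(s"batch ${p.batchId} lagging: " +
          s"in=${p.inputRowsPerSecond}/s, out=${p.processedRowsPerSecond}/s")
}
```

This is only an indicator, not a queue depth; for Kafka sources you can also compare the batch's source offsets in the progress JSON against the latest offsets on the topic.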
hi,
How do we get information like lag and queued-up batches in Structured
Streaming? The following API does not seem to give any info about lag and
queued-up batches similar to DStreams:
https://spark.apache.org/docs/2.2.1/api/java/org/apache/spark/streaming/scheduler/BatchInfo.html
Thanks!
--