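There are a couple of ways.

1. Easy but approximate: Find the scheduling delay and processing time using the StreamingListener interface, and then calculate "end-to-end delay = 0.5 * batch interval + scheduling delay + processing time". The 0.5 * batch interval term is the approximate average batching delay across all the records in the batch: a record waits anywhere between zero and one full batch interval before its batch is formed, so about half an interval on average if records arrive at a roughly steady rate.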
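To make the arithmetic in approach 1 concrete, here is a minimal sketch in plain Python. It does not touch Spark at all: the function name and the sample numbers are made up for illustration, and in a real job the scheduling delay and processing time would come from the batch info handed to your StreamingListener (for example in onBatchCompleted) rather than from a hard-coded list.

```python
def estimate_end_to_end_delay_ms(batch_interval_ms, scheduling_delay_ms, processing_time_ms):
    """Approximate average end-to-end delay for the records of one batch.

    Hypothetical helper: it only encodes the formula
    0.5 * batch interval + scheduling delay + processing time.
    """
    # Average time a record waits before its batch is formed,
    # assuming records arrive at a roughly steady rate.
    avg_batching_delay_ms = 0.5 * batch_interval_ms
    return avg_batching_delay_ms + scheduling_delay_ms + processing_time_ms


# Assumed values: a 2-second batch interval and made-up per-batch metrics.
BATCH_INTERVAL_MS = 2000
sample_batches = [
    {"scheduling_delay_ms": 20, "processing_time_ms": 450},
    {"scheduling_delay_ms": 5, "processing_time_ms": 610},
    {"scheduling_delay_ms": 130, "processing_time_ms": 1900},
]

estimates = [
    estimate_end_to_end_delay_ms(
        BATCH_INTERVAL_MS, b["scheduling_delay_ms"], b["processing_time_ms"]
    )
    for b in sample_batches
]

for batch, estimate in zip(sample_batches, estimates):
    print(batch, "-> estimated end-to-end delay (ms):", estimate)
```

Note that this is an average over the batch, not a per-record figure: records that arrive just after a batch boundary wait close to a full interval, and records that arrive just before it wait almost nothing.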
2. Hard but precise: You could build a custom receiver that embeds the current timestamp in each record, and then compare it with the timestamp taken at the final step of processing that record. Assuming the executor and driver clocks are reasonably in sync, this will measure the latency between the time a record is received by the system and the time the result from that record is available.

On Thu, Jun 18, 2015 at 2:12 PM, anshu shukla <anshushuk...@gmail.com> wrote:

> Sorry, I missed the LATENCY word. For a large streaming query, how to
> find the time taken by the particular RDD to travel from initial
> D-STREAM to final/last D-STREAM.
> Help please!
>
> On Fri, Jun 19, 2015 at 12:40 AM, Tathagata Das <t...@databricks.com>
> wrote:
>
>> It's not clear what you are asking. Find "what" among RDD?
>>
>> On Thu, Jun 18, 2015 at 11:24 AM, anshu shukla <anshushuk...@gmail.com>
>> wrote:
>>
>>> Is there any fixed way to find among RDD in stream processing systems,
>>> in the distributed set-up.
>>>
>>> --
>>> Thanks & Regards,
>>> Anshu Shukla
>
> --
> Thanks & Regards,
> Anshu Shukla