How can I find out how long a particular RDD has remained in the pipeline?
On Fri, Jun 19, 2015 at 7:59 AM, Tathagata Das <t...@databricks.com> wrote:

> Why do you need to uniquely identify the message? All you need is the time
> when the message was inserted by the receiver, and when it is processed,
> isn't it?
>
> On Thu, Jun 18, 2015 at 2:28 PM, anshu shukla <anshushuk...@gmail.com>
> wrote:
>
>> Thanks a lot, but I have already tried the second way. The problem with
>> that is how to identify a particular RDD from source to sink (as we
>> can do by passing a msg id in Storm). For that I just updated the RDD and
>> added a msgID (as a static variable), but while dumping them to a file some
>> of the tuples of the RDD are failed/missed (approx. 3000, and the data rate
>> is approx. 1500 tuples/sec).
>>
>> On Fri, Jun 19, 2015 at 2:50 AM, Tathagata Das <t...@databricks.com>
>> wrote:
>>
>>> Couple of ways.
>>>
>>> 1. Easy but approximate way: Find the scheduling delay and processing time
>>> using the StreamingListener interface, and then calculate "end-to-end
>>> delay = 0.5 * batch interval + scheduling delay + processing time". The
>>> 0.5 * batch interval is the approximate average batching delay across all
>>> the records in the batch.
>>>
>>> 2. Hard but precise way: You could build a custom receiver that embeds
>>> the current timestamp in the records, and then compare them with the
>>> timestamp at the final step of the records. Assuming the executor and
>>> driver clocks are reasonably in sync, this will measure the latency
>>> between the time the record is received by the system and the time the
>>> result from the record is available.
>>>
>>> On Thu, Jun 18, 2015 at 2:12 PM, anshu shukla <anshushuk...@gmail.com>
>>> wrote:
>>>
>>>> Sorry, I missed the word LATENCY. For a large streaming query, how
>>>> do I find the time taken by a particular RDD to travel from the initial
>>>> DStream to the final/last DStream?
>>>> Help please!
>>>>
>>>> On Fri, Jun 19, 2015 at 12:40 AM, Tathagata Das <t...@databricks.com>
>>>> wrote:
>>>>
>>>>> It's not clear what you are asking. Find "what" among RDD?
>>>>>
>>>>> On Thu, Jun 18, 2015 at 11:24 AM, anshu shukla
>>>>> <anshushuk...@gmail.com> wrote:
>>>>>
>>>>>> Is there any fixed way to find among RDD in stream processing
>>>>>> systems, in the distributed set-up?
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>> Anshu Shukla
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Anshu Shukla
>>
>> --
>> Thanks & Regards,
>> Anshu Shukla

--
Thanks & Regards,
Anshu Shukla
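
For reference, the two approaches suggested earlier in the thread can be sketched in plain Python. This is only an illustration of the arithmetic, not Spark code: the function names (`estimate_end_to_end_delay_ms`, `stamp`, `record_latency_ms`) are hypothetical, and in a real job the scheduling delay and processing time would come from the StreamingListener interface, while the stamping would happen inside a custom receiver. The per-record measurement also assumes the clocks on the receiving and final machines are reasonably in sync.

```python
import time


def estimate_end_to_end_delay_ms(batch_interval_ms, scheduling_delay_ms, processing_time_ms):
    """Approach 1 (approximate): average delay for the records of one batch.

    0.5 * batch interval approximates the average time a record waits
    for its batch to close.
    """
    return 0.5 * batch_interval_ms + scheduling_delay_ms + processing_time_ms


def stamp(record, now_ms=None):
    """Approach 2, receiver side: attach the receive time to the record.

    Hypothetical helper; a custom receiver would do this before storing the record.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return (now_ms, record)


def record_latency_ms(stamped_record, now_ms=None):
    """Approach 2, final step: time elapsed since the record was stamped."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    received_ms, _ = stamped_record
    return now_ms - received_ms
```

With a 1000 ms batch interval, 200 ms scheduling delay and 300 ms processing time, the first function gives 0.5 * 1000 + 200 + 300 = 1000 ms. Since every record carries its own receive timestamp in the second approach, no separate message id is needed to follow it from source to sink.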