Ok thanks Michael for all your help!

--------------------------------------------------
Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me

> On Apr 26, 2018, at 19:24, TechnoMage <mla...@technomage.com> wrote:
>
> Yes, Kafka for source and sink, which makes monitoring the Flink in/out easy.
>
> Michael
>
>> On Apr 26, 2018, at 5:27 PM, Dhruv Kumar <gargdhru...@gmail.com <mailto:gargdhru...@gmail.com>> wrote:
>>
>> Ok that answers my questions.
>>
>> What are you keeping the source and sink as? Is it Kafka for both?
>>
>> --------------------------------------------------
>> Dhruv Kumar
>> PhD Candidate
>> Department of Computer Science and Engineering
>> University of Minnesota
>> www.dhruvkumar.me <http://www.dhruvkumar.me/>
>>
>>> On Apr 26, 2018, at 16:37, TechnoMage <mla...@technomage.com <mailto:mla...@technomage.com>> wrote:
>>>
>>> Yes, NTP can still have skew. It may be measured in fractions of a second,
>>> but with Flink that can be significant if you care about sub-second latency
>>> accuracy. Since I have a 20 stage stream with 0.002 second latency it can
>>> matter.
>>>
>>> Back pressure is the limiting of input due to the inability of down-stream
>>> tasks to accept input. For example, if you have a map that reads from a
>>> database to enhance an element, that may limit earlier steps' performance,
>>> as they cannot push elements to it faster than it can read from the
>>> database. This can flow all the way back to the source and slow records
>>> coming into the system.
>>>
>>> Michael
>>>
>>>> On Apr 26, 2018, at 12:38 PM, Dhruv Kumar <gargdhru...@gmail.com <mailto:gargdhru...@gmail.com>> wrote:
>>>>
>>>> What do you mean by the time skew from one machine (source) to
>>>> another (sink)? Do you mean the system time clocks of the source and sink
>>>> may not be in sync? If I regularly use NTP to keep the system clocks in
>>>> sync, will time skew still happen?
>>>>
>>>> Could you also elaborate on what you mean by back pressure on the source
>>>> and how it will impact the latency calculations?
>>>>
>>>> Sorry if these are trivial questions. I am a bit new to real-world
>>>> streaming systems.
>>>>
>>>> --------------------------------------------------
>>>> Dhruv Kumar
>>>> PhD Candidate
>>>> Department of Computer Science and Engineering
>>>> University of Minnesota
>>>> www.dhruvkumar.me <http://www.dhruvkumar.me/>
>>>>
>>>>> On Apr 26, 2018, at 13:26, TechnoMage <mla...@technomage.com <mailto:mla...@technomage.com>> wrote:
>>>>>
>>>>> In a single machine system this may work ok. In a multi-machine system
>>>>> this is not as reliable, as the time skew from one machine (source) to
>>>>> another (sink) can impact the measurements. This also does not account
>>>>> for back pressure on the source. We are using an external process to
>>>>> read the source and the output of the sink in parallel and measure the
>>>>> latency on a single system clock. It does account for those issues, but
>>>>> of course does not account for delivery delays in the messaging system
>>>>> (Kafka in our case). But it does measure real-world latency as seen by
>>>>> the rest of the system, which is ultimately what matters to us.
>>>>>
>>>>> Michael
>>>>>
>>>>>> On Apr 26, 2018, at 12:01 PM, Dhruv Kumar <gargdhru...@gmail.com <mailto:gargdhru...@gmail.com>> wrote:
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I was trying to compute the end-to-end latency for each record processed
>>>>>> by Flink. By end-to-end latency, I mean the difference between the time
>>>>>> at which the record entered the Flink system (came at source) and the
>>>>>> time at which the record is finally emitted into the sink. What is the
>>>>>> best way to measure this? I was thinking of doing the following:
>>>>>> 1. Add the current system timestamp to the record when the record
>>>>>> arrives at Flink.
>>>>>> 2. Add the current system timestamp to the record when the record is
>>>>>> finally being emitted into the sink.
>>>>>> 3. Take the difference between 2 and 1 offline when all the records have
>>>>>> been written into the sink.
>>>>>>
>>>>>> Does this sound ok?
>>>>>>
>>>>>> Also, if I use the Processing time characteristic for this
>>>>>> end-to-end latency, will it be fine?
>>>>>>
>>>>>> Thanks
>>>>>> --------------------------------------------------
>>>>>> Dhruv Kumar
>>>>>> PhD Candidate
>>>>>> Department of Computer Science and Engineering
>>>>>> University of Minnesota
>>>>>> www.dhruvkumar.me <http://www.dhruvkumar.me/>
>>>>>
>>>>
>>>
>>
>
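
For anyone reading this thread later, here is a rough Python sketch of the two measurement approaches discussed above. It is only an illustration under stated assumptions, not the setup Michael actually runs. The function names (match_latencies, stamped_latency), the record ids and all the numbers are made up, and no Kafka or Flink API is used: the dictionaries stand in for what an external observer process would record while reading the source and sink topics.

```python
def match_latencies(source_seen_ms, sink_seen_ms):
    """Return {record_id: latency_ms} for records seen at both ends.

    Each argument maps a record id to the time, in milliseconds, at which
    a single observer process saw that record. Because both timestamps come
    from the observer's clock, skew between the source and sink machines
    does not enter the result.
    """
    return {
        rid: sink_seen_ms[rid] - seen_ms
        for rid, seen_ms in source_seen_ms.items()
        if rid in sink_seen_ms
    }


def stamped_latency(true_latency_ms, sink_clock_offset_ms):
    """Latency computed from timestamps taken on two different machines.

    The sink machine's clock is assumed to run sink_clock_offset_ms ahead
    of the source machine's clock, so the offset is added to the result.
    """
    return true_latency_ms + sink_clock_offset_ms


# Made-up observations on the observer's clock, in milliseconds.
source_seen_ms = {"a": 10_000, "b": 10_010, "c": 10_020}
sink_seen_ms = {"a": 10_002, "b": 10_013}  # "c" has not reached the sink yet

per_record = match_latencies(source_seen_ms, sink_seen_ms)
print(per_record)  # {'a': 2, 'b': 3}

# A 2 ms latency measured across machines whose clocks differ by 5 ms.
print(stamped_latency(2, 5))   # 7
print(stamped_latency(2, -5))  # -3
```

The second function shows why stamping records on two machines is fragile at the millisecond scale: a clock offset of a few milliseconds is added directly to every measurement and can even make the latency look negative. The first function assumes each record carries an id that is unique and survives the pipeline unchanged, which a real job would have to guarantee.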