Yes, Kafka for both source and sink, which makes monitoring Flink's input and output easy.


> On Apr 26, 2018, at 5:27 PM, Dhruv Kumar wrote:
> Ok that answers my questions.
> What are you keeping the source and sink as? Is it Kafka for both?
> --------------------------------------------------
> Dhruv Kumar
> PhD Candidate
> Department of Computer Science and Engineering
> University of Minnesota
> <>
>> On Apr 26, 2018, at 16:37, TechnoMage wrote:
>> Yes, NTP can still leave some skew.  It may be only fractions of a second, 
>> but with Flink that can be significant if you care about sub-second latency 
>> accuracy.  Since I have a 20-stage stream with 0.002 second latency, it can 
>> matter.
>> Back pressure is the throttling of input caused by downstream tasks being 
>> unable to accept it.  For example, if you have a map that reads from a 
>> database to enhance an element, that may limit the earlier steps' 
>> performance, as they cannot push elements to it faster than it can read 
>> from the database.  This can flow all the way back to the source and slow 
>> records coming into the system.
>> Michael
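
To make the back pressure description above concrete, here is a minimal Python sketch. It is not Flink code: a bounded queue stands in for the buffer between two stages, and a sleep stands in for the database lookup. The names run_pipeline and slow_stage and all parameter values are made up for illustration.

```python
import queue
import threading
import time


def run_pipeline(n_records, queue_size, consumer_delay):
    """Return the seconds the producer needed to hand off n_records."""
    buffer = queue.Queue(maxsize=queue_size)  # bounded buffer between stages

    def slow_stage():
        for _ in range(n_records):
            buffer.get()
            time.sleep(consumer_delay)  # stands in for a database lookup

    worker = threading.Thread(target=slow_stage)
    worker.start()

    start = time.monotonic()
    for i in range(n_records):
        buffer.put(i)  # blocks while the buffer is full: back pressure
    producer_seconds = time.monotonic() - start

    worker.join()
    return producer_seconds
```

Calling run_pipeline(20, 2, 0.01) should show the producer loop taking close to the time the slow stage needs, because put() blocks whenever the buffer is full. A timestamp taken inside that loop is therefore already delayed by the downstream stage.
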
>>> On Apr 26, 2018, at 12:38 PM, Dhruv Kumar wrote:
>>> What do you mean by the time skew from one machine (source) to 
>>> another (sink)? Do you mean the system clocks of the source and sink 
>>> may not be in sync? If I regularly use NTP to keep the system clocks in 
>>> sync, will time skew still happen?
>>> Could you also elaborate on what you mean by back pressure on the source 
>>> and how it will impact the latency calculations?
>>> Sorry if these are trivial questions. I am a bit new to real-world 
>>> streaming systems.
>>> --------------------------------------------------
>>> Dhruv Kumar
>>> PhD Candidate
>>> Department of Computer Science and Engineering
>>> University of Minnesota
>>> <>
>>>> On Apr 26, 2018, at 13:26, TechnoMage wrote:
>>>> In a single-machine system this may work OK.  In a multi-machine system 
>>>> it is not as reliable, as the time skew from one machine (source) to 
>>>> another (sink) can impact the measurements.  It also does not account 
>>>> for back pressure on the source.  We are using an external process that 
>>>> reads the source and the output of the sink in parallel to measure the 
>>>> latency on a single system clock.  It does account for those issues, but 
>>>> of course does not account for delivery delays in the messaging system 
>>>> (Kafka in our case).  But it does measure real-world latency as seen by 
>>>> the rest of the system, which is ultimately what matters to us.
>>>> Michael
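
The external-monitor idea above could look roughly like the following Python sketch. It is an assumption about the general shape, not the actual tool described in the message: LatencyMonitor, saw_source, and saw_sink are hypothetical names, and the Kafka consumers that would call them for the source and sink topics are omitted. The point is only that both sightings of a record are stamped by the same clock.

```python
import time


class LatencyMonitor:
    """Pairs source and sink sightings of a record on the monitor's own clock."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # single clock used for both sightings
        self._seen_at_source = {}    # record id -> time first seen at source
        self.latencies = {}          # record id -> seconds from source to sink

    def saw_source(self, record_id):
        # setdefault keeps the first sighting if a record is redelivered
        self._seen_at_source.setdefault(record_id, self._clock())

    def saw_sink(self, record_id):
        started = self._seen_at_source.pop(record_id, None)
        if started is not None:      # ignore records never seen at the source
            self.latencies[record_id] = self._clock() - started
```

Because the monitor never compares timestamps taken on different machines, clock skew between the source and sink hosts drops out. As the message notes, the messaging system's own delivery delay is still included in each measurement.
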
>>>>> On Apr 26, 2018, at 12:01 PM, Dhruv Kumar wrote:
>>>>> Hi
>>>>> I was trying to compute the end-to-end latency for each record processed 
>>>>> by Flink. By end-to-end latency, I mean the difference between the time 
>>>>> at which the record entered the Flink system (arrived at the source) and 
>>>>> the time at which the record is finally emitted into the sink. What is 
>>>>> the best way to measure this? I was thinking of doing the following:
>>>>> 1. Add the current system timestamp to the record when the record arrives 
>>>>> at Flink.
>>>>> 2. Add the current system timestamp to the record when the record is 
>>>>> finally being emitted into the sink.
>>>>> 3. Take the difference between 2 and 1 offline when all the records have 
>>>>> been written into the sink.
>>>>> Does this sound ok?
>>>>> Also, will it be fine if I use the processing time characteristic for 
>>>>> this end-to-end latency measurement?
>>>>> Thanks
>>>>> --------------------------------------------------
>>>>> Dhruv Kumar
>>>>> PhD Candidate
>>>>> Department of Computer Science and Engineering
>>>>> University of Minnesota
>>>>> <>
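
The three steps in the original question, and the clock skew concern raised in the replies, can be illustrated with a small Python sketch. All values are hypothetical: the record layout (in_ts, out_ts), the 2 ms processing time, and the 50 ms skew are chosen only to show the arithmetic.

```python
def offline_latencies(records):
    """Step 3: subtract the ingress stamp from the egress stamp per record."""
    return [r["out_ts"] - r["in_ts"] for r in records]


# Hypothetical stamps in seconds; true processing time is 2 ms per record.
same_clock = [
    {"in_ts": 100.000, "out_ts": 100.002},
    {"in_ts": 100.010, "out_ts": 100.012},
]

# Same records, but the sink machine's clock runs 50 ms ahead of the source's.
SKEW = 0.050
skewed = [
    {"in_ts": r["in_ts"], "out_ts": r["out_ts"] + SKEW} for r in same_clock
]
```

On one clock the computed latencies come out at about 0.002 s each. With the skewed sink clock they come out at about 0.052 s, so the skew, not the pipeline, dominates the measurement.
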