Re: Checkpoint metrics.

Seth Wiesman Wed, 11 Sep 2019 16:26:13 -0700

Great timing, I just debugged this on Monday. End-to-end time is measured from 
checkpoint coordinator to checkpoint coordinator, so it includes the RPC to the 
source and the RPC from the operator back to the JM.
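In practice, when you read the metrics, you can treat whatever the end-to-end time leaves unexplained after subtracting sync, async, and alignment as time spent outside those three phases, which includes the two RPC hops above. The following is a minimal Java sketch of that arithmetic. It assumes you have already copied the four durations (in ms) for one subtask from the UI or REST API. The class and field names are hypothetical, not Flink API, and the numbers in main() are made up for illustration.

```java
// Hypothetical per-subtask checkpoint timings, in milliseconds.
// Names are illustrative only; this is not a Flink class.
public final class CheckpointBreakdown {
    private final long endToEndMillis;
    private final long syncMillis;
    private final long asyncMillis;
    private final long alignmentMillis;

    public CheckpointBreakdown(long endToEndMillis, long syncMillis,
                               long asyncMillis, long alignmentMillis) {
        this.endToEndMillis = endToEndMillis;
        this.syncMillis = syncMillis;
        this.asyncMillis = asyncMillis;
        this.alignmentMillis = alignmentMillis;
    }

    // End-to-end time not covered by sync, async, or alignment.
    // Under the explanation above, this remainder includes the trigger RPC
    // to the source and the acknowledgement RPC back to the JM.
    public long unexplainedMillis() {
        return endToEndMillis - (syncMillis + asyncMillis + alignmentMillis);
    }

    public static void main(String[] args) {
        // Made-up numbers, not taken from this thread.
        CheckpointBreakdown b = new CheckpointBreakdown(12_000, 40, 300, 0);
        System.out.println("unexplained ms: " + b.unexplainedMillis());
    }
}
```

If the remainder dominates, as in Jamie's source-task example, it points at time outside the task's own checkpoint work rather than at slow state snapshots.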


Seth 

> On Sep 11, 2019, at 6:17 PM, Jamie Grier <jgr...@lyft.com.invalid> wrote:
> 
> Hey all,
> 
> I need to make sense of this behavior.  Any help would be appreciated.
> 
> Here’s an example of a set of Flink checkpoint metrics I don’t understand.  
> This is the first operator in a job and as you can see the end-to-end time 
> for the checkpoint is long, but it’s not explained by either sync, async, or 
> alignment times.  I’m not sure what to make of this.  It makes me think I 
> don’t understand the meaning of the metrics themselves.  In my interpretation 
> the end-to-end time should always be, roughly, the sum of the other 
> components — certainly in the case of a source task such as this.
> 
> Any thoughts or clarifications anyone can provide on this?  We have many jobs 
> with slow checkpoints that suffer from this sort of thing with metrics that 
> look similar.
> 
> -Jamie
>
