Hi Jamie! Did you mean to attach a screenshot? If so, you will need to share it through a different channel; unfortunately, the mailing list does not support attachments.
Seth is right about how the time is measured. One important bit to add to the interpretation:

  - For non-source tasks, the time includes the "travel of the barriers", which can take long under back pressure.
  - For source tasks, it includes the "time to acquire the checkpoint lock", which can be long if the source is blocked trying to emit data (again, back pressure).

As part of FLIP-27 we will eliminate the checkpoint lock (and have a mailbox instead), which should cut down the time currently spent acquiring the lock.

The "unaligned checkpoints" discussion is looking at ways to make checkpoints much less susceptible to back pressure:
https://lists.apache.org/thread.html/fd5b6cceb4bffb635e26e7ec0787a8db454ddd64aadb40a0d08a90a8@%3Cdev.flink.apache.org%3E

Hope that helps in understanding what is going on.

Best,
Stephan

On Thu, Sep 12, 2019 at 1:25 AM Seth Wiesman <sjwies...@gmail.com> wrote:

> Great timing, I just debugged this on Monday. E2e time is checkpoint
> coordinator to checkpoint coordinator, so it includes RPC to the source and
> RPC from the operator back for the JM.
>
> Seth
>
> > On Sep 11, 2019, at 6:17 PM, Jamie Grier <jgr...@lyft.com.invalid> wrote:
> >
> > Hey all,
> >
> > I need to make sense of this behavior. Any help would be appreciated.
> >
> > Here’s an example of a set of Flink checkpoint metrics I don’t
> > understand. This is the first operator in a job and as you can see the
> > end-to-end time for the checkpoint is long, but it’s not explained by
> > either sync, async, or alignment times. I’m not sure what to make of
> > this. It makes me think I don’t understand the meaning of the metrics
> > themselves. In my interpretation the end-to-end time should always be,
> > roughly, the sum of the other components - certainly in the case of a
> > source task such as this.
> >
> > Any thoughts or clarifications anyone can provide on this? We have many
> > jobs with slow checkpoints that suffer from this sort of thing with metrics
> > that look similar.
> >
> > -Jamie
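
P.S. To make the interpretation above a bit more concrete, here is a minimal Python sketch of the arithmetic. It is not Flink code: the class, the field names, and the sample numbers are all made up for illustration. It only assumes you have read the end-to-end, sync, async, and alignment durations for one subtask off the checkpoint details. Whatever part of the end-to-end time is not covered by the other three is roughly the time spent on the RPCs plus barrier travel (non-source tasks) or checkpoint lock acquisition (source tasks).

```python
from dataclasses import dataclass


@dataclass
class SubtaskCheckpointStats:
    """Durations for one subtask in milliseconds (hypothetical field names)."""
    end_to_end_ms: int
    sync_ms: int
    async_ms: int
    alignment_ms: int


def unexplained_ms(stats: SubtaskCheckpointStats) -> int:
    """Part of the end-to-end time not covered by sync, async, or alignment.

    Per the discussion in this thread, this remainder is roughly the RPC time
    plus barrier travel (non-source tasks) or lock acquisition (source tasks).
    """
    accounted = stats.sync_ms + stats.async_ms + stats.alignment_ms
    # Clamp at zero in case of small measurement skew between the metrics.
    return max(0, stats.end_to_end_ms - accounted)


# Made-up numbers for a source subtask under back pressure.
stats = SubtaskCheckpointStats(
    end_to_end_ms=45_000, sync_ms=20, async_ms=300, alignment_ms=0
)
print(unexplained_ms(stats))  # 44680
```

If that remainder dominates, as it does in this made-up case, back pressure is the first thing I would look at.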