I think what's weird is that none of the three stages (alignment, sync checkpoint, async checkpoint) takes much time.
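To make the point concrete, here is a rough sketch of the arithmetic: subtract the three stage durations from the end-to-end duration of the slow subtask and see how much is left unexplained. The function name and the stage figures are made up for illustration (only the 55 minutes comes from this thread), so plug in the real numbers from the checkpoint details in the web UI.

```python
def unaccounted_ms(end_to_end_ms, alignment_ms, sync_ms, async_ms):
    """Part of a subtask's end-to-end checkpoint time not covered by the three stages."""
    return end_to_end_ms - (alignment_ms + sync_ms + async_ms)


# Hypothetical stage durations, for illustration only; not taken from Julio's job.
end_to_end = 55 * 60 * 1000  # the 55 minutes reported for the slow slot
gap = unaccounted_ms(end_to_end, alignment_ms=2_000, sync_ms=500, async_ms=30_000)
print(f"unaccounted: {gap / 60_000:.1f} min")
```

If the gap is close to the full end-to-end time, as it would be with figures like these, the delay presumably sits outside the three stages, for example in the barrier reaching the operator late or in the acknowledgement.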
On Tue, Sep 18, 2018 at 3:20 PM Till Rohrmann <trohrm...@apache.org> wrote:

> This behavior seems very odd, Julio. Could you indeed share the debug logs
> of all Flink processes in order to see why things are taking so long?
>
> The checkpoint size of task #8 is twice as big as the second biggest
> checkpoint. But this should not cause an increase in checkpoint time of a
> factor of 8.
>
> Cheers,
> Till
>
> On Mon, Sep 17, 2018 at 5:25 AM Renjie Liu <liurenjie2...@gmail.com>
> wrote:
>
>> Hi, Julio:
>> Does this happen frequently? What state backend do you use? The async
>> checkpoint duration and sync checkpoint duration seem normal compared to
>> the others; it seems that most of the time is spent acking the checkpoint.
>>
>> On Sun, Sep 16, 2018 at 9:24 AM vino yang <yanghua1...@gmail.com> wrote:
>>
>>> Hi Julio,
>>>
>>> Yes, it seems that fifty-five minutes is really long.
>>> However, it is linear with the time and size of the previous task
>>> adjacent to it in the diagram.
>>> I think your real concern is why Flink accesses HDFS so slowly.
>>> You can enable the DEBUG log to see if you can find any clues, or post the
>>> log to the mailing list to help others analyze the problem for you.
>>>
>>> Thanks, vino.
>>>
>>> Julio Biason <julio.bia...@azion.com> wrote on Sat, Sep 15, 2018 at 7:03 AM:
>>>
>>>> (Just an addendum: Although it's not a huge problem -- we can always
>>>> increase the checkpoint timeout -- this anomalous situation makes me
>>>> think there is something wrong in our pipeline or in our cluster, and that
>>>> is what is making the checkpoint creation go crazy.)
>>>>
>>>> On Fri, Sep 14, 2018 at 8:00 PM, Julio Biason <julio.bia...@azion.com>
>>>> wrote:
>>>>
>>>>> Hey guys,
>>>>>
>>>>> On our pipeline, we have a single slot that is taking longer to
>>>>> create the checkpoint compared to other slots and we are wondering what
>>>>> could be causing it.
>>>>>
>>>>> The operator in question is the window metric -- the only element in
>>>>> the pipeline that actually uses the state. While the other slots take 7
>>>>> mins to create the checkpoint, this one -- and only this one -- takes
>>>>> 55 mins.
>>>>>
>>>>> Is there something I should look at to understand what's going on?
>>>>>
>>>>> (We are storing all checkpoints in HDFS, in case that helps.)
>>>>>
>>>>> --
>>>>> *Julio Biason*, Software Engineer
>>>>> *AZION* | Deliver. Accelerate. Protect.
>>>>> Office: +55 51 3083 8101 | Mobile: +55 51 *99907 0554*
>>>>>
>>>>
>>>> --
>>>> *Julio Biason*, Software Engineer
>>>> *AZION* | Deliver. Accelerate. Protect.
>>>> Office: +55 51 3083 8101 | Mobile: +55 51 *99907 0554*
>>>>
>>> --
>> Liu, Renjie
>> Software Engineer, MVAD
>>
> --
Liu, Renjie
Software Engineer, MVAD