Hi Gyula,

Unfortunately I don't know the cause, but we observed a similar issue on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1. Which version are you running?
Urs

On 12.07.2017 09:48, Gyula Fóra wrote:
> Hi,
>
> I have noticed a strange behavior in one of our jobs: every once in a while
> the Kafka source checkpointing time becomes extremely large compared to
> what it usually is. (To be very specific, it is a Kafka source chained with
> a stateless map operator.)
>
> To be more specific, checkpointing the offsets usually takes around 10ms,
> which sounds reasonable, but in some checkpoints this goes into the 3-5
> minute range, practically blocking the job for that period of time.
> Yesterday I even observed 10 minute delays. First I thought that some
> sources might trigger checkpoints later than others, but after adding some
> logging and comparing, it seems that the triggerCheckpoint was received
> at the same time.
>
> Interestingly, only one of the 3 Kafka sources in the job seems to be
> affected (last time I checked at least). We are still using the 0.8
> consumer with commit on checkpoints. Also, I don't see this happen in other
> jobs.
>
> Any clue on what might cause this?
>
> Thanks :)
> Gyula

-- 
Urs Schönenberger - urs.schoenenber...@tngtech.com

TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Managing Directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Registered office: Unterföhring * Amtsgericht München * HRB 135082