Hi,

I have noticed a strange behavior in one of our jobs: every once in a while the Kafka source checkpointing time becomes extremely large compared to what it usually is. (To be precise, it is a Kafka source chained with a stateless map operator.)
Checkpointing the offsets usually takes around 10 ms, which sounds reasonable, but in some checkpoints this goes into the 3-5 minute range, practically blocking the job for that period of time. Yesterday I even observed 10-minute delays.

At first I thought that some sources might trigger checkpoints later than others, but after adding some logging and comparing the logs, it seems that the triggerCheckpoint was received at the same time. Interestingly, only one of the 3 Kafka sources in the job seems to be affected (at least the last time I checked).

We are still using the 0.8 consumer with commit on checkpoints. Also, I don't see this happening in other jobs.

Any clue on what might cause this?

Thanks :)
Gyula