I have added logging that will help determine this as well; next time it happens I will post the results. (Although there doesn't seem to be high backpressure.)
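Roughly the kind of timing I am logging (an illustrative sketch only, not the exact code we run; CheckpointTimingMap is a made-up name): a stateless pass-through map in the chain that records how long after the trigger its snapshot actually starts, and so would show whether the minutes are spent waiting for the snapshot to begin rather than in the snapshot itself:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Stateless pass-through map that implements CheckpointedFunction only to log
 * when the chain's snapshot actually runs relative to the checkpoint trigger.
 */
public class CheckpointTimingMap<T> implements MapFunction<T, T>, CheckpointedFunction {

    private static final Logger LOG = LoggerFactory.getLogger(CheckpointTimingMap.class);

    @Override
    public T map(T value) {
        return value; // pass-through, no state
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) {
        long now = System.currentTimeMillis();
        // getCheckpointTimestamp() is the time the checkpoint was triggered on the
        // JobManager; the difference shows how long the chain waited before its
        // snapshot could start (e.g. waiting for the checkpoint lock). Note this
        // also includes RPC latency and any clock skew between JM and TM.
        LOG.info("Checkpoint {}: snapshot started {} ms after trigger",
                ctx.getCheckpointId(), now - ctx.getCheckpointTimestamp());
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) {
        // nothing to restore, the map is stateless
    }
}

If the gap between trigger and snapshot start is where the minutes go, that would point towards the lock/backpressure theory rather than the offset commit itself.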
Thanks for the tips,
Gyula

Stephan Ewen <se...@apache.org> wrote (on Wed, 12 Jul 2017, 15:27):

> Can it be that the checkpoint thread is waiting to grab the lock, which is
> held by the chain under backpressure?
>
> On Wed, Jul 12, 2017 at 12:23 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> Yes, that's definitely what I am about to do next, but I just thought maybe
>> someone has seen this before.
>>
>> Will post info next time it happens. (Not guaranteed to happen soon, as it
>> didn't happen for a long time before.)
>>
>> Gyula
>>
>> On Wed, Jul 12, 2017, 12:13 Stefan Richter <s.rich...@data-artisans.com>
>> wrote:
>>
>>> Hi,
>>>
>>> could you introduce some logging to figure out from which method call
>>> the delay is introduced?
>>>
>>> Best,
>>> Stefan
>>>
>>> On 12.07.2017 at 11:37, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We are using the latest 1.3.1.
>>>
>>> Gyula
>>>
>>> Urs Schoenenberger <urs.schoenenber...@tngtech.com> wrote (on Wed, 12 Jul
>>> 2017, 10:44):
>>>
>>>> Hi Gyula,
>>>>
>>>> I don't know the cause unfortunately, but we observed a similar issue
>>>> on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
>>>> Which version are you running on?
>>>>
>>>> Urs
>>>>
>>>> On 12.07.2017 09:48, Gyula Fóra wrote:
>>>> > Hi,
>>>> >
>>>> > I have noticed a strange behavior in one of our jobs: every once in a
>>>> > while the Kafka source checkpointing time becomes extremely large
>>>> > compared to what it usually is. (To be very specific, it is a Kafka
>>>> > source chained with a stateless map operator.)
>>>> >
>>>> > To be more specific, checkpointing the offsets usually takes around
>>>> > 10 ms, which sounds reasonable, but in some checkpoints this goes into
>>>> > the 3-5 minute range, practically blocking the job for that period of
>>>> > time. Yesterday I observed even 10-minute delays. First I thought that
>>>> > some sources might trigger checkpoints later than others, but after
>>>> > adding some logging and comparing it, it seems that the
>>>> > triggerCheckpoint was received at the same time.
>>>> >
>>>> > Interestingly, only one of the 3 Kafka sources in the job seems to be
>>>> > affected (last time I checked at least). We are still using the 0.8
>>>> > consumer with commit on checkpoints. Also, I don't see this happen in
>>>> > other jobs.
>>>> >
>>>> > Any clue on what might cause this?
>>>> >
>>>> > Thanks :)
>>>> > Gyula
>>>>
>>>> --
>>>> Urs Schönenberger - urs.schoenenber...@tngtech.com
>>>>
>>>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>>>> Managing Directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>>>> Registered office: Unterföhring * Amtsgericht München * HRB 135082
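For context on Stephan's checkpoint-lock question above: Flink's legacy sources emit records while holding the checkpoint lock, and the checkpoint itself must acquire that same lock, so a chain that blocks on backpressure inside the synchronized section delays the snapshot. Below is a minimal, illustrative SourceFunction showing that pattern (a sketch only, not the actual Kafka consumer code; LockIllustrationSource is a made-up name):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

/**
 * Minimal source illustrating the locking pattern: records are emitted while
 * holding the checkpoint lock, and the checkpoint needs the same lock. If
 * collect() blocks on backpressure inside the synchronized block, the
 * checkpoint has to wait.
 */
public class LockIllustrationSource implements SourceFunction<Long> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        long counter = 0;
        while (running) {
            // Same pattern the Kafka sources use: emission and offset/state
            // updates happen under the checkpoint lock so they stay consistent.
            synchronized (ctx.getCheckpointLock()) {
                // If downstream buffers are full, collect() blocks here while we
                // still hold the lock, so the checkpoint cannot proceed.
                ctx.collect(counter++);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}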