wojski
Sent: Tuesday, March 23, 2021 5:31 AM
To: Alexey Trenikhun
Cc: Arvid Heise ; ChangZhuo Chen (陳昌倬) ;
ro...@apache.org ; Flink User Mail List
Subject: Re: Checkpoint fail due to timeout
Hi Alexey,
You should definitely investigate why the job is stuck.
1. First of all, is it completely s
uring next performance run.
Thanks,
Alexey
From: Roman Khachatryan
Sent: Tuesday, March 23, 2021 12:17 AM
To: Alexey Trenikhun
Cc: ChangZhuo Chen (陳昌倬) ; Flink User Mail List
Subject: Re: Checkpoint fail due to timeout
Unfortunately, the lock can't be chang
...@apache.org <
> ro...@apache.org>; Flink User Mail List
> *Subject:* Re: Checkpoint fail due to timeout
>
> Hi Alexey,
>
> rescaling from unaligned checkpoints will be supported with the upcoming
> 1.13 release (expected at the end of April).
>
> Best,
>
>
> Thanks,
> Alexey
> ____
> From: Roman Khachatryan
> Sent: Monday, March 22, 2021 1:36 AM
> To: ChangZhuo Chen (陳昌倬)
> Cc: Alexey Trenikhun ; Flink User Mail List
>
> Subject: Re: Checkpoint fail due to timeout
>
> Thanks for sharin
hread.run(SourceStreamTask.java:263)
Thanks,
Alexey
From: Roman Khachatryan
Sent: Monday, March 22, 2021 1:36 AM
To: ChangZhuo Chen (陳昌倬)
Cc: Alexey Trenikhun ; Flink User Mail List
Subject: Re: Checkpoint fail due to timeout
Thanks for sharing the thread dump.
It shows that the source thread is indeed back-pressured
(checkpoint lock is held by a thread which is trying to emit but
unable to acquire any free buffers
checkpoint
still times out after 3hr.
From: Arvid Heise
Sent: Monday, March 22, 2021 6:58:20 AM
To: ChangZhuo Chen (陳昌倬)
Cc: Alexey Trenikhun ; ro...@apache.org ;
Flink User Mail List
Subject: Re: Checkpoint fail due to timeout
Hi Alexey,
rescaling from unaligned checkpoints will be supported with the upcoming
1.13 release (expected at the end of April).
Best,
Arvid
On Wed, Mar 17, 2021 at 8:29 AM ChangZhuo Chen (陳昌倬)
wrote:
> On Wed, Mar 17, 2021 at 05:45:38AM +, Alexey Trenikhun wrote:
> > In my opinion looks
Thanks for sharing the thread dump.
It shows that the source thread is indeed back-pressured
(checkpoint lock is held by a thread which is trying to emit but
unable to acquire any free buffers).
The lock is per task, so there can be several locks per TM.
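The situation in the thread dump can be reproduced with a toy model. This is only a minimal sketch in plain Java, not Flink code: the class name, the method name, and the use of a Semaphore to stand in for the network buffer pool are all assumptions made for illustration. One thread takes the "checkpoint lock" and then waits for a free buffer that never arrives, while a second thread that wants to run a checkpoint under the same lock ends up BLOCKED.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

// Toy model only: none of these names come from Flink.
class CheckpointLockSketch {

    /** Returns true if the checkpointing thread was seen BLOCKED while the source waited for a buffer. */
    static boolean checkpointBlockedWhileBackPressured() {
        final Object checkpointLock = new Object();      // stands in for the per-task lock
        final Semaphore freeBuffers = new Semaphore(0);  // zero permits: no free buffers (back-pressure)
        final CountDownLatch sourceHoldsLock = new CountDownLatch(1);

        // Source thread: holds the lock while trying to emit a record.
        Thread source = new Thread(() -> {
            synchronized (checkpointLock) {
                sourceHoldsLock.countDown();
                freeBuffers.acquireUninterruptibly();    // blocks until a buffer is released
            }
        }, "toy-source-thread");

        // Checkpointing thread: needs the same lock before it can snapshot.
        Thread checkpoint = new Thread(() -> {
            synchronized (checkpointLock) {
                // the snapshot would run here
            }
        }, "toy-task-thread");

        try {
            source.start();
            sourceHoldsLock.await();                     // make sure the source owns the lock first
            checkpoint.start();

            // Poll (for at most 5 s) until the JVM reports the checkpointing thread as BLOCKED.
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (checkpoint.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            boolean blocked = checkpoint.getState() == Thread.State.BLOCKED;

            freeBuffers.release();                       // downstream frees a buffer: back-pressure ends
            source.join();
            checkpoint.join();
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("checkpoint thread blocked: " + checkpointBlockedWhileBackPressured());
    }
}
```

In this model the checkpoint can only proceed once a buffer is released, which is why a checkpoint can time out while the source stays back-pressured.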
@ChangZhuo Chen (陳昌倬) , in the thread you
("hdfs:///checkpoints-data/"));
From: ChangZhuo Chen (陳昌倬)
Sent: Wednesday, March 17, 2021 12:29 AM
To: Alexey Trenikhun
Cc: ro...@apache.org; Flink User Mail List
Subject: Re: Checkpoint fail due to timeout
On Wed, Mar 17, 2021 at 05:45:38AM +, Alexey Trenikhun wrote:
> In my opinion looks similar. Were you able to tune-up Flink to make it work?
> I'm stuck with it, I wanted to scale up hoping to reduce backpressure, but to
> rescale I need to take savepoint, which never completes (at least take
From: ChangZhuo Chen (陳昌倬)
Sent: Tuesday, March 16, 2021 6:59 AM
To: Alexey Trenikhun
Cc: ro...@apache.org; Flink User Mail List
Subject: Re: Checkpoint fail due to timeout
On Tue, Mar 16, 2021 at 02:32:54AM +, Alexey Trenikhun wrote:
> Hi Roman,
> I took thread dump:
> "Source: digital-itx-eastus2 -> Filter (6/6)#0" Id=200 BLOCKED on
> java.lang.Object@5366a0e2 owned by "Legacy Source Thread - Source:
> digital-itx-eastus2 -> Filter (6/6)#0" Id=202
> at
>
k or per TM? I see multiple
threads in SynchronizedStreamTaskActionExecutor.runThrowing blocked on
different Objects.
Thanks,
Alexey
From: Roman Khachatryan
Sent: Monday, March 15, 2021 2:16 AM
To: Alexey Trenikhun
Cc: Flink User Mail List
Subject: Re: Checkpoint fail due to timeout
Hello Alexey,
2.2 with same results
>
> Thanks,
> Alexey
>
> From: Roman Khachatryan
> Sent: Thursday, March 11, 2021 11:49 PM
> To: Alexey Trenikhun
> Cc: Flink User Mail List
> Subject: Re: Checkpoint fail due to timeout
>
> Hello,
>
>
Hello,
This can be caused by several reasons such as back-pressure, large
snapshots or bugs.
Could you please share:
- the stats of the previous (successful) checkpoints
- back-pressure metrics for sources
- the Flink version you use
Regards,
Roman
On Thu, Mar 11, 2021 at 7:03 AM Alexey