Hi Piotr,
Thanks a lot.
I will try your suggestion and see what happens.
Regards,
Zhinan
On Fri, 21 Aug 2020 at 00:40, Piotr Nowojski wrote:
>
> Hi Zhinan,
>
> It's hard to say, but my guess is that it takes that long for the tasks to respond to
> cancellation, which consists of a couple of steps. If a t
Hi Zhinan,
It's hard to say, but my guess is that it takes that long for the tasks to
respond to cancellation, which consists of a couple of steps. If a task is currently
busy processing something, it has to respond to interruption
(`java.lang.Thread#interrupt`). If it takes 30 seconds for a task to react
Hi Piotr,
Thanks a lot for your help.
Yes, I finally realized that I can only approximate the time for [1]
and [3], and measure [2] by monitoring the uptime and downtime metrics
provided by Flink.
My problem now is that the time in [2] can be up to 40s. I
wonder why it takes so long to r
Hi,
> I want to decompose the recovery time into different parts, say
> (1) the time to detect the failure,
> (2) the time to restart the job,
> (3) and the time to restore the checkpointing.
1. Maybe I'm missing something, but as far as I can tell, Flink cannot
help you with that. Time to detec