Re: Flink failing to restore from checkpoint

Piotr Nowojski Mon, 29 Mar 2021 10:29:44 -0700

Hi,

What Flink version are you using, and in what scenario does this happen?
It could be a number of things, but most likely the file mounted under:

/mnt/checkpoints/5dde50b6e70608c63708cbf979bce4aa/shared/47993871-c7eb-4fec-ae23-207d307c384a

disappeared or stopped being accessible. For example, something like
[1] (this is not a Flink bug).

Have you tried looking for this path manually? Does it exist? Have you
looked in the JobManager/TaskManager logs for all entries that are
referring to this path?
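The two manual checks above can be sketched as a small script: report whether the checkpoint file exists and is readable from the machine where you run it, then list every log line that mentions it. This is a minimal sketch, assuming you have copied or exported the JobManager/TaskManager logs to local files (for example via `kubectl logs`) and that the checkpoint volume is mounted where you run it. The script name and log file names in the usage line are hypothetical, and nothing here uses a Flink API.

```python
import os
import sys
from pathlib import Path
from typing import Iterator, List, Tuple


def path_status(path: str) -> str:
    """Report whether a checkpoint file exists and is readable from here."""
    p = Path(path)
    if not p.exists():
        return "missing"
    if not os.access(p, os.R_OK):
        return "not readable"
    return "ok"


def find_references(log_files: List[str], needle: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (log_file, line_number, line) for every log line containing `needle`."""
    for log_file in log_files:
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(log_file, encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line:
                    yield log_file, lineno, line.rstrip("\n")


def main(argv: List[str]) -> int:
    if len(argv) < 2:
        print("usage: check_ckpt.py CHECKPOINT_FILE [LOG_FILE ...]", file=sys.stderr)
        return 2
    target, logs = argv[1], argv[2:]
    print(f"{target}: {path_status(target)}")
    # Match on the file name only, so log entries that use a different
    # prefix for the same file (e.g. a file: URI) are still found.
    needle = Path(target).name
    for log_file, lineno, line in find_references(logs, needle):
        print(f"{log_file}:{lineno}: {line}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Usage (hypothetical file names): `python check_ckpt.py /mnt/checkpoints/5dde50b6e70608c63708cbf979bce4aa/shared/47993871-c7eb-4fec-ae23-207d307c384a jobmanager.log taskmanager-0.log`. Run it from inside a TaskManager pod if possible, since the file may be visible on one node and not on another.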

To help you, we would need more information. If it happened after
taking a savepoint, this could be a recently fixed issue [2]. If you are
using SQL (Blink planner), it could be, for example, this [3].

Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-16470
[2] https://issues.apache.org/jira/browse/FLINK-21351
[3] https://issues.apache.org/jira/browse/FLINK-20665


On Mon, 29 Mar 2021 at 14:58, Claude M <claudemur...@gmail.com> wrote:

> Hello,
>
> I executed a Flink job in a Kubernetes Application cluster with four
> TaskManagers. The job was running fine for several hours but then crashed
> with the following exception, which seems to occur when restoring from a
> checkpoint. The UI shows the following checkpoint counts:
>
> Triggered: 68, In Progress: 0, Completed: 67, Failed: 1, Restored: 292
>
>
> Any ideas about this failure?
>
>
> Thanks