Hi,

Which Flink version are you using, and in what scenario is this happening? It could be a number of things, but most likely the file under:

> /mnt/checkpoints/5dde50b6e70608c63708cbf979bce4aa/shared/47993871-c7eb-4fec-ae23-207d307c384a

disappeared or stopped being accessible. For example, something like this [1] (this is not a Flink bug).
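If it helps, here is a minimal, purely illustrative Python sketch that checks whether that path still exists and collects every log line mentioning it. It assumes you run it somewhere the checkpoint volume is mounted (e.g. inside one of the pods) and that the JobManager/TaskManager logs have been copied into a local directory. The function name `check_checkpoint_path`, the `logs` directory and the `*.log` pattern are hypothetical, not anything Flink provides.

```python
import os
from pathlib import Path


def check_checkpoint_path(path, log_dir, pattern="*.log"):
    """Return (exists, matches) for a checkpoint file path.

    `exists` is whether `path` is visible from where this runs.
    `matches` lists (log file name, line number, line text) for every log
    line containing `path`. `log_dir` and `pattern` are assumptions: point
    them at wherever the JobManager/TaskManager logs were collected.
    """
    exists = os.path.exists(path)
    matches = []
    # glob() on a missing directory simply yields nothing.
    for log_file in sorted(Path(log_dir).glob(pattern)):
        with open(log_file, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if path in line:
                    matches.append((log_file.name, lineno, line.rstrip("\n")))
    return exists, matches


if __name__ == "__main__":
    target = ("/mnt/checkpoints/5dde50b6e70608c63708cbf979bce4aa/shared/"
              "47993871-c7eb-4fec-ae23-207d307c384a")
    exists, matches = check_checkpoint_path(target, "logs")
    print(f"exists: {exists}")
    for name, lineno, text in matches:
        print(f"{name}:{lineno}: {text}")
```

The first log line that references the path (e.g. when it was written, and whether anything deleted it later) is usually the most telling one.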
Have you tried looking for this path manually? Does it exist? Have you searched the JobManager/TaskManager logs for all entries that refer to this path? To help you, we would need more information. If it happened after taking a savepoint, this could be a recently fixed issue [2]. If you are using SQL (Blink planner), it could be, for example, this [3].

Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-16470
[2] https://issues.apache.org/jira/browse/FLINK-21351
[3] https://issues.apache.org/jira/browse/FLINK-20665

On Mon, 29 Mar 2021 at 14:58, Claude M <claudemur...@gmail.com> wrote:

> Hello,
>
> I executed a flink job in a Kubernetes Application cluster w/ four
> taskmanagers. The job was running fine for several hours but then crashed
> w/ the following exception which seems to be when restoring from a
> checkpoint. The UI shows the following for the checkpoint counts:
>
> Triggered: 68
> In Progress: 0
> Completed: 67
> Failed: 1
> Restored: 292
>
> Any ideas about this failure?
>
> Thanks