Re: Job recovery issues with state restoration

2021-05-26 Thread Peter Westermann
an Khachatryan > Date: Thursday, May 20, 2021 at 4:54 PM > To: Peter Westermann > Cc: user@flink.apache.org > Subject: Re: Job recovery issues with state restoration > > Hi Peter, > > Do you experience this issue if running without local recovery or > incremental checkpoints e

Re: Job recovery issues with state restoration

2021-05-25 Thread Roman Khachatryan
> Date: Thursday, May 20, 2021 at 4:54 PM > To: Peter Westermann > Cc: user@flink.apache.org > Subject: Re: Job recovery issues with state restoration > > Hi Peter, > > Do you experience this issue if running without local recovery or > incremental checkpoints enabled?

Re: Job recovery issues with state restoration

2021-05-25 Thread Peter Westermann
task local recovery, those would be in a different directory (we have configured io.tmp.dirs as /mnt/data/tmp). Thanks, Peter From: Roman Khachatryan Date: Thursday, May 20, 2021 at 4:54 PM To: Peter Westermann Cc: user@flink.apache.org Subject: Re: Job recovery issues with state restoration

Re: Job recovery issues with state restoration

2021-05-20 Thread Roman Khachatryan
Hi Peter, Do you experience this issue if running without local recovery or incremental checkpoints enabled? Or have you maybe compared local (on TM) and remote (on DFS) SST files? Regards, Roman On Thu, May 20, 2021 at 5:54 PM Peter Westermann wrote: > > Hello, > > > > I’ve reported issues ar
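For readers who want to try the comparison Roman suggests, here is a minimal sketch, not something posted in the thread. It assumes the remote checkpoint files have already been copied from the DFS into a local directory, and it matches files by content (SHA-256) rather than by name, on the assumption that the file names on the DFS may differ from the local SST names. The function names `content_index` and `compare_by_content` are hypothetical.

```python
import hashlib
from pathlib import Path


def content_index(root, pattern="*"):
    """Map SHA-256 hex digest -> sorted relative paths of matching files under root."""
    root = Path(root)
    index = {}
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large SST files are not loaded whole.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        rel = path.relative_to(root).as_posix()
        index.setdefault(digest.hexdigest(), []).append(rel)
    return index


def compare_by_content(local_dir, remote_copy_dir):
    """Report files whose content exists on only one side.

    local_dir: directory holding the TaskManager's local *.sst files (assumption).
    remote_copy_dir: local copy of the files fetched from the DFS (assumption).
    """
    local = content_index(local_dir, "*.sst")
    remote = content_index(remote_copy_dir, "*")
    only_local = sorted(p for d in local.keys() - remote.keys() for p in local[d])
    only_remote = sorted(p for d in remote.keys() - local.keys() for p in remote[d])
    return {"only_local": only_local, "only_remote": only_remote}
```

Calling `compare_by_content(local_dir, remote_copy_dir)` returns two lists of relative paths. An entry under `only_local` is an SST file with no byte-identical counterpart in the remote copy, which would be the first place to look. Entries under `only_remote` can be benign, since the remote directory may also hold non-SST files.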

Job recovery issues with state restoration

2021-05-20 Thread Peter Westermann
Hello, I’ve reported issues around checkpoint recovery in case of a job failure due to ZooKeeper connection loss in the past. I am still seeing issues occasionally. This is for Flink 1.12.3 with ZooKeeper for HA, S3 as the state backend, incremental checkpoints, and task-local recovery enabled.