[ https://issues.apache.org/jira/browse/FLINK-33481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hansonhe updated FLINK-33481:
-----------------------------
    Description:
Flink version: 1.13.5

(1) flink-conf.yaml
high-availability.zookeeper.path.root /flink
high-availability.zookeeper.quorum xxxxx

(2) JobManager
application_1684323088373_1744
appattempt_1684323088373_1744_000001 Tue Oct 31 11:19:07 +0800 2023
appattempt_1684323088373_1744_000002 Sat Nov 4 11:10:52 +0800 2023

(3) When appattempt_1684323088373_1744_000001 failed, I found that the checkpoint stored in ZooKeeper under /flink/application_1684323088373_1744 had been deleted. The logs are as follows:
!image-2023-11-08-09-40-59-889.png!

(4) After appattempt_1684323088373_1744_000001 failed, the JobManager switched to appattempt_1684323088373_1744_000002. Its logs start as follows:
No checkpoint found during restore
!image-2023-11-08-09-57-17-739.png!

My question: Why were the checkpoints stored in ZooKeeper deleted when the JobManager failed, with Flink High Availability on YARN? As a result, the JobManager ran the restore without finding any checkpoint.

  was:
Flink version: 1.13.5

(1) flink-conf.yaml
high-availability.zookeeper.path.root /flink
high-availability.zookeeper.quorum xxxxx

(2) JobManager
application_1684323088373_1744
appattempt_1684323088373_1744_000001 Tue Oct 31 11:19:07 +0800 2023
appattempt_1684323088373_1744_000002 Sat Nov 4 11:10:52 +0800 2023

(3) When appattempt_1684323088373_1744_000001 failed, I found that the checkpoint stored in ZooKeeper under /flink/application_1684323088373_1744 had been deleted. The logs are as follows:
!image-2023-11-08-09-40-59-889.png!

(4) After appattempt_1684323088373_1744_000001 failed, the JobManager switched to appattempt_1684323088373_1744_000002. Its logs start as follows:
No checkpoint found during restore
!image-2023-11-08-09-57-17-739.png!

My question: Why

> Why were checkpoints stored on zookeeper deleted when JobManager failures with Flink High Availability on yarn
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33481
>                 URL: https://issues.apache.org/jira/browse/FLINK-33481
>             Project: Flink
>          Issue Type: Bug
>            Reporter: hansonhe
>            Priority: Major
>         Attachments: image-2023-11-08-09-40-59-889.png, image-2023-11-08-09-57-17-739.png
>
>
> Flink version: 1.13.5
>
> (1) flink-conf.yaml
> high-availability.zookeeper.path.root /flink
> high-availability.zookeeper.quorum xxxxx
>
> (2) JobManager
> application_1684323088373_1744
> appattempt_1684323088373_1744_000001 Tue Oct 31 11:19:07 +0800 2023
> appattempt_1684323088373_1744_000002 Sat Nov 4 11:10:52 +0800 2023
>
> (3) When appattempt_1684323088373_1744_000001 failed, I found that the checkpoint stored in ZooKeeper under /flink/application_1684323088373_1744 had been deleted. The logs are as follows:
> !image-2023-11-08-09-40-59-889.png!
>
> (4) After appattempt_1684323088373_1744_000001 failed, the JobManager switched to appattempt_1684323088373_1744_000002. Its logs start as follows:
> No checkpoint found during restore
> !image-2023-11-08-09-57-17-739.png!
>
> My question: Why were the checkpoints stored in ZooKeeper deleted when the JobManager failed, with Flink High Availability on YARN? As a result, the JobManager ran the restore without finding any checkpoint.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)