[jira] [Updated] (FLINK-33481) Why were checkpoints stored on zookeeper deleted when JobManager failures with Flink High Availability on yarn

hansonhe (Jira) Tue, 07 Nov 2023 18:03:05 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-33481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


hansonhe updated FLINK-33481:
-----------------------------
    Description: 
Flink version: 1.13.5
(1) flink-conf.yaml 

high-availability.zookeeper.path.root    /flink
high-availability.zookeeper.quorum   xxxxx
(2) jobmanager

application_1684323088373_1744
appattempt_1684323088373_1744_000001    Tue Oct 31 11:19:07 +0800 2023
appattempt_1684323088373_1744_000002    Sat Nov 4 11:10:52 +0800 2023

(3) When appattempt_1684323088373_1744_000001 failed, I found that the 
checkpoint stored in ZooKeeper under /flink/application_1684323088373_1744 had 
been deleted.

The logs are as follows: 
!image-2023-11-08-09-40-59-889.png!

(4) After appattempt_1684323088373_1744_000001 failed, the JobManager switched 
to appattempt_1684323088373_1744_000002. Its logs begin as follows: No 
checkpoint found during restore  !image-2023-11-08-09-57-17-739.png!

My question: Why were the checkpoints stored in ZooKeeper deleted when the 
JobManager failed with Flink High Availability on YARN? As a result, the new 
JobManager attempt restored without finding any checkpoint.

  was:
Flink version: 1.13.5
(1) flink-conf.yaml 

high-availability.zookeeper.path.root    /flink
high-availability.zookeeper.quorum   xxxxx
(2) jobmanager

application_1684323088373_1744
appattempt_1684323088373_1744_000001    Tue Oct 31 11:19:07 +0800 2023
appattempt_1684323088373_1744_000002    Sat Nov 4 11:10:52 +0800 2023


(3) When appattempt_1684323088373_1744_000001 failed, I found that the 
checkpoint stored in ZooKeeper under /flink/application_1684323088373_1744 had 
been deleted.

The logs are as follows: 
!image-2023-11-08-09-40-59-889.png!

(4) After appattempt_1684323088373_1744_000001 failed, the JobManager switched 
to appattempt_1684323088373_1744_000002. Its logs begin as follows: No 
checkpoint found during restore !image-2023-11-08-09-57-17-739.png!

My question: Why 


> Why were checkpoints stored on zookeeper deleted when JobManager failures 
> with Flink High Availability on yarn
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33481
>                 URL: https://issues.apache.org/jira/browse/FLINK-33481
>             Project: Flink
>          Issue Type: Bug
>            Reporter: hansonhe
>            Priority: Major
>         Attachments: image-2023-11-08-09-40-59-889.png, 
> image-2023-11-08-09-57-17-739.png
>
>
> Flink version: 1.13.5
> (1) flink-conf.yaml 
> high-availability.zookeeper.path.root    /flink
> high-availability.zookeeper.quorum   xxxxx
> (2) jobmanager
> application_1684323088373_1744
> appattempt_1684323088373_1744_000001    Tue Oct 31 11:19:07 +0800 2023
> appattempt_1684323088373_1744_000002    Sat Nov 4 11:10:52 +0800 2023
> (3) When appattempt_1684323088373_1744_000001 failed, I found that the 
> checkpoint stored in ZooKeeper under /flink/application_1684323088373_1744 
> had been deleted.
> The logs are as follows: 
> !image-2023-11-08-09-40-59-889.png!
> (4) After appattempt_1684323088373_1744_000001 failed, the JobManager 
> switched to appattempt_1684323088373_1744_000002. Its logs begin as follows: 
> No checkpoint found during restore  !image-2023-11-08-09-57-17-739.png!
> My question: Why were the checkpoints stored in ZooKeeper deleted when the 
> JobManager failed with Flink High Availability on YARN? As a result, the new 
> JobManager attempt restored without finding any checkpoint.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
