XComp opened a new pull request #18921: URL: https://github.com/apache/flink/pull/18921
This serves as a 1.14 backport for PR #18919 The cherry-pick only caused some minor import conflict due to the upgrade from curator4 to curator5 Description copied: ## What is the purpose of the change This inconsistency was introduced by `c3a6b51` as part of changes done for FLINK-19543. It should be, anyway, fixed for consistency reasons. This issue never appeared because of the following reasons: * getAllAndLock is only called by the CompletedCheckpoint recovery which happens during the recovery of a job (i.e. after failover) * the removal happens through releaseAndTryRemove which can be called in the following places: * through DefaultCompletedCheckpointStore.addCheckpointAndSubsumeOldestOne That happens after a job is already recovered and running * through DefaultCompletedCheckpointStore.shutdown ...where it's only called if the job reached a globally-terminal state (i.e. it's not subject to job recovery) * through DefaultJobGraphStore.globalCleanupAsync ...which is also only called on jobs that reached a globally-terminal state (i.e. it's not subject to job recovery) * ZooKeeperStateHandleStore.releaseAndTryRemoveAll ...which seems to be legacy code which is not used anywhere in production, anymore. I'm leaving it here because it might make sense to remove ZooKeeper completely, anyway. ## Brief change log * Refactored `getAllAndLock` to make it possible to test concurrent node changes * Fixes the catch statement replacing `KeeperException.NoNodeException` by `NotExistException` which was introduced in FLINK-19543 ## Verifying this change * A test was added to verify the change ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? not applicable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org