kylemeow opened a new pull request #11639: [FLINK-16626][runtime] Prevent REST handler from being closed more than once

URL: https://github.com/apache/flink/pull/11639

## What is the purpose of the change

In the Flink 1.10.0 release, job cancellation can be problematic: users frequently see *java.util.concurrent.TimeoutException* on the client side because the REST endpoint closes prematurely, before sending out the response.

After discussion with the community and further investigation, two issues need to be addressed:

1. AbstractHandler and its subclasses can be closed more than once (whether intentionally or unintentionally). This can lead to unexpected behavior such as exceptions, especially when interacting with external systems, or to unintended deregistration of the Phaser in the handler instance, which causes early shutdown of the cluster.
2. In the WebMonitorEndpoint class, the same jobCancelTerminationHandler instance is registered twice. During handler closure, the *closeAsync* method is therefore called twice on it, so the cluster enters the internalShutdown process prematurely and leaves unfinished responses behind.

## Brief change log

- Added an AtomicBoolean field to prevent the closeAsync method of a handler instance from being called multiple times.
- Added a new legacyJobCancelTerminationHandler to prevent reuse of the existing jobCancelTerminationHandler instance.
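
The AtomicBoolean guard described in the change log can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual Flink code: the class name `OnceClosableHandler`, the `shutdownCount` counter, and the `terminationFuture` field are hypothetical names introduced here only to make the effect of the guard observable.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a REST handler; not Flink's AbstractHandler.
class OnceClosableHandler {
    // Flipped from false to true by the first closeAsync() call only.
    private final AtomicBoolean closed = new AtomicBoolean(false);
    // Shared by all callers so repeated closes observe the same result.
    private final CompletableFuture<Void> terminationFuture = new CompletableFuture<>();
    // Counts how often the shutdown logic actually ran (illustration only).
    private final AtomicInteger shutdownCount = new AtomicInteger();

    public CompletableFuture<Void> closeAsync() {
        // compareAndSet succeeds for exactly one caller, even under contention.
        if (closed.compareAndSet(false, true)) {
            shutdownCount.incrementAndGet();
            // Real shutdown work (e.g. waiting for in-flight requests) would go here.
            terminationFuture.complete(null);
        }
        return terminationFuture;
    }

    public int shutdownCount() {
        return shutdownCount.get();
    }
}
```

With a guard like this, a handler instance that is accidentally registered twice still runs its shutdown logic once: the second `closeAsync` call only returns the already created future.
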
## Verifying this change

This change added tests and can be verified as follows:

- YARNJobCancellationITCase

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: yes
- The S3 file system connector: no

## Documentation

- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable