becketqin opened a new pull request, #26205: URL: https://github.com/apache/flink/pull/26205
## What is the purpose of the change This is a re-commit of the patch ( #25569 ) after fixing the accidentally changed timeout unit of the SplitFetcherManager close timeout. ## Brief change log Changes in the Original Fix: This patch fixes an issue introduced in https://github.com/apache/flink/pull/25130. In SplitFetcherManager.close(), the element queue draining thread was chaining the runnables to the element queue availability future in a tight loop. This causes problem (e.g. OOM, high CPU util) when the fetcher threads do not shutdown quickly. This patch changes the tight async loop to a blocking loop. Changes in the followup patch fixes the accidentally changed timeout unit of the SplitFetcherManager close timeout. ## Verifying this change Unit tests have been added. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (No) - If yes, how is the feature documented? (No) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org