becketqin opened a new pull request, #26205:
URL: https://github.com/apache/flink/pull/26205

   ## What is the purpose of the change
   This is a re-commit of the patch ( #25569 ) after fixing the accidentally 
changed timeout unit of the SplitFetcherManager close timeout.
   
   ## Brief change log
   
   Changes in the Original Fix:
   
   This patch fixes an issue introduced in 
https://github.com/apache/flink/pull/25130. In SplitFetcherManager.close(), the 
element queue draining thread was chaining the runnables to the element queue 
availability future in a tight loop. This causes problem (e.g. OOM, high CPU 
util) when the fetcher threads do not shutdown quickly.
   
   This patch changes the tight async loop to a blocking loop.
   
   Changes in the followup patch fixes the accidentally changed timeout unit of 
the SplitFetcherManager close timeout.
   
   ## Verifying this change
   Unit tests have been added.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (No)
     - If yes, how is the feature documented? (No)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to