featzhang opened a new pull request, #27711:
URL: https://github.com/apache/flink/pull/27711

   ## What is the purpose of the change
   
   This PR integrates NodeHealthManager into the slot allocation process of 
FineGrainedSlotManager. Quarantined nodes are filtered out during slot 
allocation so that jobs are not scheduled on unhealthy nodes.
   
   This PR builds on PR #27701 (NodeHealthManager abstraction) and implements 
Phase 2 of the node health management mechanism.
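
A minimal sketch of the filtering idea described above, for illustration only. 
The names NodeHealthView, Candidate, and filterHealthy are hypothetical and do 
not reflect the actual NodeHealthManager interface or the code in 
FineGrainedSlotManager; the sketch only assumes that each allocation candidate 
can be mapped to a node ID and that a health check can answer whether that 
node is quarantined.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class QuarantineFilterSketch {

    /** Hypothetical stand-in for a node health check; not the real Flink interface. */
    interface NodeHealthView {
        boolean isQuarantined(String nodeId);
    }

    /** Hypothetical allocation candidate: a task manager and the node it runs on. */
    static final class Candidate {
        final String taskManagerId;
        final String nodeId;

        Candidate(String taskManagerId, String nodeId) {
            this.taskManagerId = taskManagerId;
            this.nodeId = nodeId;
        }
    }

    /** Keeps only the candidates whose node is not quarantined. */
    static List<Candidate> filterHealthy(List<Candidate> candidates, NodeHealthView health) {
        return candidates.stream()
                .filter(c -> !health.isQuarantined(c.nodeId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed state for the demo: node-b is quarantined.
        Set<String> quarantined = Set.of("node-b");
        List<Candidate> candidates = List.of(
                new Candidate("tm-1", "node-a"),
                new Candidate("tm-2", "node-b"),
                new Candidate("tm-3", "node-c"));

        // Print the task managers that remain eligible for slot allocation.
        for (Candidate c : filterHealthy(candidates, quarantined::contains)) {
            System.out.println(c.taskManagerId);
        }
    }
}
```

Filtering before allocation, rather than rejecting slots afterwards, keeps the 
allocation logic itself unaware of node health.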
   
   ## Brief change log
   
   - Modified FineGrainedSlotManager to filter out quarantined nodes during 
slot allocation in the allocateSlotsAccordingTo() method
   - Updated ResourceManagerRuntimeServices to accept a NodeHealthManager 
parameter in the createSlotManager() method
   - Updated ResourceManagerFactory to pass NoOpNodeHealthManager as the 
default implementation
   - Extended FineGrainedSlotManagerBuilder and FineGrainedSlotManagerTestBase 
to support NodeHealthManager in test infrastructure
   - Added the integration test NodeQuarantineSlotFilteringITCase, which 
covers slot allocation filtering, quarantine expiry, and manual removal of a 
quarantine
   - Fixed compilation issues in test infrastructure related to method name 
conflicts
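
The quarantine expiry, manual removal, and no-op default mentioned in the list 
above can be pictured with the following sketch. All types and method names 
here (NodeHealth, NoOpHealth, InMemoryHealth, quarantine, release, 
isQuarantined) are assumptions made for illustration; the real 
NodeHealthManager and NoOpNodeHealthManager from PR #27701 may look different. 
Time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

class QuarantineExpirySketch {

    /** Hypothetical health interface; names and signatures are illustrative. */
    interface NodeHealth {
        void quarantine(String nodeId, long untilMillis);

        void release(String nodeId);

        boolean isQuarantined(String nodeId, long nowMillis);
    }

    /** Plays the role of a no-op default: no node is ever reported as quarantined. */
    static final class NoOpHealth implements NodeHealth {
        @Override
        public void quarantine(String nodeId, long untilMillis) {}

        @Override
        public void release(String nodeId) {}

        @Override
        public boolean isQuarantined(String nodeId, long nowMillis) {
            return false;
        }
    }

    /** Simple in-memory variant with time-based expiry and manual release. */
    static final class InMemoryHealth implements NodeHealth {
        private final Map<String, Long> quarantinedUntil = new HashMap<>();

        @Override
        public void quarantine(String nodeId, long untilMillis) {
            quarantinedUntil.put(nodeId, untilMillis);
        }

        @Override
        public void release(String nodeId) {
            quarantinedUntil.remove(nodeId);
        }

        @Override
        public boolean isQuarantined(String nodeId, long nowMillis) {
            Long until = quarantinedUntil.get(nodeId);
            // A quarantine is active only strictly before its expiry time.
            return until != null && nowMillis < until;
        }
    }

    public static void main(String[] args) {
        NodeHealth health = new InMemoryHealth();
        health.quarantine("node-a", 1_000L);
        System.out.println(health.isQuarantined("node-a", 500L));   // before expiry
        System.out.println(health.isQuarantined("node-a", 1_500L)); // after expiry
        health.quarantine("node-b", 5_000L);
        health.release("node-b");                                   // manual removal
        System.out.println(health.isQuarantined("node-b", 100L));
    }
}
```

With a no-op default of this kind, call sites that do not configure health 
tracking would see no change in slot allocation behavior.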
   
   ## Verifying this change
   
   This change is verified by:
   - The existing unit tests, which continue to pass
   - The new integration test NodeQuarantineSlotFilteringITCase, which 
validates the slot filtering functionality
   - Manual testing with quarantined nodes, which shows that their slots are 
excluded from allocation
   - A successful build with mvnw clean spotless:apply install -DskipTests 
-Pfast
   
   ## Does this pull request potentially affect
   
   - Public API: No
   - Serializers: No
   - The runtime per-record code paths: No
   - Anything that affects deployment or recovery (JobManager failover): No
   - The S3 file system connector: No
   
   ## Documentation
   
   - Does this pull request introduce a new feature: Yes, node health-based 
slot filtering
   - If yes, how is the feature documented: Code comments and integration 
tests. Full documentation will be added in subsequent PRs for REST API and 
configuration options.

