Mit Desai created YUNIKORN-3118:
-----------------------------------
Summary: Implement Parallel TryNode Evaluation for Improved
Scheduling Performance
Key: YUNIKORN-3118
URL: https://issues.apache.org/jira/browse/YUNIKORN-3118
Project: Apache YuniKorn
Issue Type: Improvement
Components: core - scheduler
Reporter: Mit Desai
Assignee: Mit Desai
h3. Summary
Implement parallel evaluation of nodes during the scheduling process to
reduce scheduling latency in large clusters. This enhancement makes the
degree of parallelism in the TryNode evaluation process configurable while
maintaining backward compatibility.
h3. Background
In large Kubernetes clusters with many nodes, the current sequential node
evaluation process can become a bottleneck during scheduling. Each allocation
request must evaluate nodes one by one, leading to increased scheduling
latency, especially when dealing with multiple pending pods.
h3. Proposed Solution
Add a new configuration parameter `trynodesthreadcount` that sets the
number of parallel threads used for node evaluation during scheduling.
h3. Key Features:
1. {*}Configurable Parallelism{*}: New `trynodesthreadcount` parameter in
partition configuration
2. {*}Backward Compatibility{*}: Defaults to sequential behavior (value = 1)
when not configured
3. {*}Thread Safety{*}: Goroutine-based evaluation, synchronized and bounded by semaphores
4. {*}Performance Optimization{*}: Implements dry-run evaluation before actual
allocation attempts
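The features above can be illustrated with a minimal Go sketch. It is not the YuniKorn implementation: the `Node` type, the `fits` dry-run check, and the `tryNodes` function are hypothetical names chosen for this example. It assumes the dry run only reads node state, that a buffered channel serves as the semaphore, and that the first fitting node in input order is selected so the result matches what a sequential scan would return.

```go
package main

import "sync"

// Node is a hypothetical, simplified stand-in for a scheduler node.
type Node struct {
	Name      string
	Available int // free capacity in arbitrary units
}

// fits is the side-effect-free "dry run": it only reads node state.
func fits(n Node, ask int) bool {
	return n.Available >= ask
}

// tryNodes runs the dry-run check on every node with at most threadCount
// checks in flight, then returns the index of the first node (in input
// order) that fits, or -1 if none does.
func tryNodes(nodes []Node, ask, threadCount int) int {
	if threadCount < 1 {
		threadCount = 1 // fall back to sequential evaluation
	}
	results := make([]bool, len(nodes))     // one slot per node, so writes never overlap
	sem := make(chan struct{}, threadCount) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	for i := range nodes {
		wg.Add(1)
		sem <- struct{}{} // blocks while threadCount checks are running
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // release the semaphore slot
			results[i] = fits(nodes[i], ask)
		}(i)
	}
	wg.Wait() // all results are visible after this point
	for i, ok := range results {
		if ok {
			return i
		}
	}
	return -1
}
```

With `threadCount` set to 1 the loop degenerates to one check at a time, which corresponds to the sequential default. This sketch evaluates every node before choosing; a real scheduler would likely stop early once a suitable node is found, and would perform the actual allocation only on the selected node.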
Configuration Example:
{code:yaml}
partitions:
  - name: default
    trynodesthreadcount: 20 # Enable parallel evaluation with 20 threads
    queues:
      - name: root
{code}
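To show how the backward-compatible default could be applied when the parameter is absent, here is a short Go sketch. The `PartitionConfig` struct and `effectiveThreadCount` helper are hypothetical and do not mirror the actual YuniKorn configuration types. The sketch assumes an unset key decodes to zero, and that zero or negative values are treated as 1.

```go
package main

// PartitionConfig is a hypothetical, reduced view of a partition entry.
type PartitionConfig struct {
	Name                string
	TryNodesThreadCount int // zero when the key is not present in the config
}

// effectiveThreadCount returns the thread count to use for node evaluation.
// Values below 1 (including the unset zero value) map to 1, i.e. sequential.
func effectiveThreadCount(p PartitionConfig) int {
	if p.TryNodesThreadCount < 1 {
		return 1
	}
	return p.TryNodesThreadCount
}
```

Under these assumptions, a partition without `trynodesthreadcount` yields 1, while the example configuration above yields 20.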