Roman Shakhov created YUNIKORN-3113:
---------------------------------------

             Summary: Resource-wise preemption
                 Key: YUNIKORN-3113
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3113
             Project: Apache YuniKorn
          Issue Type: Improvement
          Components: core - scheduler
            Reporter: Roman Shakhov


I have a use case in which the ephemeral-storage resource type must be managed
along with vcore and memory. Some workloads use storage and some don't, so I
want to set ephemeral-storage in both max and guaranteed.
With heterogeneous workloads, preemption is consistently not triggered
even when it could help balance the queues.

*Case #1* [unit test|https://github.com/blide/yunikorn-core/commit/6c4f12250c26c908621a18e75a0ca96eb8d1778f#diff-7b65cc904d1c0a0395b409e51db43bfe65238432eb96b66831c950060feac911R1833-R1834]
For example, with the following config and queue state:
 * workloads - guaranteed/max: 10 vcore, 80 memory, 800 ephemeral-storage
 ** a - guaranteed: 5 vcore, 40 memory, 400 ephemeral-storage
 *** allocated: 10 vcore, 80 memory
 ** b - guaranteed: 5 vcore, 40 memory

Submitting any jobs to {{workloads.b}} won't trigger preemption from
{{workloads.a}}, even though {{workloads.a}} is heavily over-allocated by any
common-sense measure.
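
To make the imbalance in Case #1 concrete, the sketch below computes allocated
minus guaranteed per resource type for {{workloads.a}}. The {{Resource}} map and
the {{delta}} helper are hypothetical names used only for illustration; they are
not YuniKorn types.

```go
package main

// Resource maps a resource type name to a quantity.
// Hypothetical stand-in for illustration; not YuniKorn's own resource type.
type Resource map[string]int64

// delta returns allocated minus guaranteed for every resource type that has
// a guarantee. A positive value means the queue is above its guarantee for
// that type, a negative value means it is below.
func delta(allocated, guaranteed Resource) Resource {
	out := Resource{}
	for name, g := range guaranteed {
		// A type missing from allocated reads as zero.
		out[name] = allocated[name] - g
	}
	return out
}

// caseOneDelta evaluates workloads.a with the numbers from Case #1.
func caseOneDelta() Resource {
	allocated := Resource{"vcore": 10, "memory": 80}
	guaranteed := Resource{"vcore": 5, "memory": 40, "ephemeral-storage": 400}
	return delta(allocated, guaranteed)
}
```

With the Case #1 numbers, {{workloads.a}} is 5 vcore and 40 memory above its
guarantee and 400 ephemeral-storage below it, so a comparison over the whole
resource vector also sees the unused storage guarantee.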

*Case #2*
There is a symmetrical case in which {{workloads.b}} has the {{storage}}
resource allocated:
 * workloads - guaranteed/max: 10 vcore, 80 memory, 800 ephemeral-storage
 ** a - guaranteed: 5 vcore, 40 memory
 *** allocated: 9 vcore, 72 memory
 ** b - guaranteed: 5 vcore, 40 memory, 400 ephemeral-storage
 *** allocated: 1 vcore, 8 memory, 401 ephemeral-storage

Submitting a pod (1 vcore, 8 memory) to {{workloads.b}} won't trigger preemption
from {{workloads.a}}, even though the new allocation doesn't use the
over-allocated resource.

In cases 1 and 2, preemption could ignore the storage resource entirely and
take into account only the resources present in the ask. Doing so would not
move the distribution of guaranteed resources farther from the optimum.
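
One way to express this restriction is sketched below: both the asking queue
and the candidate victim queue are compared against their guarantees only for
the resource types named in the ask. The {{Resource}} type and the helpers
{{fitsGuarantee}} and {{overGuarantee}} are hypothetical names chosen for
illustration. They are not YuniKorn APIs, and the exact comparison the
scheduler should use remains a design choice.

```go
package main

// Resource maps a resource type name to a quantity.
// Hypothetical stand-in for illustration; not YuniKorn's own resource type.
type Resource map[string]int64

// fitsGuarantee reports whether the queue would stay within its guarantee
// after the ask is added, looking only at the resource types in the ask.
func fitsGuarantee(allocated, guaranteed, ask Resource) bool {
	for name, want := range ask {
		if allocated[name]+want > guaranteed[name] {
			return false
		}
	}
	return true
}

// overGuarantee reports whether the queue is above its guarantee for at
// least one resource type in the ask, i.e. it holds something the ask needs.
func overGuarantee(allocated, guaranteed, ask Resource) bool {
	for name := range ask {
		if allocated[name] > guaranteed[name] {
			return true
		}
	}
	return false
}

// caseTwo evaluates the Case #2 queues for an ask of 1 vcore, 8 memory.
func caseTwo() (askerFits, victimOver bool) {
	ask := Resource{"vcore": 1, "memory": 8}
	aAlloc := Resource{"vcore": 9, "memory": 72}
	aGuar := Resource{"vcore": 5, "memory": 40}
	bAlloc := Resource{"vcore": 1, "memory": 8, "ephemeral-storage": 401}
	bGuar := Resource{"vcore": 5, "memory": 40, "ephemeral-storage": 400}
	return fitsGuarantee(bAlloc, bGuar, ask), overGuarantee(aAlloc, aGuar, ask)
}
```

Under these assumptions, {{workloads.b}} in Case #2 still fits its guarantee
for vcore and memory even though it is 1 over on ephemeral-storage, and
{{workloads.a}} is over its guarantee for both types in the ask, so it would
qualify as a preemption source.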

*Case #3*
Consider a pod (1 vcore, 8 memory, {*}99 ephemeral-storage{*}) in Case #2. It
would make sense to preempt vcore and memory to schedule the pod if there is
enough headroom for the 99 ephemeral-storage.
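
The Case #3 reasoning can be sketched as splitting the ask into the part that
already fits into headroom and the part that would have to be freed by
preemption. The {{Resource}} type and the {{shortfall}} helper are hypothetical,
and headroom is assumed here to be the {{workloads}} max minus the sum of the
allocations in {{workloads.a}} and {{workloads.b}}.

```go
package main

// Resource maps a resource type name to a quantity.
// Hypothetical stand-in for illustration; not YuniKorn's own resource type.
type Resource map[string]int64

// shortfall returns, per resource type in the ask, the amount that does not
// fit into the available headroom and would have to be freed by preemption.
// Types that fit completely are left out of the result.
func shortfall(ask, headroom Resource) Resource {
	out := Resource{}
	for name, want := range ask {
		if missing := want - headroom[name]; missing > 0 {
			out[name] = missing
		}
	}
	return out
}

// caseThree applies the Case #3 pod to the Case #2 queue state.
// Headroom is taken as the workloads max minus the allocations of a and b;
// that definition is an assumption made for this example.
func caseThree() Resource {
	limit := Resource{"vcore": 10, "memory": 80, "ephemeral-storage": 800}
	used := Resource{"vcore": 9 + 1, "memory": 72 + 8, "ephemeral-storage": 401}
	headroom := Resource{}
	for name, m := range limit {
		headroom[name] = m - used[name]
	}
	ask := Resource{"vcore": 1, "memory": 8, "ephemeral-storage": 99}
	return shortfall(ask, headroom)
}
```

With these numbers the ephemeral-storage headroom is 399, so the 99 requested
fits without preemption, while vcore and memory have no headroom left and only
1 vcore and 8 memory would need to be preempted.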

*General approach*
The preemptor should consider only those resources that are present in either
the allocation or the preemption.
Currently, the under-allocated/over-allocated condition checks the queue as a
whole:
 - [preemption.go#L201|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L201]
 - [preemption.go#L255|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L255]
 - [preemption.go#L325|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L325]
 - [preemption.go#L482|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L482]
 - [preemption.go#L513|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L513]



