Roman Shakhov created YUNIKORN-3113:
---------------------------------------
Summary: Resource-wise preemption
Key: YUNIKORN-3113
URL: https://issues.apache.org/jira/browse/YUNIKORN-3113
Project: Apache YuniKorn
Issue Type: Improvement
Components: core - scheduler
Reporter: Roman Shakhov
I have a use case in which I need to manage the ephemeral-storage resource type
alongside vcore and memory. Some workloads use storage and some don't, so I want
to set ephemeral-storage in both the max and the guaranteed resources.
With such heterogeneous workloads, preemption is consistently not triggered,
even when it could help balance the queues.
*Case #1* [unit test|https://github.com/blide/yunikorn-core/commit/6c4f12250c26c908621a18e75a0ca96eb8d1778f#diff-7b65cc904d1c0a0395b409e51db43bfe65238432eb96b66831c950060feac911R1833-R1834]
For example, with the following configuration and queue state:
* workloads - guaranteed/max: 10 vcore, 80 memory, 800 ephemeral-storage
** a - guaranteed: 5 vcore, 40 memory, 400 ephemeral-storage
*** allocated: 10 vcore, 80 memory
** b - guaranteed: 5 vcore, 40 memory
Submitting any job to {{workloads.b}} won't trigger preemption from
{{workloads.a}}, even though {{workloads.a}} is, by any common-sense measure,
heavily over-allocated (it uses twice its vcore and memory guarantee).
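A minimal Go sketch of one plausible reason case #1 is rejected, assuming the whole-queue check behaves like an every-resource-type comparison (a simplification; the actual logic in preemption.go differs in detail). The {{Res}} type and {{atOrAboveGuaranteed}} function are hypothetical and are not the yunikorn-core API.

```go
package main

import "fmt"

// Res is a hypothetical resource vector keyed by resource type name.
// It is NOT the yunikorn-core resources.Resource type.
type Res map[string]int64

// atOrAboveGuaranteed is a hypothetical model of a whole-queue check:
// the queue counts as at or above its guarantee only if usage reaches the
// guaranteed amount for every guaranteed resource type (missing usage = 0).
func atOrAboveGuaranteed(used, guaranteed Res) bool {
	for name, g := range guaranteed {
		if used[name] < g {
			return false
		}
	}
	return true
}

func main() {
	// workloads.a from case #1
	guaranteedA := Res{"vcore": 5, "memory": 40, "ephemeral-storage": 400}
	usedA := Res{"vcore": 10, "memory": 80}

	// false: unused ephemeral-storage (0 < 400) masks the vcore/memory overuse,
	// so workloads.a would not qualify as a preemption victim under this model.
	fmt.Println(atOrAboveGuaranteed(usedA, guaranteedA))
}
```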
*Case #2*
There is a symmetrical case in which {{workloads.b}} has {{ephemeral-storage}}
allocated:
* workloads - guaranteed/max: 10 vcore, 80 memory, 800 ephemeral-storage
** a - guaranteed: 5 vcore, 40 memory
*** allocated: 9 vcore, 72 memory
** b - guaranteed: 5 vcore, 40 memory, 400 ephemeral-storage
*** allocated: 1 vcore, 8 memory, 401 ephemeral-storage
Submitting a pod (1 vcore, 8 memory) to {{workloads.b}} won't trigger preemption
from {{workloads.a}}, even though the new allocation doesn't request the only
resource on which {{workloads.b}} is over-allocated (ephemeral-storage, 401 > 400).
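Under the same kind of simplified model, the ask-side check for case #2 can be sketched as computing the queue's remaining guarantee after adding the ask: a single negative entry (here ephemeral-storage) rejects the whole request, although the ask contains no storage. {{remainingGuaranteed}} and {{fitsWithinGuarantee}} are hypothetical helpers, not yunikorn-core functions.

```go
package main

import "fmt"

// Res is a hypothetical resource vector keyed by resource type name.
type Res map[string]int64

// remainingGuaranteed returns guaranteed - (used + ask) for every guaranteed type.
func remainingGuaranteed(guaranteed, used, ask Res) Res {
	out := Res{}
	for name, g := range guaranteed {
		out[name] = g - used[name] - ask[name]
	}
	return out
}

// fitsWithinGuarantee reports whether no remaining quantity is negative.
func fitsWithinGuarantee(remaining Res) bool {
	for _, q := range remaining {
		if q < 0 {
			return false
		}
	}
	return true
}

func main() {
	// workloads.b from case #2
	guaranteedB := Res{"vcore": 5, "memory": 40, "ephemeral-storage": 400}
	usedB := Res{"vcore": 1, "memory": 8, "ephemeral-storage": 401}
	ask := Res{"vcore": 1, "memory": 8}

	rem := remainingGuaranteed(guaranteedB, usedB, ask)
	fmt.Println(rem["vcore"], rem["memory"], rem["ephemeral-storage"]) // 3 24 -1
	fmt.Println(fitsWithinGuarantee(rem))                                // false
}
```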
In cases 1 and 2, preemption could ignore the storage resource entirely and take
into account only the resources present in the ask. Doing so would not move the
distribution of guaranteed resources further from the optimum.
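Restricting the guarantee checks to the resource types present in the ask can be sketched as follows. This is a hypothetical, simplified check, not the yunikorn-core implementation; it assumes exactly the ask's amount is taken from the victim queue, an ask of 1 vcore, 8 memory for case #1, and an empty {{workloads.b}} in case #1.

```go
package main

import "fmt"

// Res is a hypothetical resource vector keyed by resource type name.
type Res map[string]int64

// onlyAskTypes keeps the entries of r whose resource type appears in the ask.
func onlyAskTypes(r, ask Res) Res {
	out := Res{}
	for name := range ask {
		if q, ok := r[name]; ok {
			out[name] = q
		}
	}
	return out
}

// preemptionAllowed (hypothetical): the victim queue must stay at or above its
// guarantee after giving up the ask's amount, and the ask queue must stay within
// its guarantee after adding the ask, checked only for the ask's resource types.
func preemptionAllowed(victimUsed, victimGuaranteed, askUsed, askGuaranteed, ask Res) bool {
	for name, g := range onlyAskTypes(victimGuaranteed, ask) {
		if victimUsed[name]-ask[name] < g {
			return false
		}
	}
	for name, g := range onlyAskTypes(askGuaranteed, ask) {
		if askUsed[name]+ask[name] > g {
			return false
		}
	}
	return true
}

func main() {
	ask := Res{"vcore": 1, "memory": 8}

	// Case #1: the unused storage guarantee of workloads.a is ignored.
	case1 := preemptionAllowed(
		Res{"vcore": 10, "memory": 80},
		Res{"vcore": 5, "memory": 40, "ephemeral-storage": 400},
		Res{},
		Res{"vcore": 5, "memory": 40},
		ask)

	// Case #2: the storage over-allocation of workloads.b is ignored.
	case2 := preemptionAllowed(
		Res{"vcore": 9, "memory": 72},
		Res{"vcore": 5, "memory": 40},
		Res{"vcore": 1, "memory": 8, "ephemeral-storage": 401},
		Res{"vcore": 5, "memory": 40, "ephemeral-storage": 400},
		ask)

	fmt.Println(case1, case2) // true true
}
```

Once the ask itself requests ephemeral-storage, the storage guarantee of the ask queue is checked again, which is what case #3 is about.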
*Case #3*
Now submit a pod (1 vcore, 8 memory, *99 ephemeral-storage*) to {{workloads.b}}
in the state from case #2. It would make sense to preempt vcore and memory from
{{workloads.a}} to schedule the pod, as long as there is enough headroom for the
99 ephemeral-storage.
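One possible reading of case #3, sketched in Go under stated assumptions: ask resource types that the victims actually hold (vcore, memory) are candidates for preemption, while the remaining types (ephemeral-storage) only need to fit into the parent's existing headroom. With the {{workloads}} max at 800 and 401 already used, 399 ephemeral-storage is still free, so the requested 99 fits. {{splitAsk}} and {{fitsHeadroom}} are hypothetical helpers.

```go
package main

import "fmt"

// Res is a hypothetical resource vector keyed by resource type name.
type Res map[string]int64

// splitAsk (hypothetical) separates the ask into the part the victims could
// free through preemption and the part that must fit into existing headroom.
func splitAsk(ask, victimAlloc Res) (preemptable, headroomOnly Res) {
	preemptable, headroomOnly = Res{}, Res{}
	for name, q := range ask {
		if victimAlloc[name] > 0 {
			preemptable[name] = q
		} else {
			headroomOnly[name] = q
		}
	}
	return preemptable, headroomOnly
}

// fitsHeadroom (hypothetical) reports whether part fits under limit given used.
func fitsHeadroom(part, limit, used Res) bool {
	for name, q := range part {
		if used[name]+q > limit[name] {
			return false
		}
	}
	return true
}

func main() {
	ask := Res{"vcore": 1, "memory": 8, "ephemeral-storage": 99}
	victimAlloc := Res{"vcore": 9, "memory": 72} // workloads.a in case #2
	parentMax := Res{"vcore": 10, "memory": 80, "ephemeral-storage": 800}
	parentUsed := Res{"vcore": 10, "memory": 80, "ephemeral-storage": 401} // a + b

	preemptable, headroomOnly := splitAsk(ask, victimAlloc)
	// true: 401 + 99 = 500 <= 800, so storage needs no preemption
	fmt.Println(fitsHeadroom(headroomOnly, parentMax, parentUsed))
	// false: vcore and memory are exhausted and must be freed from workloads.a
	fmt.Println(fitsHeadroom(preemptable, parentMax, parentUsed))
}
```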
*General approach*
The preemptor should consider only those resource types that are present either
in the ask or in the allocations being preempted.
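The rule above can be sketched as collecting the set of resource types with a non-zero quantity in either the ask or the candidate victim allocations; the guarantee checks would then be evaluated on those types only. {{relevantTypes}} is a hypothetical helper, and the victim allocations below (two 1 vcore, 8 memory pods in {{workloads.a}}) are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Res is a hypothetical resource vector keyed by resource type name.
type Res map[string]int64

// relevantTypes returns, in sorted order, every resource type with a
// non-zero quantity in the ask or in any of the victim allocations.
func relevantTypes(ask Res, victims []Res) []string {
	set := map[string]bool{}
	add := func(r Res) {
		for name, q := range r {
			if q > 0 {
				set[name] = true
			}
		}
	}
	add(ask)
	for _, v := range victims {
		add(v)
	}
	names := make([]string, 0, len(set))
	for name := range set {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for logging and tests
	return names
}

func main() {
	ask := Res{"vcore": 1, "memory": 8}
	victims := []Res{{"vcore": 1, "memory": 8}, {"vcore": 1, "memory": 8}}
	fmt.Println(relevantTypes(ask, victims)) // [memory vcore]
}
```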
Currently, the under-allocated/over-allocated conditions check the queue as a whole:
- [preemption.go#L201|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L201]
- [preemption.go#L255|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L255]
- [preemption.go#L325|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L325]
- [preemption.go#L482|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L482]
- [preemption.go#L513|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L513]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)