[jira] [Commented] (YUNIKORN-3113) Resource-wise preemption

Wilfred Spiegelenburg (Jira) Sun, 17 Aug 2025 21:59:28 -0700


    [ 
https://issues.apache.org/jira/browse/YUNIKORN-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014498#comment-18014498
 ]


Wilfred Spiegelenburg commented on YUNIKORN-3113:
-------------------------------------------------

I don't think the issue is in any of the lines you have mentioned.

The queue is most likely skipped when looking for victims because it is 
considered to still have a remaining guarantee, either from itself or from the 
parent "workloads". Whether a queue is skipped when looking for victims is 
based purely on its remaining guarantee. The remaining guarantee calculation 
does not take into account that a resource type can be defined, directly or 
indirectly, in the guarantee but not allocated on the queue. That means the 
queue's remaining guarantee is considered larger than zero, which means the 
queue is skipped.
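
To make that concrete, here is a minimal sketch of the arithmetic for queue 
"a" in case #1. It is a simplified model, not the YuniKorn code: the 
{{Resource}} type and both helper functions are hypothetical names used only 
for this illustration, and the exact skip check in the scheduler may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource maps a resource type name to a quantity.
// Hypothetical type for this sketch, not the YuniKorn resources package.
type Resource map[string]int64

// remainingGuarantee returns guaranteed minus allocated for every resource
// type named in the guarantee. A type that is missing from allocated counts
// as zero usage, so its full guarantee stays "remaining".
func remainingGuarantee(guaranteed, allocated Resource) Resource {
	remaining := make(Resource, len(guaranteed))
	for name, g := range guaranteed {
		remaining[name] = g - allocated[name]
	}
	return remaining
}

// positiveTypes lists, in sorted order, the resource types that still have
// some guarantee left (remaining value larger than zero).
func positiveTypes(remaining Resource) []string {
	var names []string
	for name, v := range remaining {
		if v > 0 {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Queue "a" from case #1 in the issue description.
	guaranteed := Resource{"vcore": 5, "memory": 40, "ephemeral-storage": 400}
	allocated := Resource{"vcore": 10, "memory": 80}

	remaining := remainingGuarantee(guaranteed, allocated)
	fmt.Println("remaining guarantee:", remaining)
	fmt.Println("types with guarantee left:", positiveTypes(remaining))
}
```

In this model vcore and memory are over the guarantee, but ephemeral-storage 
still shows its full 400 as remaining because nothing of that type is 
allocated on the queue.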

For some of your cases, removing the {{800 ephemeral-storage}} from the parent 
queue should get you a config that works.

That is why adding the guarantee on a queue triggers the issue. The code could 
be a bit smarter and remove from the remaining guarantee any resource types 
that are not allocated. However, completely ignoring the values could have a 
high impact on preemption performance. It would remove any filtering and 
require the code to consider every allocation and the impact that removing it 
has on the full queue hierarchy.
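
A rough sketch of what "remove any resource types that are not allocated" 
could look like, again as a simplified model rather than a proposed patch. 
The {{Resource}} type and {{pruneUnallocated}} are hypothetical names, and the 
sketch says nothing about the performance concern mentioned above.

```go
package main

import "fmt"

// Resource maps a resource type name to a quantity.
// Hypothetical type for this sketch, not the YuniKorn resources package.
type Resource map[string]int64

// pruneUnallocated returns a copy of remaining that keeps only the resource
// types with a non-zero allocation on the queue. Types that are guaranteed
// but not allocated are dropped from the result.
func pruneUnallocated(remaining, allocated Resource) Resource {
	pruned := make(Resource)
	for name, v := range remaining {
		if allocated[name] > 0 {
			pruned[name] = v
		}
	}
	return pruned
}

func main() {
	// Remaining guarantee of queue "a" in case #1: guaranteed minus allocated.
	remaining := Resource{"vcore": -5, "memory": -40, "ephemeral-storage": 400}
	allocated := Resource{"vcore": 10, "memory": 80}

	fmt.Println("pruned remaining guarantee:", pruneUnallocated(remaining, allocated))
}
```

With the unallocated ephemeral-storage dropped, only the negative vcore and 
memory values are left, so in this model the queue no longer looks like it has 
guarantee remaining.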

Copying in [~mani], as he did a lot of the preemption changes with me in 1.6.0.

> Resource-wise preemption
> ------------------------
>
>                 Key: YUNIKORN-3113
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3113
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Roman Shakhov
>            Priority: Major
>
> I have a use case to manage the ephemeral-storage resource type along with 
> vcore and memory. Some workloads use storage and some don't, so I want to 
> have ephemeral-storage set as max and guaranteed.
> With heterogeneous workloads, preemption is consistently not triggered 
> even when it could help balance out the queues.
> *Case #1* [unit 
> test|https://github.com/blide/yunikorn-core/commit/6c4f12250c26c908621a18e75a0ca96eb8d1778f#diff-7b65cc904d1c0a0395b409e51db43bfe65238432eb96b66831c950060feac911R1833-R1834]
> For example, with given config and queues state:
>  * workloads - guaranteed/max: 10 vcore, 80 memory, 800 ephemeral-storage
>  ** a - guaranteed: 5 vcore, 40 memory, 400 ephemeral-storage
>  *** allocated: 10 vcore, 80 memory
>  ** b - guaranteed: 5 vcore, 40 memory
> Submitting any jobs to {{workloads.b}} won't trigger preemption from 
> {{workloads.a}}, even though {{workloads.a}} is, by any common-sense measure, 
> heavily over-allocated.
> *Case #2*
> There is a symmetrical case where {{workloads.b}} has the {{storage}} 
> resource allocated:
>  * workloads - guaranteed/max: 10 vcore, 80 memory, 800 ephemeral-storage
>  ** a - guaranteed: 5 vcore, 40 memory
>  *** allocated: 9 vcore, 72 memory
>  ** b - guaranteed: 5 vcore, 40 memory, 400 ephemeral-storage
>  *** allocated: 1 vcore, 8 memory, 401 ephemeral-storage
> Submitting a pod (1 vcore, 8 memory) to {{workloads.b}} won't trigger 
> preemption from {{workloads.a}}, even though the new allocation doesn't use 
> the over-allocated resource.
> In cases 1 and 2, preemption could ignore the storage resource entirely and 
> take into account only the resources present in the ask. That wouldn't 
> move the distribution of guaranteed resources farther from the optimum.
> *Case #3*
> Using a pod (1 vcore, 8 memory, *99 ephemeral-storage*) in case #2: it 
> would make sense to preempt vcore and memory to schedule the pod if there is 
> enough headroom for 99 ephemeral-storage.
> *General approach*
> The preemptor should only consider those resources that are present either 
> in the allocation or in the preemption.
> Currently the under-allocated/over-allocated condition checks the queue as a 
> whole:
>  - [preemption.go#L201|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L201]
>  - [preemption.go#L255|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L255]
>  - [preemption.go#L325|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L325]
>  - [preemption.go#L482|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L482]
>  - [preemption.go#L513|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/preemption.go#L513]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

