(yunikorn-site) branch master updated: [YUNIKORN-2761] Explain preemption storm in usage doc (#457)

mani Tue, 23 Jul 2024 23:21:14 -0700

This is an automated email from the ASF dual-hosted git repository.

mani pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/yunikorn-site.git



The following commit(s) were added to refs/heads/master by this push:
     new ceb376d3b0 [YUNIKORN-2761] Explain preemption storm in usage doc (#457)
ceb376d3b0 is described below

commit ceb376d3b00e805fb937ea00dcf50a2ea090e755
Author: Manikandan R <[email protected]>
AuthorDate: Wed Jul 24 11:50:00 2024 +0530

    [YUNIKORN-2761] Explain preemption storm in usage doc (#457)
    
    Closes: #457
    
    Signed-off-by: Manikandan R <[email protected]>
---
 docs/assets/preemption_quota_redistribution.png | Bin 0 -> 204788 bytes
 docs/assets/preemption_storm.png                | Bin 0 -> 238038 bytes
 docs/user_guide/preemption.md                   |  53 +++++++++++++++++++++++-
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/docs/assets/preemption_quota_redistribution.png b/docs/assets/preemption_quota_redistribution.png
new file mode 100644
index 0000000000..28ac5bb92c
Binary files /dev/null and b/docs/assets/preemption_quota_redistribution.png differ
diff --git a/docs/assets/preemption_storm.png b/docs/assets/preemption_storm.png
new file mode 100644
index 0000000000..079977b1f5
Binary files /dev/null and b/docs/assets/preemption_storm.png differ
diff --git a/docs/user_guide/preemption.md b/docs/user_guide/preemption.md
index 019646fc66..dc204f7e57 100644
--- a/docs/user_guide/preemption.md
+++ b/docs/user_guide/preemption.md
@@ -228,4 +228,55 @@ In this example, two imbalances are observed:
 | `rt.ten-a.queue-2` | 0                          | 0                         |
 | `rt.ten-b`         | 15                         | 10                        |
 | `rt.ten-b.queue-3` | 15                         | 10                        |
-| `rt.sys`           | 0                          | 10                        |
\ No newline at end of file
+| `rt.sys`           | 0                          | 10                        |
+
+### Redistribution of Quota and Preemption Storm
+
+#### Redistribution of Quota
+
+Setting up guaranteed resources for a queue at a higher level in the 
queue hierarchy helps to redistribute the quota among different groups. 
This matters especially when workloads of the same priority run in different 
groups: unlike the default scheduler, YuniKorn preempts workloads of the same 
priority to free up resources for pending workloads that deserve the resources 
as per the queue's guaranteed quota. At times, one needs this kind of queue 
setup in a real production cluster for [...]
+
+For example, root.region[1-N].country[1-N].state[1-N]
+
+![preemption_quota_redistribution](../assets/preemption_quota_redistribution.png)
+
+This queue setup has N regions under “root”, and each region has N countries. 
If administrators want to redistribute workloads of the same priority among 
different regions, it is better to define a guaranteed quota for each region, 
so that preemption redistributes the running workloads according to the 
guaranteed quota each region is supposed to get. 
That way each region uses the resources it deserves to get at the maximum 
possible level from the ove [...]
+
+#### Preemption Storm
+
+With a setup like the one above, there is a side effect: an increased chance 
of a preemption storm or loop within a specific region between different 
state queues (siblings belonging to the same parent).
+
+ReplicaSets are a good example to look at for looping and circular preemption. 
Each time a pod from a ReplicaSet is removed, the ReplicaSet controller creates 
a new pod to make sure the set is complete. That auto-recreation could 
trigger loops as described below.
+
+![preemption_storm](../assets/preemption_storm.png)
+
+State of the queues:
+
+#### `Region1`
+
+* Guaranteed: vcores = 10
+* Usage: vcores = 8
+* Under guaranteed: usage < guaranteed, starving
+
+#### `State1`
+
+* Guaranteed: nil
+* A ReplicaSet is submitted to the queue, requesting 9 replicas, with each 
replica requiring `{vcores: 1}`.
+* 4 replicas are running. Usage: vcores = 4
+* 5 replicas are waiting for resources. Pending: vcores = 5
+* Inherits "under guaranteed" behaviour from `Region1`, eligible to trigger 
preemption
+
+#### `State2`
+
+* Guaranteed: nil
+* A ReplicaSet is submitted to the queue, requesting 9 replicas, with each 
replica requiring `{vcores: 1}`.
+* 4 replicas are running. Usage: vcores = 4
+* 5 replicas are waiting for resources. Pending: vcores = 5
+* Inherits "under guaranteed" behaviour from `Region1`, eligible to trigger 
preemption
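
The inheritance described in the bullets above can be illustrated with a small Python sketch. It is a simplified model and not YuniKorn's actual implementation: the `Queue` class and the `is_under_guaranteed` helper are hypothetical names, and the rule "the nearest ancestor with a guaranteed quota decides" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Queue:
    """Hypothetical toy queue; not a YuniKorn type."""
    name: str
    usage: int                          # vcores currently in use
    guaranteed: Optional[int] = None    # None models "Guaranteed: nil"
    parent: Optional["Queue"] = None


def is_under_guaranteed(queue: Optional[Queue]) -> bool:
    """Walk up the hierarchy; the nearest queue with a guarantee decides."""
    node = queue
    while node is not None:
        if node.guaranteed is not None:
            return node.usage < node.guaranteed
        node = node.parent
    return False  # no guarantee anywhere on the path to the root


# Numbers taken from the queue states listed above.
region1 = Queue("Region1", usage=8, guaranteed=10)
country1 = Queue("Country1", usage=8, parent=region1)
state1 = Queue("State1", usage=4, parent=country1)
state2 = Queue("State2", usage=4, parent=country1)

for q in (state1, state2):
    print(q.name, is_under_guaranteed(q))
```

In this model both `State1` and `State2` report as under guaranteed, because neither has its own guarantee and `Region1` uses 8 of its 10 guaranteed vcores.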
+
+Replica set `State1 Repl` runs in queue `State1`. Replica set `State2 Repl` 
runs in queue `State2`. Both queues belong to the same parent queue, 
`Country1` (they are siblings). The pods all run with the same settings for 
priority and preemption. There is no space left on the cluster. The usage of 
both the region queue `Region1` and the country queue `Country1` is 
`{vcores:8}`. Since `Region1` has a 
guaranteed quota of `{vcores:10}` and usage of `{vcores:8}` lower than its 
guaranteed quota leading to st [...]
+
+Let's say `State1` triggers preemption to meet the resource requirements of 
its pending pods.
+To make room for a `State1 Repl` pod, a pod from the `State2 Repl` set is 
preempted. The pending `State1 Repl` pod then moves from pending to running. 
In the next scheduling cycle, let's say `State2` triggers preemption 
to meet the resource requirements of its pending pods. In addition to the 
already pending pods, the pod preempted (killed) in the earlier scheduling 
cycle would have been recreated automatically by this time, as it belongs to a 
replica set. To make room for a `State2 Repl` pod [...]
+
+Defining guaranteed resources on lower-level queues or on the leaf queues 
themselves can prevent the preemption storm or loop from happening in the 
cluster. Administrators should be aware of the side effects of setting up 
guaranteed resources at any specific location in the queue hierarchy to get 
the best possible outcome from the preemption process.
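
The loop described in this section, and the effect of moving the guarantee down to the leaf queues, can be reproduced with a toy simulation. This is a sketch under stated assumptions, not YuniKorn's scheduling logic: the function `simulate`, the cluster capacity of 8 vcores (inferred from "no space left on the cluster"), the alternating order of the two queues, and the per-leaf guarantee of 4 vcores are all hypothetical choices made for illustration.

```python
def simulate(cycles, leaf_guaranteed=None):
    """Count preemptions between two sibling queues in a toy model.

    leaf_guaranteed=None: only Region1 has a guarantee (10 vcores). The full
    8-vcore cluster can never satisfy it, so both leaves always look starved.
    leaf_guaranteed=N: each leaf queue has its own guarantee of N vcores.
    """
    capacity = 8              # assumption: the cluster is exactly full
    region_guaranteed = 10    # Region1 guarantee from the example
    desired = 9               # replicas requested by each ReplicaSet
    running = {"State1": 4, "State2": 4}
    preemptions = 0

    for cycle in range(cycles):
        # Assumption: the two queues take turns asking for resources.
        if cycle % 2 == 0:
            asker, victim = "State1", "State2"
        else:
            asker, victim = "State2", "State1"

        # The ReplicaSet controller recreates preempted pods, so anything
        # short of the desired replica count is pending again.
        pending = desired - running[asker]

        if leaf_guaranteed is None:
            starving = sum(running.values()) < region_guaranteed
        else:
            starving = running[asker] < leaf_guaranteed

        cluster_full = sum(running.values()) >= capacity
        if pending > 0 and starving and cluster_full and running[victim] > 0:
            running[victim] -= 1   # one pod of the sibling is preempted
            running[asker] += 1    # one pending pod of the asker starts
            preemptions += 1

    return preemptions, running


print(simulate(10))                     # guarantee only on Region1
print(simulate(10, leaf_guaranteed=4))  # hypothetical guarantee on each leaf
```

With the guarantee only on `Region1`, every cycle produces a preemption and the queues end where they started, which is the storm. With a guarantee on each leaf that the leaf already meets, no queue counts as starving in this model and no preemption happens.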

