This is an automated email from the ASF dual-hosted git repository.
mani pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/yunikorn-site.git
The following commit(s) were added to refs/heads/master by this push:
new ceb376d3b0 [YUNIKORN-2761] Explain preemption storm in usage doc (#457)
ceb376d3b0 is described below
commit ceb376d3b00e805fb937ea00dcf50a2ea090e755
Author: Manikandan R <[email protected]>
AuthorDate: Wed Jul 24 11:50:00 2024 +0530
[YUNIKORN-2761] Explain preemption storm in usage doc (#457)
Closes: #457
Signed-off-by: Manikandan R <[email protected]>
---
docs/assets/preemption_quota_redistribution.png | Bin 0 -> 204788 bytes
docs/assets/preemption_storm.png | Bin 0 -> 238038 bytes
docs/user_guide/preemption.md | 53 +++++++++++++++++++++++-
3 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/docs/assets/preemption_quota_redistribution.png
b/docs/assets/preemption_quota_redistribution.png
new file mode 100644
index 0000000000..28ac5bb92c
Binary files /dev/null and b/docs/assets/preemption_quota_redistribution.png
differ
diff --git a/docs/assets/preemption_storm.png b/docs/assets/preemption_storm.png
new file mode 100644
index 0000000000..079977b1f5
Binary files /dev/null and b/docs/assets/preemption_storm.png differ
diff --git a/docs/user_guide/preemption.md b/docs/user_guide/preemption.md
index 019646fc66..dc204f7e57 100644
--- a/docs/user_guide/preemption.md
+++ b/docs/user_guide/preemption.md
@@ -228,4 +228,55 @@ In this example, two imbalances are observed:
| `rt.ten-a.queue-2` | 0 | 0 |
| `rt.ten-b` | 15 | 10 |
| `rt.ten-b.queue-3` | 15 | 10 |
-| `rt.sys` | 0 | 10 |
\ No newline at end of file
+| `rt.sys` | 0 | 10 |
+
+### Redistribution of Quota and Preemption Storm
+
+#### Redistribution of Quota
+
+Setting guaranteed resources on a queue higher up in the queue hierarchy helps
to redistribute quota among different groups. This matters especially when
workloads of the same priority run in different groups: unlike the default
scheduler, YuniKorn preempts workloads of the same priority to free up
resources for pending workloads that deserve them according to their queue's
guaranteed quota. At times, one needs this kind of queue setup in a real
production cluster for [...]
+
+For example, `root.region[1-N].country[1-N].state[1-N]`
+
+
+
+This queue setup has N regions under `root`, and each region has N countries.
If administrators want to redistribute workloads of the same priority among
different regions, it is better to define a guaranteed quota for each region.
Preemption then redistributes the running workloads based on the guaranteed
quota each region is supposed to get.
That way each region uses the resources it deserves to get at the maximum
possible level from the ove [...]
+
+#### Preemption Storm
+
+A setup like the one above has a side effect: it increases the chance of a
preemption storm or loop within a specific region, between different state
queues (siblings belonging to the same parent).
+
+ReplicaSets are a good example of looping and circular preemption.
Each time a pod from a ReplicaSet is removed, the ReplicaSet controller
creates a new pod to keep the set complete. That automatic recreation can
trigger loops as described below.
+
+
+
+State of the queues:
+
+#### `Region1`
+
+* Guaranteed: vcores = 10
+* Usage: vcores = 8
+* Under guaranteed: usage < guaranteed, starving
+
+#### `State1`
+
+* Guaranteed: nil
+* A ReplicaSet is submitted to the queue, requesting 9 replicas, with each
replica requiring `{vcores: 1}`.
+* 4 replicas are running. Usage: vcores = 4
+* 5 replicas are waiting for resources. Pending: vcores = 5
+* Inherits "under guaranteed" behaviour from `Region1`, eligible to trigger
preemption
+
+#### `State2`
+
+* Guaranteed: nil
+* A ReplicaSet is submitted to the queue, requesting 9 replicas, with each
replica requiring `{vcores: 1}`.
+* 4 replicas are running. Usage: vcores = 4
+* 5 replicas are waiting for resources. Pending: vcores = 5
+* Inherits "under guaranteed" behaviour from `Region1`, eligible to trigger
preemption
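
The queue state listed above can be modelled in a few lines of Python to show how a leaf queue with no guarantee of its own ends up "under guaranteed". This is a minimal sketch and not YuniKorn code: the `Queue` class and `is_under_guaranteed` helper are hypothetical, and resolving the state through the nearest ancestor that has a guarantee is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Queue:
    name: str
    usage: int                          # vcores currently in use
    guaranteed: Optional[int] = None    # None models "Guaranteed: nil"
    parent: Optional["Queue"] = None


def is_under_guaranteed(queue: Queue) -> bool:
    """Return True if the nearest queue with a guarantee (the queue itself
    or an ancestor) uses less than that guarantee. Assumed rule, for
    illustration only."""
    node = queue
    while node is not None:
        if node.guaranteed is not None:
            return node.usage < node.guaranteed
        node = node.parent
    return False  # no guarantee anywhere on the path to the root


# Values taken from the queue state described above.
region1 = Queue("Region1", usage=8, guaranteed=10)
country1 = Queue("Country1", usage=8, parent=region1)
state1 = Queue("State1", usage=4, parent=country1)
state2 = Queue("State2", usage=4, parent=country1)

for q in (region1, state1, state2):
    print(q.name, "under guaranteed:", is_under_guaranteed(q))
```

Both `State1` and `State2` resolve to the same `Region1` guarantee in this model, which is why either sibling can trigger preemption.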
+
+Replica set `State1 Repl` runs in queue `State1`. Replica set `State2 Repl`
runs in queue `State2`. Both queues belong to the same parent queue,
`Country1` (they are siblings). The pods all run with the same settings for
priority and preemption. There is no space left on the cluster. The usage of
both the region queue `Region1` and the country queue `Country1` is
`{vcores:8}`. Since `Region1` has a
guaranteed quota of `{vcores:10}` and usage of `{vcores:8}` lower than its
guaranteed quota leading to st [...]
+
+Let's say `State1` triggers preemption to meet the resource requirements of
its pending pods.
+To make room for a `State1 Repl` pod, a pod from the `State2 Repl` set is
preempted. The pending `State1 Repl` pod then moves from pending to running.
In the next scheduling cycle, let's say `State2` triggers preemption to meet
the resource requirements of its pending pods. In addition to the already
pending pods, the pod preempted (killed) in the earlier scheduling cycle will
have been recreated automatically by this time, as it belongs to a replica set.
To make room for a `State2 Repl` pod [...]
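
The back-and-forth described above can be reproduced with a toy simulation. This is a hedged sketch, not YuniKorn's scheduler: `simulate_storm` is a hypothetical function, the alternation of the asking queue per cycle is an assumption made for illustration, and the ReplicaSet controller is modelled only by the fact that each set always wants 9 replicas, so a killed pod immediately counts as pending again.

```python
def simulate_storm(cycles, capacity=8, desired=9):
    """Alternate preemption between two sibling queues without leaf guarantees.

    Both queues count as starving because they inherit that state from the
    region queue, and neither has a guarantee protecting its own pods.
    """
    running = {"State1": 4, "State2": 4}
    preemptions = []
    for cycle in range(cycles):
        # Assumption: the siblings take turns triggering preemption.
        asker, victim = ("State1", "State2") if cycle % 2 == 0 else ("State2", "State1")
        pending = desired - running[asker]          # recreated pods are pending again
        cluster_full = sum(running.values()) >= capacity
        if pending > 0 and cluster_full and running[victim] > 0:
            running[victim] -= 1                    # victim pod is preempted (killed)
            running[asker] += 1                     # asker's pending pod starts running
            preemptions.append((cycle, asker, victim))
    return running, preemptions


running, preemptions = simulate_storm(cycles=6)
print(running)           # final pod counts per queue
print(len(preemptions))  # number of pods killed
```

In this model every cycle kills one pod, yet after an even number of cycles both queues are back at 4 running replicas: work is destroyed without any net progress, which is the storm.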
+
+Defining guaranteed resources on lower-level queues or on the leaf queues can
prevent the preemption storm or loop from happening in the cluster.
Administrators should be aware of the side effects of setting guaranteed
resources at a specific location in the queue hierarchy to get the best
possible outcome from the preemption process.
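
To illustrate why guarantees on the leaf queues stop the loop, the toy model below gives each state queue its own guarantee. It is a sketch under two assumed rules, not YuniKorn's implementation: a queue may trigger preemption only while its usage is below its own guarantee, and a victim queue may lose a pod only while its usage is above its own guarantee. `run_cycles` and the guarantee values are hypothetical.

```python
def run_cycles(guaranteed, cycles=6):
    """Run alternating scheduling cycles with per-leaf guarantees (in vcores)."""
    running = {"State1": 4, "State2": 4}
    preemptions = 0
    for cycle in range(cycles):
        asker, victim = ("State1", "State2") if cycle % 2 == 0 else ("State2", "State1")
        asker_starving = running[asker] < guaranteed[asker]
        victim_has_surplus = running[victim] > guaranteed[victim]
        if asker_starving and victim_has_surplus:
            running[victim] -= 1
            running[asker] += 1
            preemptions += 1
    return running, preemptions


# Equal guarantees matching current usage: nothing is preempted.
print(run_cycles({"State1": 4, "State2": 4}))

# Unequal guarantees: one preemption rebalances the queues, then it stops.
print(run_cycles({"State1": 5, "State2": 3}))
```

With leaf guarantees in place the model settles after at most one preemption, because each queue's guaranteed share is protected from its sibling.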