Re: [PATCH v9 2/3] cpu-throttle: implement vCPU throttle

Hyman Wed, 08 Dec 2021 07:52:31 -0800



On 2021/12/8 23:36, Hyman wrote:

On 2021/12/6 18:10, Peter Xu wrote:
On Fri, Dec 03, 2021 at 09:39:46AM +0800, huang...@chinatelecom.cn wrote:
+static uint64_t dirtylimit_pct(unsigned int last_pct,
+                               uint64_t quota,
+                               uint64_t current)
+{
+    uint64_t limit_pct = 0;
+    RestrainPolicy policy;
+    bool mitigate = (quota > current) ? true : false;
+
+    if (mitigate && ((current == 0) ||
+        (last_pct <= DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE))) {
+        return 0;
+    }
+
+    policy = dirtylimit_policy(last_pct, quota, current);
+    switch (policy) {
+    case RESTRAIN_SLIGHT:
+        /* [90, 99] */
+        if (mitigate) {
+            limit_pct =
+                last_pct - DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
+        } else {
+            limit_pct =
+                last_pct + DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
+
+            limit_pct = MIN(limit_pct, CPU_THROTTLE_PCT_MAX);
+        }
+       break;
+    case RESTRAIN_HEAVY:
+        /* [75, 90) */
+        if (mitigate) {
+            limit_pct =
+                last_pct - DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
+        } else {
+            limit_pct =
+                last_pct + DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
+
+            limit_pct = MIN(limit_pct,
+                DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK);
+        }
+       break;
+    case RESTRAIN_RATIO:
+        /* [0, 75) */
+        if (mitigate) {
+            if (last_pct <= (((quota - current) * 100 / quota))) {
+                limit_pct = 0;
+            } else {
+                limit_pct = last_pct -
+                    ((quota - current) * 100 / quota);
+                limit_pct = MAX(limit_pct, CPU_THROTTLE_PCT_MIN);
+            }
+        } else {
+            limit_pct = last_pct +
+                ((current - quota) * 100 / current);
+
+            limit_pct = MIN(limit_pct,
+                DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK);
+        }
+       break;
+    case RESTRAIN_KEEP:
+    default:
+       limit_pct = last_pct;
+       break;
+    }
+
+    return limit_pct;
+}
+
+static void *dirtylimit_thread(void *opaque)
+{
+    int cpu_index = *(int *)opaque;
+    uint64_t quota_dirtyrate, current_dirtyrate;
+    unsigned int last_pct = 0;
+    unsigned int pct = 0;
+
+    rcu_register_thread();
+
+    quota_dirtyrate = dirtylimit_quota(cpu_index);
+    current_dirtyrate = dirtylimit_current(cpu_index);
+
+    pct = dirtylimit_init_pct(quota_dirtyrate, current_dirtyrate);
+
+    do {
+        trace_dirtylimit_impose(cpu_index,
+            quota_dirtyrate, current_dirtyrate, pct);
+
+        last_pct = pct;
+        if (pct == 0) {
+            sleep(DIRTYLIMIT_CALC_PERIOD_TIME_S);
+        } else {
+            dirtylimit_check(cpu_index, pct);
+        }
+
+        quota_dirtyrate = dirtylimit_quota(cpu_index);
+        current_dirtyrate = dirtylimit_current(cpu_index);
+
+        pct = dirtylimit_pct(last_pct, quota_dirtyrate, current_dirtyrate);

So what I had in mind is that we can start with an extremely simple version of
a negative feedback system. Say, first, each vcpu has a simple number telling
it how long to sleep for some interval (this is ugly code, but it just shows
what I meant):
===============
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index eecd8031cf..c320fd190f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2932,6 +2932,8 @@ int kvm_cpu_exec(CPUState *cpu)
              trace_kvm_dirty_ring_full(cpu->cpu_index);
              qemu_mutex_lock_iothread();
              kvm_dirty_ring_reap(kvm_state);
+            if (dirtylimit_enabled(cpu->cpu_index) && cpu->throttle_us_per_full)
+                usleep(cpu->throttle_us_per_full);
              qemu_mutex_unlock_iothread();
              ret = 0;
              break;
===============
I think this will have finer granularity when throttling (for a 4096 ring
size, that's one operation per 16MB) than the current way, where we inject a
per-vcpu async task to sleep, like auto-converge.
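
The 16MB figure can be checked with a small sketch. It assumes 4KiB guest
pages and one dirty page per ring entry. The page size is an assumption (it is
not stated above), and dirty_bytes_per_ring_full() is a hypothetical helper
name, not something in the patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on the bytes a vcpu can dirty between
 * two ring-full exits, assuming each ring entry records one dirty page.
 */
static uint64_t dirty_bytes_per_ring_full(uint64_t ring_entries,
                                          uint64_t page_size)
{
    return ring_entries * page_size;
}
```

With ring_entries = 4096 and page_size = 4096 this gives 16777216 bytes, i.e.
16MiB, so a sleep on every ring-full exit is applied roughly once per 16MiB
of dirtied memory.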

Then we have the "black box" to tune this value with below input/output:

   - Input: dirty rate information, same as current algo
   - Output: increase/decrease of the per-vcpu throttle_us_per_full above, and
     that's all

We can do the sampling per second, then keep doing it: we can have 1 thread
doing a per-second task that collects dirty rate information for all the
vcpus, then tunes throttle_us_per_full for each of them.

The simplest linear algorithm would be as simple as (for each vcpu):

   if (quota < current) {
     throttle_us_per_full += SOMETHING;
     if (throttle_us_per_full > MAX)
       throttle_us_per_full = MAX;
   } else {
     throttle_us_per_full -= SOMETHING;
     if (throttle_us_per_full < 0)
       throttle_us_per_full = 0;
   }
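
As a compilable sketch of the same linear step: SOMETHING and MAX are left
unspecified above, so THROTTLE_STEP_US and THROTTLE_MAX_US below are
placeholder values, and throttle_linear_adjust() is a hypothetical name. A
signed type is used so the "< 0" clamp is meaningful:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values; the thread does not say what SOMETHING or MAX are. */
#define THROTTLE_STEP_US 100
#define THROTTLE_MAX_US  1000000

/*
 * One per-second tuning step for a single vcpu.
 * Returns the new sleep time (in microseconds) per dirty-ring-full exit.
 */
static int64_t throttle_linear_adjust(int64_t throttle_us,
                                      uint64_t quota, uint64_t current)
{
    assert(throttle_us >= 0);

    if (quota < current) {
        /* Dirtying faster than allowed: sleep longer, capped at the max. */
        throttle_us += THROTTLE_STEP_US;
        if (throttle_us > THROTTLE_MAX_US) {
            throttle_us = THROTTLE_MAX_US;
        }
    } else {
        /* At or below quota: back off, never going below zero. */
        throttle_us -= THROTTLE_STEP_US;
        if (throttle_us < 0) {
            throttle_us = 0;
        }
    }
    return throttle_us;
}
```

The sampling thread would call this once per second per vcpu and store the
result back into that vcpu's throttle_us_per_full.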

I think your algorithm is fine, but thoroughly reviewing every single bit of
it in one shot will be challenging, and it's also hard to prove that every bit
of the algorithm is helpful, as there are a lot of hand-made macros and state
changes.

I actually tested your current algorithm: the dirty rate fluctuates a bit
(when I specified 200MB/s, it can go to either a few tens of MB/s or 300MB/s,
normally less), and it doesn't respond fast either (the initial throttle from
500MB/s -> 200MB/s needs 1 minute or so), so it doesn't seem ideal anyway. In
that case I prefer we start simple.

So IMHO we can start with this simple scheme first; it'll start working with
far fewer lines of code, afaict. With that scheme ready in the 1st or initial
patches, it'll be easier to apply any better algorithm on top (e.g. your
current one, if you're confident in it) or other things, and it'll be much
easier to review too if you could consider splitting your patch like that.

Normally, as far as I know what migration needs, we could consider adding an
integral algorithm to the linear algorithm I described above, and that should
already help us reach a very stable and constant state of throttling. But
we'll need to try it out, as I never tried.
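
One possible reading of "adding an integral algorithm" is a
proportional-integral style update, sketched below. This is only an
illustration: the ThrottlePI type, the gains, and the limits are all
hypothetical and untested against a real guest, and the thread does not say
which form was intended.

```c
#include <stdint.h>

/* Hypothetical tuning values, chosen only to make the sketch concrete. */
#define PI_KP_US        10       /* us of sleep per MB/s of current error */
#define PI_KI_US        2        /* us of sleep per MB/s of summed error  */
#define PI_INTEGRAL_MAX 100000   /* bound on the summed error             */
#define PI_OUT_MAX_US   1000000  /* bound on the resulting sleep time     */

typedef struct {
    int64_t integral;            /* running sum of (current - quota) */
} ThrottlePI;

static int64_t clamp_i64(int64_t v, int64_t lo, int64_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * One sampling step for a single vcpu; quota and current are in MB/s.
 * Returns the sleep time (in microseconds) per dirty-ring-full exit.
 */
static int64_t throttle_pi_update(ThrottlePI *s, int64_t quota,
                                  int64_t current)
{
    int64_t error = current - quota;

    /* Accumulate the error, bounded so it cannot grow without limit. */
    s->integral = clamp_i64(s->integral + error,
                            -PI_INTEGRAL_MAX, PI_INTEGRAL_MAX);

    return clamp_i64(PI_KP_US * error + PI_KI_US * s->integral,
                     0, PI_OUT_MAX_US);
}
```

The accumulated term keeps some throttling applied once the measured rate
reaches the quota, which is the property that could damp the fluctuation seen
with a pure step-up/step-down rule.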

What do you think?
I absolutely agree with your point. A negative feedback system is also what I
thought of in the first place, and from my point of view it may theoretically
be the most appropriate algo for keeping a vcpu at a stable dirty page rate.
But at the very beginning I wasn't sure a new throttling algo would be
accepted, so I adopted the existing auto-converge algo in qemu... :). One of
my purposes in posting this patchset is for the sake of RFC, and thanks very
much, Peter, for the advice.

I'll try it out and see the results. If things go well, the negative feedback
system to control the dirty page rate for a vcpu will be introduced in the
next version.

uh... "method" may be a better word than "algo" for what I meant in my reply
above; the real "algo" is what is implemented in the "black box".
