Re: [PATCH RESEND v3 00/10] migration: introduce dirtylimit capability

Hyman Thu, 08 Dec 2022 20:38:58 -0800

Ping?

On 2022/12/4 1:09, huang...@chinatelecom.cn wrote:

From: Hyman Huang(黄勇) <huang...@chinatelecom.cn>

v3(resend):
- Fix the syntax error in the subject line.

v3:
This version makes the following modifications, inspired by Peter and Markus:
1. Do the code cleanup in [PATCH v2 02/11] suggested by Markus
2. Replace [PATCH v2 03/11] with a much simpler patch posted by
    Peter to fix the following bug:
    https://bugzilla.redhat.com/show_bug.cgi?id=2124756
3. Fix the error path of migrate_params_check in [PATCH v2 04/11]
    pointed out by Markus. Enrich the commit message to explain why
    x-vcpu-dirty-limit-period is an unstable parameter.
4. Refactor the dirty-limit convergence algorithm in [PATCH v2 07/11]
    as suggested by Peter:
    a. apply the blk_mig_bulk_active check before enabling dirty-limit
    b. drop the unhelpful check function before enabling dirty-limit
    c. change the migration_cancel logic to cancel dirty-limit only
       if the dirty-limit capability is turned on
    d. split out a code cleanup commit, [PATCH v3 07/10], that adjusts
       the check order before enabling auto-converge
5. Rename the indexes observed during dirty-limit live migration to
    make them easier to understand. Use the maximum throttle time
    among vCPUs as "dirty-limit-throttle-time-per-full"
6. Fix some grammatical and spelling errors pointed out by Markus
    and enrich the documentation of the dirty-limit live migration
    observation indexes "dirty-limit-ring-full-time"
    and "dirty-limit-throttle-time-per-full"
7. Change the default value of x-vcpu-dirty-limit-period to 1000 ms,
    which was the optimal value in the test environment described in
    the cover letter.
8. Drop the two guestperf test commits, [PATCH v2 10/11] and
    [PATCH v2 11/11], and post them as a standalone series in the
    future.
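As a rough illustration of the ordering in item 4, the sketch below decides after each dirty bitmap sync whether to throttle the guest and with which mechanism. It is not the code from this series: the struct, the enum, the function name and the trigger (the guest dirtying more than half as many bytes as were sent, for the second time since the last throttle, loosely modeled on the trigger QEMU already uses for auto-converge) are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-sync view of the migration state. */
struct mig_period {
    bool cap_auto_converge;  /* auto-converge capability enabled */
    bool cap_dirty_limit;    /* dirty-limit capability enabled */
    bool blk_bulk_active;    /* block migration still in its bulk stage */
    uint64_t bytes_dirtied;  /* bytes dirtied since the last sync */
    uint64_t bytes_sent;     /* bytes transferred since the last sync */
};

enum throttle_action {
    THROTTLE_NONE,
    THROTTLE_AUTO_CONVERGE,
    THROTTLE_DIRTY_LIMIT,
};

/*
 * Decide whether (and how) to throttle after a dirty bitmap sync.
 * *high_cnt counts syncs that exceeded the (assumed) threshold since
 * the last throttle decision.
 */
static enum throttle_action choose_throttle(const struct mig_period *p,
                                            unsigned *high_cnt)
{
    /* Check block migration first: no throttling during its bulk stage. */
    if (p->blk_bulk_active) {
        return THROTTLE_NONE;
    }
    if (!p->cap_auto_converge && !p->cap_dirty_limit) {
        return THROTTLE_NONE;
    }
    /* Assumed trigger: dirtied more than half as many bytes as were sent. */
    if (p->bytes_dirtied <= p->bytes_sent / 2 || ++*high_cnt < 2) {
        return THROTTLE_NONE;
    }
    *high_cnt = 0;
    /* The capabilities are mutually exclusive, so at most one is on. */
    return p->cap_dirty_limit ? THROTTLE_DIRTY_LIMIT : THROTTLE_AUTO_CONVERGE;
}
```

With dirty-limit enabled and 600 bytes dirtied per 1000 sent, the first over-threshold sync returns THROTTLE_NONE and the second returns THROTTLE_DIRTY_LIMIT.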

Many thanks to Peter and Markus for their enthusiastic, prompt
and careful comments and suggestions.

Please review.

Yong

v2:
This version makes a few modifications compared with version 1:
1. Fix the overflow issue reported by Peter Maydell
2. Add a parameter check for the HMP "set_vcpu_dirty_limit" command
3. Fix the race between the dirty ring reaper thread and the
    QEMU main thread
4. Add migration parameter checks for x-vcpu-dirty-limit-period
    and vcpu-dirty-limit
5. When implementing the dirty-limit convergence algorithm, add
    logic to forbid the HMP/QMP commands set_vcpu_dirty_limit and
    cancel_vcpu_dirty_limit during dirty-limit live migration
6. Add a capability check to ensure auto-converge and dirty-limit
    are mutually exclusive
7. Check that the KVM dirty ring size is configured before setting
    the dirty-limit migration parameter
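Items 4, 6 and 7 amount to validation rules. A minimal sketch of how they might combine into one check follows; the struct and function names are hypothetical, the series may perform these checks in different places (capability setting vs. parameter setting), and because the exact accepted ranges are not stated in this letter, the sketch only rejects zero values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of migration capabilities and parameters. */
struct mig_config {
    bool auto_converge;              /* "auto-converge" capability */
    bool dirty_limit;                /* "dirty-limit" capability */
    uint64_t dirty_limit_period_ms;  /* x-vcpu-dirty-limit-period */
    uint64_t vcpu_dirty_limit_mbps;  /* vcpu-dirty-limit, in MB/s */
    uint32_t kvm_dirty_ring_size;    /* 0: KVM dirty ring not enabled */
};

/* Returns NULL if the configuration is acceptable, else an error message. */
static const char *check_mig_config(const struct mig_config *c)
{
    /* Item 6: the two convergence mechanisms must not be combined. */
    if (c->auto_converge && c->dirty_limit) {
        return "auto-converge and dirty-limit are mutually exclusive";
    }
    /* Item 7: dirty-limit relies on the KVM dirty ring. */
    if (c->dirty_limit && c->kvm_dirty_ring_size == 0) {
        return "dirty-limit requires the KVM dirty ring to be enabled";
    }
    /* Item 4: real bounds are not given here; only zero is rejected. */
    if (c->dirty_limit_period_ms == 0) {
        return "x-vcpu-dirty-limit-period must be positive";
    }
    if (c->vcpu_dirty_limit_mbps == 0) {
        return "vcpu-dirty-limit must be positive";
    }
    return NULL;
}
```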

More comprehensive tests were done than in version 1.

The test environment is as follows:
-------------------------------------------------------------
a. Host hardware info:

CPU:
Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz

CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
NUMA node(s):                    2

NUMA node0 CPU(s):               0-15,32-47
NUMA node1 CPU(s):               16-31,48-63

Memory:
Hynix  503Gi

Interface:
Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)
Speed: 1000Mb/s

b. Host software info:

OS: ctyunos release 2
Kernel: 4.19.90-2102.2.0.0066.ctl2.x86_64
Libvirt baseline version:  libvirt-6.9.0
QEMU baseline version: qemu-5.0

c. VM scale:
CPU: 4
Memory: 4G
-------------------------------------------------------------

All the supplementary test data below are based on the
above test environment.

In version 1, we posted the following UnixBench test data:

$ taskset -c 8-15 ./Run -i 2 -c 8 {unixbench test item}

host cpu: Intel(R) Xeon(R) Platinum 8378A
host interface speed: 1000Mb/s
   |---------------------+--------+------------+---------------|
   | UnixBench test item | Normal | Dirtylimit | Auto-converge |
   |---------------------+--------+------------+---------------|
   | dhry2reg            | 32800  | 32786      | 25292         |
   | whetstone-double    | 10326  | 10315      | 9847          |
   | pipe                | 15442  | 15271      | 14506         |
   | context1            | 7260   | 6235       | 4514          |
   | spawn               | 3663   | 3317       | 3249          |
   | syscall             | 4669   | 4667       | 3841          |
   |---------------------+--------+------------+---------------|

In version 2, we post supplementary test data that does not use
taskset, which makes the scenario more general:

$ ./Run

per-vcpu data:
   |---------------------+--------+------------+---------------|
   | UnixBench test item | Normal | Dirtylimit | Auto-converge |
   |---------------------+--------+------------+---------------|
   | dhry2reg            | 2991   | 2902       | 1722          |
   | whetstone-double    | 1018   | 1006       | 627           |
   | Execl Throughput    | 955    | 320        | 660           |
   | File Copy - 1       | 2362   | 805        | 1325          |
   | File Copy - 2       | 1500   | 1406       | 643           |
   | File Copy - 3       | 4778   | 2160       | 1047          |
   | Pipe Throughput     | 1181   | 1170       | 842           |
   | Context Switching   | 192    | 224        | 198           |
   | Process Creation    | 490    | 145        | 95            |
   | Shell Scripts - 1   | 1284   | 565        | 610           |
   | Shell Scripts - 2   | 2368   | 900        | 1040          |
   | System Call Overhead| 983    | 948        | 698           |
   | Index Score         | 1263   | 815        | 600           |
   |---------------------+--------+------------+---------------|
Note:
   File Copy - 1: File Copy 1024 bufsize 2000 maxblocks
   File Copy - 2: File Copy 256 bufsize 500 maxblocks
   File Copy - 3: File Copy 4096 bufsize 8000 maxblocks
   Shell Scripts - 1: Shell Scripts (1 concurrent)
   Shell Scripts - 2: Shell Scripts (8 concurrent)

Based on the above data, dirty-limit outperforms auto-converge in
8 of the 12 test items, and its "System Benchmarks Index Score" is
over 35% higher than that of auto-converge during live migration
(815 vs. 600).

4-vCPU parallel data (we ran a test VM with 4 vCPUs and 4G of memory):
   |---------------------+--------+------------+---------------|
   | UnixBench test item | Normal | Dirtylimit | Auto-converge |
   |---------------------+--------+------------+---------------|
   | dhry2reg            | 7975   | 7146       | 5071          |
   | whetstone-double    | 3982   | 3561       | 2124          |
   | Execl Throughput    | 1882   | 1205       | 768           |
   | File Copy - 1       | 1061   | 865        | 498           |
   | File Copy - 2       | 676    | 491        | 519           |
   | File Copy - 3       | 2260   | 923        | 1329          |
   | Pipe Throughput     | 3026   | 3009       | 1616          |
   | Context Switching   | 1219   | 1093       | 695           |
   | Process Creation    | 947    | 307        | 446           |
   | Shell Scripts - 1   | 2469   | 977        | 989           |
   | Shell Scripts - 2   | 2667   | 1275       | 984           |
   | System Call Overhead| 1592   | 1459       | 692           |
   | Index Score         | 1976   | 1294       | 997           |
   |---------------------+--------+------------+---------------|

For the parallel data, dirty-limit again outperforms auto-converge in
8 of the 12 test items, and its "System Benchmarks Index Score" is
almost 30% higher (1294 vs. 997).
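The percentages above compare the Dirtylimit and Auto-converge columns of the "Index Score" row in each table. A tiny helper (the name is hypothetical) reproduces the arithmetic:

```c
/* Relative improvement of score a over baseline score b, in percent. */
static double improvement_pct(double a, double b)
{
    return (a - b) / b * 100.0;
}
```

improvement_pct(815, 600) gives about 35.8 (per-vCPU data) and improvement_pct(1294, 997) about 29.8 (4-vCPU parallel data).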

In version 1, we posted the following total migration times:

host cpu: Intel(R) Xeon(R) Platinum 8378A
host interface speed: 1000Mb/s
   |-----------------------+----------------+-------------------|
   | dirty memory size(MB) | Dirtylimit(ms) | Auto-converge(ms) |
   |-----------------------+----------------+-------------------|
   | 60                    | 2014           | 2131              |
   | 70                    | 5381           | 12590             |
   | 90                    | 6037           | 33545             |
   | 110                   | 7660           | [*]               |
   |-----------------------+----------------+-------------------|
   [*]: This case means migration is not convergent.

In version 2, we post more comprehensive total migration time data.

The workload updates N MB of memory on 4 CPUs and sleeps S us after
every 1 MB updated. Each condition was tested twice; the results are
as follows:

   |-----------+--------+--------+----------------+-------------------|
   | ring size | N (MB) | S (us) | Dirtylimit(ms) | Auto-converge(ms) |
   |-----------+--------+--------+----------------+-------------------|
   | 1024      | 1024   | 1000   | 44951          | 191780            |
   | 1024      | 1024   | 1000   | 44546          | 185341            |
   | 1024      | 1024   | 500    | 46505          | 203545            |
   | 1024      | 1024   | 500    | 45469          | 909945            |
   | 1024      | 1024   | 0      | 61858          | [*]               |
   | 1024      | 1024   | 0      | 57922          | [*]               |
   | 1024      | 2048   | 0      | 91982          | [*]               |
   | 1024      | 2048   | 0      | 90388          | [*]               |
   | 2048      | 128    | 10000  | 14511          | 25971             |
   | 2048      | 128    | 10000  | 13472          | 26294             |
   | 2048      | 1024   | 10000  | 44244          | 26294             |
   | 2048      | 1024   | 10000  | 45099          | 157701            |
   | 2048      | 1024   | 500    | 51105          | [*]               |
   | 2048      | 1024   | 500    | 49648          | [*]               |
   | 2048      | 1024   | 0      | 229031         | [*]               |
   | 2048      | 1024   | 0      | 154282         | [*]               |
   |-----------+--------+--------+----------------+-------------------|
   [*]: This case means migration is not convergent.
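The dirtying workload described above the table can be approximated by the single-threaded sketch below; the actual test program is not part of this letter. The function name, the 4 KiB page size and the one-store-per-page approach are assumptions, and the tests above ran the workload on 4 CPUs, e.g. one such loop per CPU, repeated for the whole migration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MIB (1024u * 1024u)
#define PAGE_BYTES 4096u  /* assumed guest page size */

/*
 * Dirty n_mib MiB of buf by storing one byte per page, sleeping
 * sleep_us microseconds after each MiB. Returns the pages written.
 */
static size_t dirty_memory(uint8_t *buf, size_t n_mib, long sleep_us)
{
    size_t pages = 0;
    struct timespec ts = { .tv_sec = sleep_us / 1000000,
                           .tv_nsec = (sleep_us % 1000000) * 1000 };

    for (size_t mib = 0; mib < n_mib; mib++) {
        uint8_t *chunk = buf + mib * MIB;
        for (size_t off = 0; off < MIB; off += PAGE_BYTES) {
            chunk[off] = (uint8_t)mib;  /* any store marks the page dirty */
            pages++;
        }
        if (sleep_us > 0) {
            nanosleep(&ts, NULL);  /* early wakeup by a signal is ignored */
        }
    }
    return pages;
}
```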

Note that the larger the ring size, the less sensitively dirty-limit
responds: a larger ring takes longer to fill, so a vCPU runs longer
between throttling points. We should therefore choose an optimal ring
size based on test data from VMs of different scales.
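A back-of-the-envelope calculation makes this concrete. The sketch assumes the ring size in the table counts entries of one 4 KiB page each (page_bits = 12), and the helper names are hypothetical. It converts the ring to MiB with a shift rather than multiplying by the page size, one way to avoid the kind of 32-bit overflow addressed by "dirtylimit: Fix overflow when computing MB" (not necessarily how that patch does it).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a dirty ring size (entries, one page each) to MiB. */
static uint64_t ring_size_mib(uint32_t ring_entries, unsigned page_bits)
{
    assert(page_bits < 20);  /* pages assumed smaller than 1 MiB */
    /* Shift instead of ring_entries * page_size to avoid overflow. */
    return ring_entries >> (20 - page_bits);
}

/* Approximate time (us) for one vCPU to fill its ring at dirty_mibps MiB/s. */
static uint64_t ring_full_time_us(uint32_t ring_entries, unsigned page_bits,
                                  uint64_t dirty_mibps)
{
    if (dirty_mibps == 0) {
        return UINT64_MAX;  /* a vCPU that dirties nothing never fills it */
    }
    return ring_size_mib(ring_entries, page_bits) * 1000000u / dirty_mibps;
}
```

At 1024 MiB/s, a 1024-entry ring (4 MiB) fills in about 3906 us and a 2048-entry ring (8 MiB) in about 7812 us, so doubling the ring doubles the memory a vCPU can dirty between ring-full exits.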

We also tested the effect of the "x-vcpu-dirty-limit-period" parameter
on total migration time. Each condition was tested twice; the results
are as follows:

   |-----------+--------+--------+-------------+----------------------|
   | ring size | N (MB) | S (us) | Period (ms) | Migration time (ms)  |
   |-----------+--------+--------+-------------+----------------------|
   | 2048      | 1024   | 10000  | 100         | [*]                  |
   | 2048      | 1024   | 10000  | 100         | [*]                  |
   | 2048      | 1024   | 10000  | 300         | 156795               |
   | 2048      | 1024   | 10000  | 300         | 118179               |
   | 2048      | 1024   | 10000  | 500         | 44244                |
   | 2048      | 1024   | 10000  | 500         | 45099                |
   | 2048      | 1024   | 10000  | 700         | 41871                |
   | 2048      | 1024   | 10000  | 700         | 42582                |
   | 2048      | 1024   | 10000  | 1000        | 41430                |
   | 2048      | 1024   | 10000  | 1000        | 40383                |
   | 2048      | 1024   | 10000  | 1500        | 42030                |
   | 2048      | 1024   | 10000  | 1500        | 42598                |
   | 2048      | 1024   | 10000  | 2000        | 41694                |
   | 2048      | 1024   | 10000  | 2000        | 42403                |
   | 2048      | 1024   | 10000  | 3000        | 43538                |
   | 2048      | 1024   | 10000  | 3000        | 43010                |
   |-----------+--------+--------+-------------+----------------------|
   [*]: This case means migration is not convergent.

This shows that x-vcpu-dirty-limit-period should be set to 1000 ms
under the above conditions.
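The 1000 ms choice can be checked by averaging the two runs for each period. The sketch copies the converged rows from the table (the non-convergent 100 ms rows are left out) and picks the period with the lowest combined time; the struct and function names are hypothetical, and two runs per point is a crude criterion, since the means from 700 ms to 3000 ms lie within about 6% of each other.

```c
#include <stddef.h>

/* Two runs per period, copied from the table above (ms). */
struct period_runs {
    unsigned period_ms;
    unsigned run_ms[2];
};

static const struct period_runs runs[] = {
    {  300, { 156795, 118179 } },
    {  500, {  44244,  45099 } },
    {  700, {  41871,  42582 } },
    { 1000, {  41430,  40383 } },
    { 1500, {  42030,  42598 } },
    { 2000, {  41694,  42403 } },
    { 3000, {  43538,  43010 } },
};

/* Return the period whose runs have the lowest combined (i.e. mean) time. */
static unsigned best_period_ms(const struct period_runs *r, size_t n)
{
    size_t best = 0;
    unsigned long best_sum = (unsigned long)r[0].run_ms[0] + r[0].run_ms[1];

    for (size_t i = 1; i < n; i++) {
        unsigned long sum = (unsigned long)r[i].run_ms[0] + r[i].run_ms[1];
        if (sum < best_sum) {
            best = i;
            best_sum = sum;
        }
    }
    return r[best].period_ms;
}
```

best_period_ms(runs, sizeof runs / sizeof runs[0]) returns 1000, whose mean is 40906.5 ms.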

Please review; any comments and suggestions are much appreciated. Thanks!

Yong

Hyman Huang (9):
   dirtylimit: Fix overflow when computing MB
   softmmu/dirtylimit: Add parameter check for hmp "set_vcpu_dirty_limit"
   qapi/migration: Introduce x-vcpu-dirty-limit-period parameter
   qapi/migration: Introduce vcpu-dirty-limit parameters
   migration: Introduce dirty-limit capability
   migration: Refactor auto-converge capability logic
   migration: Implement dirty-limit convergence algo
   migration: Export dirty-limit time info for observation
   tests: Add migration dirty-limit capability test

Peter Xu (1):
   kvm: dirty-ring: Fix race with vcpu creation

  accel/kvm/kvm-all.c          |   9 +++
  include/sysemu/dirtylimit.h  |   2 +
  migration/migration.c        |  87 ++++++++++++++++++++++++
  migration/migration.h        |   1 +
  migration/ram.c              |  63 ++++++++++++++----
  migration/trace-events       |   1 +
  monitor/hmp-cmds.c           |  26 ++++++++
  qapi/migration.json          |  65 +++++++++++++++---
  softmmu/dirtylimit.c         |  91 ++++++++++++++++++++++---
  tests/qtest/migration-test.c | 154 +++++++++++++++++++++++++++++++++++++++++++
  10 files changed, 467 insertions(+), 32 deletions(-)
