This is the fourth iteration of the RFC patch set which aims to provide the basic framework for MTTCG. I hope this will provide a good base for discussion at KVM Forum later this month.
Prerequisites
=============

This tree has been built on top of two other series of patches:

  - Reduce lock contention on TCG hot-path (v5, in Paolo's tree)
  - cpu-exec: Safe work in quiescent state (v5, in my tree)

You can find the base tree (based off -rc0) at:

  https://github.com/stsquad/qemu/tree/mttcg/async-safe-work-v5

Changes
=======

Since the last posting there have been a number of updates to the
original patches:

  - more updates to the docs/multi-thread-tcg.txt design document
  - clean ups of sleep handling (and safe work integration)
  - split the big enable-multi-thread patch
  - split some re-factoring movement stuff into individual patches

As usual the patches themselves have a revision summary under the
--- line.

In addition I've brought forward a number of changes from the original
ARM enabling patches to support the various cputlb operations which
are basically generic anyway. These include:

  - making cross-vCPU tlb_flush operations use async_run_on_cpu
  - making tlb_reset_dirty_range atomically apply the TLB_NOTDIRTY flag

A copy of the tree can be found at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4

The series includes all the generic work needed and in theory just
needs MTTCG aware atomics and memory barriers for the various
host/guest combinations to be enabled by default. In practice the
memory barrier problems don't show up with an x86 host. In fact I have
created a tree which merges in Emilio's cmpxchg atomics and happily
boots ARMv7 Debian systems without any additional changes.
You can find that at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2

Testing
=======

I've tested that this boots ARMv7 Debian and runs all of the ARMv7
and ARMv8 kvm-unit-tests with:

  -accel tcg,thread=single

In addition I've tested the ARMv7 and ARMv8 kvm-unit-tests of the tcg
and tlbflush groups with:

  -accel tcg,thread=multi

These tests are safe as they don't rely on working atomics but do
exercise the parallel execution, invalidation and flushing of code.

The full invocation of all the tests is:

  echo "Running all tests in Single Thread Mode"
  ./run_tests.sh -t -o "-accel tcg,thread=single -name debug-threads=on"
  echo "Running tlbflush in Multi Thread Mode"
  ./run_tests.sh -t -g tlbflush -o "-accel tcg,thread=multi -name debug-threads=on"
  echo "Running TCG in Multi Thread Mode"
  ./run_tests.sh -t -g tcg -o "-accel tcg,thread=multi -name debug-threads=on"

Performance
===========

You can't do full work-load testing on this tree due to the lack of
atomic support (but I will run some numbers on
mttcg/base-patches-v4-with-cmpxchg-atomics-v2). However you certainly
see a run time improvement with the kvm-unit-tests TCG group.

  retry.py called with ['./run_tests.sh', '-t', '-g', 'tcg', '-o', '-accel tcg,thread=single']
  run 1: ret=0 (PASS), time=1047.147924 (1/1)
  run 2: ret=0 (PASS), time=1071.921204 (2/2)
  run 3: ret=0 (PASS), time=1048.141600 (3/3)
  Results summary:
  0: 3 times (100.00%), avg time 1055.737 (196.70 varience/14.02 deviation)
  Ran command 3 times, 3 passes

  retry.py called with ['./run_tests.sh', '-t', '-g', 'tcg', '-o', '-accel tcg,thread=multi']
  run 1: ret=0 (PASS), time=303.074210 (1/1)
  run 2: ret=0 (PASS), time=304.574991 (2/2)
  run 3: ret=0 (PASS), time=303.327408 (3/3)
  Results summary:
  0: 3 times (100.00%), avg time 303.659 (0.65 varience/0.80 deviation)
  Ran command 3 times, 3 passes

The TCG tests run with -smp 4 on my system.
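
As a rough cross-check of the figures above, the following sketch
recomputes the averages, the speed-up and the distance from ideal
scaling. It is illustrative only: the timings are copied from the
retry.py output, and "ideal" is an assumption that the work divides
perfectly across the 4 vCPUs, which real tests will not do.

```python
# Timings in seconds, copied from the retry.py output above.
single = [1047.147924, 1071.921204, 1048.141600]  # thread=single
multi = [303.074210, 304.574991, 303.327408]      # thread=multi
VCPUS = 4  # the TCG tests run with -smp 4


def mean(values):
    """Arithmetic mean of a list of run times."""
    return sum(values) / len(values)


avg_single = mean(single)
avg_multi = mean(multi)

# Speed-up of multi-threaded over single-threaded TCG.
speedup = avg_single / avg_multi

# Assumption: perfect linear scaling across all vCPUs.
ideal = avg_single / VCPUS

# How far the measured multi-threaded time is above that ideal.
overhead = avg_multi / ideal - 1.0

print(f"avg single: {avg_single:.3f}s")
print(f"avg multi:  {avg_multi:.3f}s")
print(f"speed-up:   {speedup:.2f}x")
print(f"ideal:      {ideal:.1f}s")
print(f"overhead:   {overhead:.1%}")
```

The ideal time comes out just under 264 seconds, so the measured
multi-threaded average sits roughly 40 seconds above perfect scaling.
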
While the TCG tests are purely CPU bound they do exercise the hot and
cold paths of TCG execution (especially when triggering SMC
detection). However there is still a clear benefit even with roughly
15% overhead compared to the ideal 263 second elapsed time (a quarter
of the single-threaded average).

Alex

Alex Bennée (23):
  cpus: make all_vcpus_paused() return bool
  translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  translate-all: add DEBUG_LOCKING asserts
  cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  docs: new design document multi-thread-tcg.txt (DRAFTING)
  linux-user/elfload: ensure mmap_lock() held while setting up
  translate-all: Add assert_(memory|tb)_lock annotations
  target-arm/arm-powerctl: wake up sleeping CPUs
  tcg: move tcg_exec_all and helpers above thread fn
  tcg: cpus rm tcg_exec_all()
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  cpus: re-factor out handle_icount_deadline
  tcg: remove global exit_request
  tcg: move locking for tb_invalidate_phys_page_range up
  cpus: tweak sleeping and safe_work rules for MTTCG
  tcg: enable tb_lock() for SoftMMU
  tcg: enable thread-per-vCPU
  atomic: introduce cmpxchg_bool
  cputlb: add assert_cpu_is_self checks
  cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  cputlb: make tlb_reset_dirty safe for MTTCG
  cputlb: make tlb_flush_by_mmuidx safe for MTTCG

Jan Kiszka (1):
  tcg: drop global lock during TCG code execution

KONRAD Frederic (3):
  tcg: protect TBContext with tb_lock.
  tcg: add options for enabling MTTCG
  cputlb: introduce tlb_flush_* async work.

Paolo Bonzini (1):
  tcg: comment on which functions have to be called with tb_lock held

 bsd-user/mmap.c           |   5 +
 cpu-exec-common.c         |  19 +-
 cpu-exec.c                |  41 ++--
 cpus.c                    | 510 +++++++++++++++++++++++++++++-----------------
 cputlb.c                  | 279 ++++++++++++++++++-------
 docs/multi-thread-tcg.txt | 310 ++++++++++++++++++++++++++++
 exec.c                    |  28 +++
 hw/i386/kvmvapic.c        |   4 +
 include/exec/cputlb.h     |   2 -
 include/exec/exec-all.h   |   5 +-
 include/qemu/atomic.h     |   9 +
 include/qom/cpu.h         |  27 +++
 include/sysemu/cpus.h     |   2 +
 linux-user/elfload.c      |   4 +
 linux-user/mmap.c         |   5 +
 memory.c                  |   2 +
 qemu-options.hx           |  20 ++
 qom/cpu.c                 |  10 +
 softmmu_template.h        |  17 ++
 target-arm/Makefile.objs  |   2 +-
 target-arm/arm-powerctl.c |   2 +
 target-i386/smm_helper.c  |   7 +
 tcg/tcg.h                 |   2 +
 translate-all.c           | 175 +++++++++++++---
 vl.c                      |  48 ++++-
 25 files changed, 1227 insertions(+), 308 deletions(-)
 create mode 100644 docs/multi-thread-tcg.txt

-- 
2.7.4