Alex Bennée <alex.ben...@linaro.org> writes:

> Hi,
>
> This series finally completes the re-build of Fred's multi_tcg_v8 tree
> by enabling MTTCG for armv7 guests on x86 hosts. This applies on top
> of the previous series:
<snip>
>
> Benchmarks
> ==========
>
> The benchmark is a simple boot and build test which builds stress-ng
> with -j ${NR_CPUS} and shuts down to facilitate easy repetition.
>
> arm-softmmu/qemu-system-arm -machine type=virt -display none -m 4096 \
>     -cpu cortex-a15 -serial telnet:127.0.0.1:4444 \
>     -monitor stdio -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>     -device virtio-net-device,netdev=unet \
>     -drive file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none \
>     -device virtio-blk-device,drive=myblock \
>     -append "console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1" \
>     -kernel /home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img
>
>
> | -smp 1 (mttcg=off) | -smp 4 (mttcg=off) | -smp 4 (mttcg=on) |
> |--------------------+--------------------+-------------------|
> | 301.60 (5 runs)    | 312.27 (4 runs)    | 573.26 (5 runs)   |
>
> As the results show currently the performance for mttcg is worse than
> the single threaded version. However this tree doesn't have the
> lockless tb_find_fast which means every time there is a transition
> from one page to the next the lock needs to be taken. There is still
> work to be done for performance ;-)
>
> Alex Bennée (5):
>   qemu-thread: add simple test-and-set spinlock
>   atomic: introduce atomic_dec_fetch.
>   atomic: introduce cmpxchg_bool
>   cpus: pass CPUState to run_on_cpu helpers
>   cpus: default MTTCG to on for 32 bit ARM on x86
>
> KONRAD Frederic (5):
>   cpus: introduce async_safe_run_on_cpu.
>   cputlb: introduce tlb_flush_* async work.
>   translate-all: introduces tb_flush_safe.
>   arm: use tlb_flush_page_all for tlbimva[a]
>   arm: atomically check the exclusive value in a STREX
>
> Paolo Bonzini (1):
>   include: move CPU-related definitions out of qemu-common.h
>
> Sergey Fedorov (1):
>   tcg/i386: Make direct jump patching thread-safe
>
>  cpu-exec-common.c        |   1 +
>  cpu-exec.c               |  11 ++++
>  cpus.c                   | 137 +++++++++++++++++++++++++++++++++++++++++-----
>  cputlb.c                 |  61 ++++++++++++++++-----
>  hw/i386/kvm/apic.c       |   3 +-
>  hw/i386/kvmvapic.c       |   8 +--
>  hw/ppc/ppce500_spin.c    |   3 +-
>  hw/ppc/spapr.c           |   6 +-
>  hw/ppc/spapr_hcall.c     |  12 ++--
>  include/exec/exec-all.h  |   7 ++-
>  include/qemu-common.h    |  24 --------
>  include/qemu/atomic.h    |  15 +++++
>  include/qemu/processor.h |  28 ++++++++++
>  include/qemu/thread.h    |  34 ++++++++++++
>  include/qemu/timer.h     |   1 +
>  include/qom/cpu.h        |  34 +++++++++++-
>  include/sysemu/cpus.h    |  13 +++++

As suggested by treblig I also ran a more purely CPU-bound task (pigz
compression of a kernel tarball):

command is ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark-pigz.service', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '1', '-tcg', 'mttcg=off']
run 1: ret=0 (PASS), time=136.379699 (1/1)
run 2: ret=0 (PASS), time=135.358848 (2/2)
run 3: ret=0 (PASS), time=135.708094 (3/3)
run 4: ret=0 (PASS), time=136.076002 (4/4)
run 5: ret=0 (PASS), time=137.863306 (5/5)

command is ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark-pigz.service', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-tcg', 'mttcg=on']
run 1: ret=0 (PASS), time=142.524636 (1/1)
run 2: ret=0 (PASS), time=139.960601 (2/2)
run 3: ret=0 (PASS), time=137.956633 (3/3)
run 4: ret=0 (PASS), time=139.699225 (4/4)
run 5: ret=0 (PASS), time=143.365373 (5/5)

This is much closer to parity, but of course we'd actually want it to
be faster.

--
Alex Bennée
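
For anyone wanting to compare the two pigz configurations at a glance,
the run times above can be reduced to a mean and a ratio. This is only a
rough sketch: the numbers are copied by hand from the run logs in this
mail, and the names (single_thread, mttcg, summarise) are made up for
illustration and are not part of any benchmark harness.

```python
from statistics import mean, stdev

# Times copied from the "run N:" lines above, in whatever unit the
# harness reports (the logs do not state it).
single_thread = [136.379699, 135.358848, 135.708094, 136.076002, 137.863306]  # -smp 1, mttcg=off
mttcg = [142.524636, 139.960601, 137.956633, 139.699225, 143.365373]          # -smp 4, mttcg=on


def summarise(label, times):
    """Print mean and sample standard deviation for one configuration; return the mean."""
    m = mean(times)
    s = stdev(times)
    print(f"{label}: mean={m:.2f} stdev={s:.2f} (n={len(times)})")
    return m


base = summarise("-smp 1 mttcg=off", single_thread)
multi = summarise("-smp 4 mttcg=on", mttcg)

# A ratio above 1.0 means the mttcg=on runs took longer on average.
ratio = multi / base
print(f"mttcg=on / mttcg=off = {ratio:.3f}")
```

With the figures above the means come out at roughly 136.3 and 140.7,
so the -smp 4 mttcg=on runs are about 3% slower than the -smp 1
mttcg=off runs.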