Alex Bennée <alex.ben...@linaro.org> writes:

> This is the fourth iteration of the RFC patch set which aims to
> provide the basic framework for MTTCG. I hope this will provide a good
> base for discussion at KVM Forum later this month.
>
<snip>
>
> In practice the memory barrier problems don't show up with an x86
> host. In fact I have created a tree which merges in the Emilio's
> cmpxchg atomics which happily boots ARMv7 Debian systems without any
> additional changes. You can find that at:
>
>
> https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>
<snip>
> Performance
> ===========
>
> You can't do full work-load testing on this tree due to the lack of
> atomic support (but I will run some numbers on
> mttcg/base-patches-v4-with-cmpxchg-atomics-v2).

So here is a more real-world workload run:

retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=single']
run 1: ret=0 (PASS), time=261.794911 (1/1)
run 2: ret=0 (PASS), time=257.290045 (2/2)
run 3: ret=0 (PASS), time=256.536991 (3/3)
run 4: ret=0 (PASS), time=254.036260 (4/4)
run 5: ret=0 (PASS), time=256.539165 (5/5)
Results summary:
0: 5 times (100.00%), avg time 257.239 (8.00 varience/2.83 deviation)
Ran command 5 times, 5 passes

retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=multi']
run 1: ret=0 (PASS), time=86.597459 (1/1)
run 2: ret=0 (PASS), time=82.843904 (2/2)
run 3: ret=0 (PASS), time=84.095910 (3/3)
run 4: ret=0 (PASS), time=83.844595 (4/4)
run 5: ret=0 (PASS), time=83.594768 (5/5)
Results summary:
0: 5 times (100.00%), avg time 84.195 (2.02 varience/1.42 deviation)
Ran command 5 times, 5 passes

This shows roughly a 30% overhead over ideal 4-way scaling when
running multi-threaded (84.2s measured against 257.2s / 4 = 64.3s),
but still a decent improvement in wall-clock time.

The test itself boots the system and runs benchmark-build.service:

# A benchmark target
#
# This shuts down once the boot has completed

[Unit]
Description=Default
Requires=basic.target
After=basic.target
AllowIsolate=yes

[Service]
Type=oneshot
ExecStart=/root/mysrc/testcases.git/build-dir.sh /root/src/stress-ng.git/
ExecStartPost=/sbin/poweroff

[Install]
WantedBy=multi-user.target

And the build-dir script is simply:

#!/bin/sh
#
NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
set -e
cd "$1"
make clean
make -j${NR_CPUS}
cd -

Measuring this over increasing -smp values:

| -smp |    time | time as bar  | theoretical | % of -smp 1 |
|------+---------+--------------+-------------+-------------|
|    1 | 238.184 | WWWWWWWWWWWW |     238.184 |             |
|    2 | 133.402 | WWWWWWh      |     119.092 |             |
|    3 |  99.531 | WWWWH        |   79.394667 |             |
|    4 |  82.760 | WWWW:        |      59.546 |             |
#+TBLFM: $3='(orgtbl-ascii-draw $2 0 238.184 12)::$4=@2$2/$1

--
Alex Bennée
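
For anyone wanting to redo the scaling arithmetic, here is a minimal
Python sketch. It assumes "theoretical" means the -smp 1 time divided
by the vCPU count (which is what the #+TBLFM formula computes) and that
"overhead" means how far a measured time sits above that ideal. The
names MEASURED, scaling_report and overhead_pct are made up for this
illustration and are not part of retry.py or QEMU.

```python
# Wall-clock times in seconds, copied from the -smp table above.
MEASURED = {1: 238.184, 2: 133.402, 3: 99.531, 4: 82.760}


def scaling_report(measured):
    """Return (smp, time, ideal, speedup, overhead_pct) rows.

    ideal    = single-vCPU time / smp (the table's "theoretical" column)
    speedup  = single-vCPU time / measured time
    overhead = how far the measured time sits above the ideal, in percent
    """
    base = measured[1]
    rows = []
    for smp in sorted(measured):
        t = measured[smp]
        ideal = base / smp
        rows.append((smp, t, ideal, base / t, (t / ideal - 1.0) * 100.0))
    return rows


def overhead_pct(single_time, multi_time, ncpus):
    """Overhead of a multi-threaded run over perfect ncpus-way scaling."""
    return (multi_time / (single_time / ncpus) - 1.0) * 100.0


if __name__ == "__main__":
    for smp, t, ideal, speedup, over in scaling_report(MEASURED):
        print(f"-smp {smp}: {t:8.3f}s  ideal {ideal:8.3f}s  "
              f"speedup {speedup:4.2f}x  overhead {over:5.1f}%")
    # Average times from the thread=single and thread=multi retry.py runs.
    print(f"thread=multi vs ideal: "
          f"{overhead_pct(257.239, 84.195, 4):.1f}% overhead")
```

Note that the two data sets use different baselines: the retry.py
comparison divides the thread=single average by 4, while the table
divides its own -smp 1 time, so the overhead figures from the two are
not directly interchangeable.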