Howard Spoelstra <hsp.c...@gmail.com> writes:

> Hi,
>
> I built qemu-system-ppc for OSX and Windows from
> https://github.com/stsquad/qemu/tree/softfloat-refactor-and-fp16-v3
> and noticed a considerable drop in floating point performance on both
> hosts.
> Running Mac OS 9.2 in OSX, using MacBench 3.0, the score for the
> floating point performance dropped from ~60 to ~42.
>
> Recent tcg optimisations had improved processor and floating point
> performance considerably, but that gain seems to be more than lost for
> the floating point performance.
>
> Any idea what is causing this?

Well, we expected a little degradation, but it's a bit more than I
expected.

Re-factor

12:32:18 [alex@zen:~/m/t/aarch64] master(+0/-0) ± ~/lsrc/qemu/qemu.git/aarch64-linux-user/qemu-aarch64 ./vector-benchmark -b float64-add
float64-add : test took 1743387 msec
  67108864 ops, ~25978 nsec/kop
12:32:26 [alex@zen:~/m/t/aarch64] master(+0/-0) ± ~/lsrc/qemu/qemu.git/aarch64-linux-user/qemu-aarch64 ./vector-benchmark -b float64-add
float64-add : test took 1735526 msec
  67108864 ops, ~25861 nsec/kop
12:32:32 [alex@zen:~/m/t/aarch64] master(+0/-0) ± ~/lsrc/qemu/qemu.git/aarch64-linux-user/qemu-aarch64 ./vector-benchmark -b float64-add
float64-add : test took 1742030 msec
  67108864 ops, ~25958 nsec/kop

Original

12:32:35 [alex@zen:~/m/t/aarch64] master(+0/-0) ± ~/lsrc/qemu/qemu-builddirs/arm-targets.build/aarch64-linux-user/qemu-aarch64 ./vector-benchmark -b float64-add
float64-add : test took 1255007 msec
  67108864 ops, ~18701 nsec/kop
12:32:44 [alex@zen:~/m/t/aarch64] master(+0/-0) ± ~/lsrc/qemu/qemu-builddirs/arm-targets.build/aarch64-linux-user/qemu-aarch64 ./vector-benchmark -b float64-add
float64-add : test took 1243866 msec
  67108864 ops, ~18535 nsec/kop
12:32:46 [alex@zen:~/m/t/aarch64] master(+0/-0) ± ~/lsrc/qemu/qemu-builddirs/arm-targets.build/aarch64-linux-user/qemu-aarch64 ./vector-benchmark -b float64-add
float64-add : test took 1278100 msec
  67108864 ops, ~19045 nsec/kop

The main difference is that all the code has a common path now and works
on a canonicalized representation of the floating point number rather
than direct bit-fiddling of the register representation. This is
obviously going to have some cost, but it looks like we should spend a
bit more time seeing if we can claw some of that back.

That said, any floating point on QEMU is going to be slow by the very
nature of the implementation.
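
To make the "canonicalized representation" idea concrete, here is a minimal sketch of unpacking a float64 into separate sign, exponent, fraction and class fields, then packing it back. The names (fp_parts, fp_class, unpack_f64, pack_f64) are hypothetical and chosen for illustration only; they are not taken from the softfloat code, whose actual layout may well differ. The sketch also assumes the host `double` is IEEE 754 binary64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not QEMU's actual softfloat API. */
typedef enum {
    FP_CLASS_ZERO,
    FP_CLASS_DENORMAL,
    FP_CLASS_NORMAL,
    FP_CLASS_INF,
    FP_CLASS_NAN
} fp_class;

typedef struct {
    bool     sign;  /* true if negative */
    int      exp;   /* unbiased exponent (0 for inf/NaN) */
    uint64_t frac;  /* fraction; implicit bit made explicit for normals */
    fp_class cls;
} fp_parts;

#define F64_FRAC_BITS 52
#define F64_EXP_BIAS  1023
#define F64_EXP_MAX   0x7ff
#define F64_FRAC_MASK ((UINT64_C(1) << F64_FRAC_BITS) - 1)

/* Split the raw bit pattern once, so later code can work on fields. */
static inline fp_parts unpack_f64(double d)
{
    uint64_t bits;
    uint32_t raw_exp;
    fp_parts p;

    memcpy(&bits, &d, sizeof(bits));  /* assumes IEEE 754 binary64 */
    p.sign = (bits >> 63) != 0;
    raw_exp = (uint32_t)((bits >> F64_FRAC_BITS) & F64_EXP_MAX);
    p.frac = bits & F64_FRAC_MASK;

    if (raw_exp == F64_EXP_MAX) {
        p.cls = p.frac ? FP_CLASS_NAN : FP_CLASS_INF;
        p.exp = 0;
    } else if (raw_exp == 0) {
        p.cls = p.frac ? FP_CLASS_DENORMAL : FP_CLASS_ZERO;
        p.exp = 1 - F64_EXP_BIAS;
    } else {
        p.cls = FP_CLASS_NORMAL;
        p.exp = (int)raw_exp - F64_EXP_BIAS;
        p.frac |= UINT64_C(1) << F64_FRAC_BITS;  /* explicit leading 1 */
    }
    return p;
}

/* Reassemble the bit pattern from the canonical fields. */
static inline double pack_f64(fp_parts p)
{
    uint64_t raw_exp = 0;  /* zero and denormal use a raw exponent of 0 */
    uint64_t bits;
    double d;

    assert(p.frac <= ((F64_FRAC_MASK << 1) | 1));  /* at most 53 bits */
    if (p.cls == FP_CLASS_INF || p.cls == FP_CLASS_NAN) {
        raw_exp = F64_EXP_MAX;
    } else if (p.cls == FP_CLASS_NORMAL) {
        raw_exp = (uint64_t)(p.exp + F64_EXP_BIAS);
    }
    bits = ((uint64_t)p.sign << 63) | (raw_exp << F64_FRAC_BITS)
         | (p.frac & F64_FRAC_MASK);
    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

Every operation on such a common path pays for an unpack and a pack around the arithmetic itself, which is one plausible source of the extra per-op cost compared with code that manipulates the register bits directly.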
Because we don't translate guest<->host instructions directly, we will
be several orders of magnitude slower for floating point instructions
compared to normal integer and bitwise operations. If we want to improve
FPU performance we need to look at how we can safely use host
instructions to get the exact same results as we currently do with
softfloat.

I'm still keen on the re-factor as it does achieve the other goals of
improving readability of the code and making it much easier to reason
about. It also reduces the amount of effectively copy-and-pasted code,
differing only in its constants, that the old code had.

>
> Best regards,
> Howard

--
Alex Bennée