On 2019-01-15 21:05, Emilio G. Cota wrote:
> On Tue, Jan 15, 2019 at 16:01:32 +0000, Alex Bennée wrote:
>> Ahh I should have mentioned we already have the technology for this ;-)
>>
>> If you build the fpu/next tree on a s390x you can then run:
>>
>>   ./tests/fp/fp-bench f64_div
>>
>> with and without the CONFIG_128 path. To get an idea of the real world
>> impact you can compile a foreign binary and run it on a s390x system
>> with:
>>
>>   $QEMU ./tests/fp/fp-bench f64_div -t host
>>
>> And that will give you the peak performance assuming your program is
>> doing nothing but f64_div operations. If the two QEMU's are basically in
>> the same ballpark then it doesn't make enough difference. That said:
>
> I think you mean here `tests/fp/fp-bench -o div -p double', otherwise
> you'll get the default op (-o add).

I tried that now, too, and -o div -p double does not really seem to
exercise this function at all. Here are my results (disclaimer: that
system is likely not really usable for benchmarks since its CPUs are
shared with other LPARs, but I ran all the tests at least twice and got
similar results):

With the DLGR inline assembly:

 time ./fp-test f64_div -l 2 -r all
 real 6m43,648s
 user 6m43,362s
 sys  0m0,160s

 time ./fp-bench -o div -p double
 204.98 MFlops
 real 0m1,002s
 user 0m1,001s
 sys  0m0,001s

With the "#else" default 64-bit code:

 time ./fp-test f64_div -l 2 -r all
 real 6m44,910s
 user 6m44,616s
 sys  0m0,165s

 time ./fp-bench -o div -p double
 205.41 MFlops
 real 0m1,002s
 user 0m1,001s
 sys  0m0,001s

With the new CONFIG_INT128 code:

 time ./fp-test f64_div -l 2 -r all
 real 6m58,371s
 user 6m58,078s
 sys  0m0,164s

 time ./fp-bench -o div -p double
 205.17 MFlops
 real 0m1,002s
 user 0m1,000s
 sys  0m0,001s

==> The new CONFIG_INT128 code is measurably slower than the 64-bit code
in the fp-test run (about 3%), so I don't think we should include this
yet (unless we know a system where the compiler can create optimized
assembly code without libgcc here).

 Thomas
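
For context, a 128-by-64-bit division written with the compiler's 128-bit
integer type might look roughly like the sketch below. This is only an
illustration under stated assumptions: the name div128by64_sketch is
hypothetical and is not the function from the patch under discussion, and
the code assumes a GCC- or Clang-style compiler that provides
unsigned __int128. Where the compiler cannot map the 128-bit `/` and `%`
onto a hardware instruction, it typically emits calls to libgcc helpers
such as __udivti3 and __umodti3, which would be one plausible source of
the extra cost seen above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the actual code from the patch): divide the
 * 128-bit value (n1:n0) by d, return the 64-bit quotient and store the
 * remainder in *r.  Assumes unsigned __int128 is available.
 */
static inline uint64_t div128by64_sketch(uint64_t *r, uint64_t n1,
                                         uint64_t n0, uint64_t d)
{
    /* Precondition: n1 < d, so the quotient fits in 64 bits (and d != 0). */
    assert(n1 < d);

    /* Assemble the 128-bit dividend from its high and low halves. */
    unsigned __int128 n = ((unsigned __int128)n1 << 64) | n0;

    /* These two operations may be lowered to libgcc calls. */
    *r = (uint64_t)(n % d);
    return (uint64_t)(n / d);
}
```

Calling it with n1 = 0, n0 = 100, d = 7 yields quotient 14 and
remainder 2. Inspecting the generated assembly (for example with
`gcc -O2 -S`) shows whether a given target inlines the division or calls
into libgcc.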